Machine Learning, Deep Learning/Paper Classification (10)

NLP tutorial (wikidocs)
Pandas
1) Series
- Assigns an index label to each value of a one-dimensional array
- Consists of values and an index
2) DataFrame
- Holds the values of a two-dimensional array together with a row index and a column index
- Consists of values, an index, and columns
- Can be created from a list, dict, ndarray, Series, or another DataFrame
- Can be created by reading external data such as csv, text, excel, sql, html, or json files
Numpy
1) Creating an ndarray
- np.array() creates an ndarray from a list or tuple
- np.zeros(shape), np.ones(shape), np.full(shape, num), np.eye(shape..
2020. 3. 16.
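
The Pandas and Numpy notes above can be sketched roughly as follows. This is a minimal illustration, not code from the tutorial; the sample values, labels, and variable names are all made up.

```python
import numpy as np
import pandas as pd

# Series: one-dimensional values plus an index label per value.
# The labels and prices are hypothetical sample data.
prices = pd.Series([17000, 18000, 1000], index=["pizza", "chicken", "cola"])

# DataFrame: two-dimensional values plus a row index and column names.
frame = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["row1", "row2"],
    columns=["A", "B", "C"],
)

# A DataFrame can also be built from a dict that maps column names to values.
from_dict = pd.DataFrame({"name": ["Ann", "Bob"], "score": [90, 85]})

# ndarray creation from a nested list, plus the filled-array helpers.
arr = np.array([[1, 2], [3, 4]])
zeros = np.zeros((2, 3))      # 2x3 array of 0.0
ones = np.ones((2, 3))        # 2x3 array of 1.0
sevens = np.full((2, 2), 7)   # 2x2 array filled with 7
identity = np.eye(3)          # 3x3 identity; np.eye takes a row count, not a shape tuple
```

Values are then looked up by label, for example prices["chicken"] or frame.loc["row2", "B"].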

Research paper classification systems based on TF-IDF and LDA schemes
Crawling of abstract data & Preprocessing
- A data crawler collects abstracts and keywords and preprocesses them
- Preprocessing removes stop words and extracts only nouns
- Preprocessing reduces the amount of data, which improves the efficiency of the classification system
- Because the abstracts are big data, they are managed with HDFS
Managing paper data
- Among the keywords of all abstracts, keywords with similar meanings are categorized under a single representative keyword
- As a result, 1394 representative keywords were extracted and used to build a keyword dictionary
- However, ..
2020. 3. 3.
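
As a rough sketch of two of the steps noted above, stop-word removal and grouping similar keywords under one representative keyword, something like the following could work. The stop-word list, the synonym table, and both function names are hypothetical and are not taken from the paper; noun extraction is left out because it needs a part-of-speech tagger.

```python
# Hypothetical stop-word list and synonym table, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "is", "we"}
REPRESENTATIVE = {
    "svm": "support vector machine",
    "support vector machines": "support vector machine",
    "lda": "latent dirichlet allocation",
}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = [t.strip(".,;:()") for t in text.lower().split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def to_representative(keywords):
    """Map each keyword to its representative form, keeping first-seen order."""
    seen = []
    for kw in keywords:
        rep = REPRESENTATIVE.get(kw.lower(), kw.lower())
        if rep not in seen:
            seen.append(rep)
    return seen


tokens = remove_stop_words("We propose a classifier for the analysis of papers.")
keywords = to_representative(["SVM", "Support Vector Machines", "LDA", "clustering"])
```

Here the two spellings of SVM collapse into one dictionary entry, which is the effect the keyword dictionary is described as having.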

Text classification (wikidocs)
Text classification
- Binary classification vs. Multi-class classification
- Examples: spam mail classification, movie review classification (sentiment analysis), intent analysis
Word embedding
- Converts words into dense vectors
- keras Embedding(): performs embedding on input in which each word has been mapped to an integer
- See chapters 8-9
Word indexing
- Sort words by frequency and assign indices sequentially
- Low-frequency words can be removed
- The Reuters news classification and IMDB review sentiment classification examples also use this method (they use data on which this step has already been done)
- See chapter 2
RNN
model.add(SimpleRNN(hidde..
2020. 2. 27.
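
The word indexing step above can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the function name, the min_count parameter, the toy corpus, and the choice to start indices at 1 (leaving 0 free, for example for padding) are mine, not the tutorial's.

```python
from collections import Counter


def build_word_index(sentences, min_count=1):
    """Assign indices 1, 2, 3, ... to words in descending order of frequency.

    Words that occur fewer than min_count times are dropped.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    # Most frequent first; ties are broken alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [word for word, count in ordered if count >= min_count]
    return {word: i for i, word in enumerate(kept, start=1)}


# Hypothetical pre-tokenized corpus.
corpus = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
word_index = build_word_index(corpus, min_count=2)

# Encode each sentence as integers, skipping words that were removed.
encoded = [[word_index[w] for w in sentence if w in word_index] for sentence in corpus]
```

Integer sequences of this kind are the sort of input that an embedding layer expects.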

Kaggle Titanic example
Kaggle
- A data science competition website
- Datasets are available for use
- Provides a free online IDE ("Kernel") with data science packages preinstalled
- Submit a predictive model and check your rank on the leaderboard
Titanic example video [1/3] - Problem analysis + data analysis
- Brief problem analysis: predict which Titanic passengers survived and which died
- Use graphs to check how each feature relates to the target (= Survived/Dead)
- Sets the direction for the feature engineering step
Titanic example video [2/3] - Feature engineering
- Converting text values to numbers and filling in missing values to vectorize the data ..
2020. 2. 25.
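
A minimal pandas sketch of the feature engineering idea above (text values to numbers, missing values filled in) might look like this. The rows are made-up sample data that reuse the Sex, Age, and Embarked column names from the Kaggle Titanic dataset; filling Age with the median and Embarked with "S" are assumptions for illustration, not necessarily what the video does.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the Titanic dataset.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
})

# Text to numbers: map each category to an integer code.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Missing values: fill Age with the median of the known ages (an assumed strategy).
df["Age"] = df["Age"].fillna(df["Age"].median())

# Fill the missing port with "S" (an assumed default), then encode it.
df["Embarked"] = df["Embarked"].fillna("S").map({"S": 0, "C": 1, "Q": 2})
```

After these steps every column is numeric and has no missing values, so the frame can be passed to a model as a feature matrix.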