Text classification (wikidocs)

by ํ–‰๋ฑ 2020. 2. 27.

Text classification

- Binary classification vs. Multi-class classification

- Examples: spam mail classification, movie review classification (sentiment analysis), intent analysis

 

Word embedding

- Converts each word into a dense vector

- Keras's Embedding(): performs the embedding on input in which each word has already been mapped to an integer

- See chapters 8-9
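Conceptually, an embedding layer is a lookup table indexed by the integer assigned to each word. The following is a minimal pure-Python sketch of that idea, not Keras's implementation: the helper names (make_embedding_table, embed) and the sizes are hypothetical, and the table is filled with random values here, whereas in Keras the values are learned during training.

```python
import random

def make_embedding_table(vocab_size, embedding_dim, seed=0):
    """Build a vocab_size x embedding_dim table of small random floats."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
            for _ in range(vocab_size)]

def embed(table, token_ids):
    """Return one dense vector per integer token id (a plain row lookup)."""
    return [table[i] for i in token_ids]

table = make_embedding_table(vocab_size=10, embedding_dim=4)
sentence = [3, 1, 7]               # hypothetical integer-encoded sentence
vectors = embed(table, sentence)   # 3 words, each a 4-dimensional vector
print(len(vectors), len(vectors[0]))
```

The output has one row per word, so a document of timesteps words becomes a (timesteps, embedding_dim) matrix.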

 

Word indexing

- Sort words by frequency and assign indices sequentially

- Low-frequency words can be removed

- The Reuters news classification and IMDB review sentiment classification examples also use this method (their datasets come already indexed this way)

- See chapter 2
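The indexing steps above can be sketched with the standard library. This is an illustrative version only: build_word_index and min_count are hypothetical names, the toy documents are made up, and starting the indices at 1 (leaving 0 free for padding) is an assumed convention rather than something stated in these notes.

```python
from collections import Counter

def build_word_index(tokenized_docs, min_count=1):
    """Map each word to an index, most frequent word first."""
    counts = Counter(word for doc in tokenized_docs for word in doc)
    # most_common() sorts by descending frequency; drop rare words.
    vocab = [word for word, count in counts.most_common() if count >= min_count]
    return {word: index for index, word in enumerate(vocab, start=1)}

docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
word_index = build_word_index(docs, min_count=2)

# Encode each document, skipping words that were removed from the vocabulary.
encoded = [[word_index[w] for w in doc if w in word_index] for doc in docs]
print(word_index)
print(encoded)
```

Here "bad" and "plot" appear only once, so they are dropped when min_count is 2.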

 

RNN

model.add(SimpleRNN(hidden_size, input_shape=(timesteps, input_dim)))

- hidden_size: size of the output (output_dim)

- timesteps: number of time steps = number of words in each document

- input_dim: size of the input = dimensionality of each word vector
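To make the three sizes concrete, here is a minimal pure-Python sketch of a tanh recurrent layer that returns only its last hidden state, which is what a many-to-one setup consumes. It is a conceptual illustration, not Keras's SimpleRNN code; the function name, the weight names (W_x, W_h, b), and the random values are all assumptions.

```python
import math
import random

def simple_rnn_last_state(inputs, W_x, W_h, b):
    """Run a tanh RNN over inputs (timesteps x input_dim); return the final hidden state."""
    hidden_size = len(b)
    h = [0.0] * hidden_size                      # initial hidden state
    for x_t in inputs:                           # one step per word
        h = [
            math.tanh(
                sum(W_x[j][k] * x_t[k] for k in range(len(x_t)))
                + sum(W_h[j][k] * h[k] for k in range(hidden_size))
                + b[j]
            )
            for j in range(hidden_size)
        ]
    return h

timesteps, input_dim, hidden_size = 5, 4, 3      # hypothetical sizes
rng = random.Random(0)
inputs = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(timesteps)]
W_x = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_size)]
W_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
b = [0.0] * hidden_size

last_h = simple_rnn_last_state(inputs, W_x, W_h, b)
print(len(last_h))   # equals hidden_size, regardless of timesteps
```

The input has shape (timesteps, input_dim) and the result has length hidden_size, matching the parameter descriptions above.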

 

- Text classification is a many-to-one problem

- Binary classification: Sigmoid / binary_crossentropy

- Multi-class classification: Softmax / categorical_crossentropy
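The two activation and loss pairings can be written out by hand for a single example. This is a sketch of the standard formulas using only the standard library; the function names mirror the Keras loss names for readability but are plain Python helpers, and the inputs are made-up values.

```python
import math

def sigmoid(z):
    """Squash one score into a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, p):
    """Loss for a 0/1 label given the predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot label given predicted class probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

p = sigmoid(0.0)                                 # binary case: one output unit
loss_bin = binary_crossentropy(1, p)

probs = softmax([1.0, 1.0, 1.0])                 # multi-class case: one unit per class
loss_cat = categorical_crossentropy([0, 1, 0], probs)
print(p, loss_bin, probs, loss_cat)
```

With an uninformative score of 0, the sigmoid gives 0.5 and the loss is log 2; with three equal scores, softmax gives 1/3 per class and the loss is log 3.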

 

 

References

https://wikidocs.net/22891

https://wikidocs.net/24873

 
