Text classification (wikidocs)

Text classification

- Binary classification vs. Multi-class classification

- Examples: spam mail classification, movie review classification (sentiment analysis), intent analysis

Word embedding

- Converts each word into a dense vector

- Keras' Embedding(): performs the embedding on input in which each word has already been mapped to an integer

- See chapters 8-9
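Conceptually, an embedding layer behaves like a lookup table: row i of a (vocab_size, embedding_dim) weight matrix is the dense vector for word index i. The following is a minimal NumPy sketch under that view, not Keras' implementation; the sizes are hypothetical and the weights are random, whereas Keras learns them during training.

```python
import numpy as np

# Hypothetical sizes for illustration.
vocab_size = 10      # number of distinct word indices (0..9)
embedding_dim = 4    # length of each dense word vector

rng = np.random.default_rng(0)
# The embedding table: one row per word index.
# Random here; in a trained model these weights are learned.
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

def embed(batch_of_indices):
    """Map integer word indices of shape (batch, timesteps)
    to dense vectors of shape (batch, timesteps, embedding_dim)."""
    return embedding_matrix[np.asarray(batch_of_indices)]

# Two documents, each already converted to 3 word indices.
batch = [[1, 5, 2],
         [7, 0, 3]]
vectors = embed(batch)
print(vectors.shape)  # (2, 3, 4)
```

Because the lookup is plain row indexing, the same word index always yields the same vector; training only changes the table's contents.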

Word indexing

- Sorts words in descending order of frequency and assigns indices sequentially (the most frequent word gets the smallest index)

- Low-frequency words can be removed

- The Reuters news classification and IMDB review sentiment classification examples also use this scheme (the datasets they load are already indexed this way)

- See chapter 2
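The frequency-based indexing described above can be sketched in plain Python. This is an illustrative assumption, not the tutorial's code: `build_word_index`, `texts_to_sequences`, and the `min_count` threshold are hypothetical names, ties in frequency are broken alphabetically for determinism, and index 0 is left unused so it can serve as a padding value.

```python
from collections import Counter

def build_word_index(texts, min_count=1):
    """Assign indices by descending frequency (1 = most frequent word).

    Words that appear fewer than min_count times are dropped.
    Index 0 is reserved (e.g. for padding).
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    # Highest frequency first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, c in ranked if c >= min_count]
    return {word: i for i, word in enumerate(vocab, start=1)}

def texts_to_sequences(texts, word_index):
    """Replace each word with its index, skipping words not in the index."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]

texts = ["the movie was good", "the movie was bad", "good good fun"]
word_index = build_word_index(texts, min_count=2)
print(word_index)  # {'good': 1, 'movie': 2, 'the': 3, 'was': 4}
print(texts_to_sequences(texts, word_index))  # [[3, 2, 4, 1], [3, 2, 4], [1, 1]]
```

Here "bad" and "fun" occur only once, so with `min_count=2` they are removed and simply disappear from the integer sequences.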

RNN

from tensorflow.keras.layers import SimpleRNN
model.add(SimpleRNN(hidden_size, input_shape=(timesteps, input_dim)))

- hidden_size: the number of hidden units, which is also the size of the output (output_dim)

- timesteps: the number of time steps = the number of words in each document

- input_dim: the size of each input = the dimensionality of each word vector

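- Text classification is a many-to-one problem: the RNN reads the whole word sequence, and only the output at the last time step is used to predict the label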

- Binary classification: sigmoid output / binary_crossentropy loss

- Multi-class classification: softmax output / categorical_crossentropy loss
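To make the shapes and the many-to-one setup concrete, here is a hand-written NumPy forward pass. It is a sketch under stated assumptions, not Keras' implementation: weights are random rather than trained, the recurrence is the plain tanh update h_t = tanh(x_t W_x + h_{t-1} W_h + b) (tanh is SimpleRNN's default activation), and the output heads omit a bias for brevity. In Keras the rough equivalent would be a SimpleRNN followed by Dense(1, activation='sigmoid') or Dense(num_classes, activation='softmax').

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes matching the parameter names above.
timesteps, input_dim, hidden_size = 5, 4, 3

# SimpleRNN-style weights (random here; learned in practice).
W_x = rng.normal(scale=0.5, size=(input_dim, hidden_size))    # input -> hidden
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden
b = np.zeros(hidden_size)

def simple_rnn_last_state(x):
    """x: (timesteps, input_dim) -> final hidden state of shape (hidden_size,).
    Many-to-one: only the state after the last word is returned."""
    h = np.zeros(hidden_size)
    for x_t in x:  # one step per word
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def binary_crossentropy(y, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def categorical_crossentropy(y_onehot, p, eps=1e-7):
    return -np.sum(y_onehot * np.log(np.clip(p, eps, 1.0)))

x = rng.normal(size=(timesteps, input_dim))  # one document of 5 word vectors
h = simple_rnn_last_state(x)

# Binary head: 1 unit + sigmoid -> probability of the positive class.
w_bin = rng.normal(size=hidden_size)
p_pos = sigmoid(h @ w_bin)
loss_bin = binary_crossentropy(1, p_pos)   # true label: positive

# Multi-class head: num_classes units + softmax -> one probability per class.
num_classes = 4
W_cls = rng.normal(size=(hidden_size, num_classes))
p_cls = softmax(h @ W_cls)
y = np.eye(num_classes)[2]                 # true class 2, one-hot encoded
loss_cls = categorical_crossentropy(y, p_cls)

print(h.shape, float(p_pos), float(p_cls.sum()), float(loss_bin), float(loss_cls))
```

With a one-hot target, categorical cross-entropy reduces to the negative log-probability of the true class, which is why softmax (probabilities summing to 1) pairs with it, while a single sigmoid unit pairs with binary cross-entropy.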

References

https://wikidocs.net/22891

IT 찢는 뱁새 🐣