
Machine Learning, Deep Learning (24)

Reinforcement Learning Review Notes 2: Dummy Q-learning algorithm. To derive the basic Q-learning equation, let's look at one 'belief'. 1. First, I am in state s, and 2. when I take action a, I move to s' and receive reward r. Here, let's believe that Q exists at s'. I think what this means is that we assume we already know the reward we would get by taking an action in s' (the state reached by taking a in s), not just for one particular action but for any action. Expressing Q(s, a) in terms of Q(s', a') gives Q(s, a) = r + max(a') Q(s', a'), where r is the reward obtained immediately by taking a in s, and max(a') Q(s', a') is, for the steps after that, the maximum rewa.. 2022. 3. 12.
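
The update Q(s, a) = r + max(a') Q(s', a') from the excerpt above can be tried on a tiny example. What follows is only a minimal sketch under stated assumptions: the five-state corridor, the `step` function, the reward of 1 at the goal, and the random tie-breaking are all hypothetical choices made for illustration and do not come from the original post.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_row, rng):
    """argmax over actions, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def dummy_q_learning(episodes=20, seed=0):
    rng = random.Random(seed)
    # Q[s][a] starts at 0 for every state and action.
    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = greedy_action(Q[s], rng)
            s2, r, done = step(s, a)
            # The update from the text: immediate reward plus the best
            # Q value believed to be available in the next state.
            Q[s][a] = r + max(Q[s2])
            s = s2
    return Q


Q = dummy_q_learning()
for s, row in enumerate(Q):
    print(s, row)
```

In this toy setup the value 1 propagates backwards from the goal one state per episode, so after a few episodes every "right" entry in states 0 to 3 holds 1 while the "left" entries stay at 0.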
Reinforcement Learning Review Notes 1: Concept of RL. Reinforcement learning basically works like this: a mouse (the Actor, or Agent) takes an action, receives the corresponding reward, observes the changed state, and takes an action again. Think of the Q function as taking a state and an action and returning the corresponding reward. (Assume such a "big brother Q" exists.) What the Agent wants to know is the action that yields the maximum reward. This can be written in the following two mathematical notations. max(a') Q(s, a'): the maximum reward value (Q value) obtainable in state s by varying a'. argmax(a') Q(s, a'): (continuing from the above) the argument a' that yields the maximum Q value. Reference: Sung Kim, 모두를 위한 R.. 2022. 3. 12.
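
The difference between max(a') Q(s, a') and argmax(a') Q(s, a') can be seen on a single state. This is a small sketch only: the action names and the Q values in `q_values` are made-up numbers for illustration, not taken from the post or the lecture.

```python
# Hypothetical Q(s, a') values for one fixed state s.
q_values = {"left": 0.1, "right": 0.7, "up": 0.3, "down": 0.0}

# max(a') Q(s, a'): the largest Q value reachable from s.
max_q = max(q_values.values())

# argmax(a') Q(s, a'): the action a' that attains that largest value.
best_action = max(q_values, key=q_values.get)

print(max_q)        # the value
print(best_action)  # the action that gives it
```

max returns a number (a Q value), while argmax returns an action, which is what the Agent actually needs in order to decide what to do.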
4/23: Summary of what I tried. 1. Attempt to apply RMDL: I started with a pip install on Colab, then, to try changing the checkpoint path, forked the repo to my GitHub, fixed the path, committed, and ran git clone (not sure whether the code can still be modified after a pip install..). https://stackoverflow.com/questions/49322072/checkpoints-in-google-colab : judging from this, changing the checkpoint path to somewhere under /gdrive might still not work (after mounting, of course..). https://research.google.com/colaboratory/local-runtimes.html : documentation on Colab local runtimes. It says you can run code locally and access the local file system. Drawbacks.. 2020. 4. 24.
NLP Encoding (wikidocs). Integer encoding - assign a unique number to each word in the vocabulary (vocab) - methods: python dictionary, Counter, NLTK FreqDist, Keras preprocessing.text 1) sentence/word tokenization, cleaning, normalization 2) build the vocabulary with key=word, value=frequency and sort by frequency 3) assign lower integer indices to higher-frequency words first 4) low-frequency words can be excluded from the vocabulary 5) convert natural-language words to integer indices - OOV words may show up during the conversion to integer indices - OOV (Out-Of-Vocabulary): a word that does not exist in the vocabulary (low frequency .. 2020. 3. 19.
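
Steps 2) to 5) above can be sketched with `collections.Counter` from the standard library. The sample sentences, the `min_count` threshold, and the choice to map every out-of-vocabulary word to one dedicated "OOV" index are assumptions made for this illustration; the excerpt is cut off before it says how OOV words are handled. Step 1) is skipped by starting from already tokenized, lowercased sentences.

```python
from collections import Counter

# Hypothetical, already tokenized and normalized sentences (step 1 done).
sentences = [
    ["barber", "person"],
    ["barber", "good", "person"],
    ["barber", "huge", "secret"],
]

# Step 2: key = word, value = frequency, sorted by frequency (descending).
counts = Counter(word for sentence in sentences for word in sentence)
by_frequency = counts.most_common()

# Step 4: drop rare words (the threshold is an assumption for this sketch).
min_count = 2
kept = [word for word, count in by_frequency if count >= min_count]

# Step 3: more frequent words get lower integer indices, starting at 1.
word_to_index = {word: i for i, word in enumerate(kept, start=1)}

# Assumption: all out-of-vocabulary words share one extra index.
word_to_index["OOV"] = len(word_to_index) + 1


def encode(tokens):
    """Step 5: map each word to its index, falling back to the OOV index."""
    return [word_to_index.get(t, word_to_index["OOV"]) for t in tokens]


print(word_to_index)
print(encode(["barber", "huge", "person"]))
```

Here "huge" appears only once, so it is removed from the vocabulary in step 4 and is encoded with the OOV index.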