[Deep Learning] SMT, Seq2seq, Attention

http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture08-nmt.pdf

SMT (Statistical Machine Translation)

We want to find $\arg\max_y P(y \mid x)$, where $x$ and $y$ are both sentences.
This is equivalent to $\arg\max_y P(x \mid y)\,P(y)$.
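This is just Bayes' rule; the denominator $P(x)$ does not depend on $y$, so it can be dropped from the $\arg\max$:

$$\arg\max_y P(y \mid x) = \arg\max_y \frac{P(x \mid y)\,P(y)}{P(x)} = \arg\max_y P(x \mid y)\,P(y)$$

Here $P(x \mid y)$ is the translation model (learned from parallel data) and $P(y)$ is the language model (learned from monolingual text).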

Question: how can we learn the translation model $P(x \mid y)$ from a parallel corpus?

The answer is to break it down further: introduce a latent alignment variable $a$ and learn $P(x, a \mid y)$, where $a$ is the word-level alignment between the source sentence $x$ and the target sentence $y$.

Alignment

Alignment is the correspondence between particular words in the source sentence and the translated sentence. Alignments can be one-to-one, one-to-many, many-to-one, or many-to-many, and some words have no counterpart at all.

Learning alignment for SMT

$P(x, a \mid y)$ is learned as a combination of many factors, including the probability of particular words aligning (which also depends on their positions in the sentence) and the probability of particular words having a particular fertility (number of corresponding words). Since the alignments $a$ are latent variables, learning them requires special algorithms such as Expectation-Maximization.

Decoding for SMT

Decoding means finding the best translation: $\arg\max_y P(x \mid y)\,P(y)$. Enumerating every possible $y$ is far too expensive, so SMT decoders use heuristic search, discarding hypotheses that are too low-probability.

Neural Machine Translation (NMT)

seq2seq

It involves two RNNs: an encoder RNN reads the source sentence, and its final hidden state initializes a decoder RNN that generates the translation. (In the decoder, the output at each time step is fed back as the input at the next time step.)
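A minimal sketch of this encoder-decoder wiring in PyTorch (the class name, dimensions, and use of GRUs are illustrative assumptions, not the lecture's code; a real NMT system adds subword vocabularies, padding/masking, and an inference loop that feeds outputs back in):

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """One RNN (encoder) reads the source; its final hidden state
    seeds a second RNN (decoder) that generates the target."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the whole source sentence; h is the final hidden state.
        _, h = self.encoder(self.src_emb(src_ids))
        # Teacher forcing at training time: feed the gold target tokens.
        # At inference, each step's predicted word would be fed back as
        # the next step's input instead.
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec_states)  # logits over the target vocabulary
```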

Besides machine translation, seq2seq can handle many other NLP tasks:

  • Summarization
  • Dialogue
  • Parsing
  • Code generation
  • etc.

NMT directly models $P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x)$ with the seq2seq network and is trained end-to-end by backpropagation. The simplest way to decode is greedy decoding: take the argmax word on each step of the decoder. Greedy decoding has no way to undo a bad decision, which is what beam search addresses.
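As a sketch, greedy decoding is just an argmax loop. `step_logprobs` below is a hypothetical stand-in for the trained decoder, assumed to return `(token, logprob)` continuations of a prefix:

```python
def greedy_decode(step_logprobs, max_len=10, bos=0, eos=1):
    """Take the single most probable token at every step; once a
    token is chosen there is no way to undo the decision."""
    seq = [bos]
    for _ in range(max_len):
        tok, _ = max(step_logprobs(seq), key=lambda p: p[1])
        seq.append(tok)
        if tok == eos:
            break
    return seq
```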

Beam search decoding, step by step

On each decoder step, beam search keeps track of the $k$ most probable partial translations (hypotheses), where $k$ is the beam size. With beam size = 2, every hypothesis on the beam can branch; the two most probable resulting hypotheses are selected and expanded further, step after step, until hypotheses end (or a length limit is hit). Beam search is not guaranteed to find the optimal solution, but it is far more efficient than exhaustive search.
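With the same hypothetical `step_logprobs` interface as above, the core beam-search loop looks like this (real decoders also set finished hypotheses aside and normalize scores by length):

```python
import heapq

def beam_search(step_logprobs, beam_size=2, max_len=10, bos=0):
    """Keep the `beam_size` highest-scoring hypotheses at every step."""
    beams = [(0.0, [bos])]  # (cumulative log-probability, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            # Every hypothesis on the beam branches into its continuations...
            for tok, lp in step_logprobs(seq):
                candidates.append((score + lp, seq + [tok]))
        # ...but only the `beam_size` most probable survive.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beams, key=lambda b: b[0])[1]  # best final hypothesis
```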

Attention

seq2seq has an information bottleneck problem: the encoder's single final hidden state has to capture all information about the source sentence, since it is the only thing passed to the decoder.

Attention solves this. On each step of the decoder, we compute a score between the current decoder hidden state and every encoder hidden state, turn the scores into a probability distribution with softmax, and take the attention-weighted sum of the encoder hidden states as extra input for predicting the next word. The decoder can thus focus on different parts of the source sentence at different steps, and the attention weights act as a soft alignment.
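In equations (basic dot-product attention, consistent with the linked lecture): with encoder hidden states $h_1, \dots, h_N$ and decoder state $s_t$, the scores are $e^t_i = s_t^\top h_i$, the attention distribution is $\alpha^t = \mathrm{softmax}(e^t)$, and the attention output $a_t = \sum_i \alpha^t_i h_i$ is concatenated with $s_t$ before predicting the next word. A minimal NumPy sketch:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # subtract max for numerical stability
    return z / z.sum()

def dot_product_attention(enc_states, dec_state):
    """enc_states: (N, h) array of encoder hidden states h_1..h_N.
    dec_state:  (h,)  current decoder hidden state s_t."""
    scores = enc_states @ dec_state   # e_i = s_t . h_i          -> (N,)
    alpha = softmax(scores)           # attention distribution    -> (N,)
    attn_out = alpha @ enc_states     # a_t = sum_i alpha_i h_i   -> (h,)
    # The decoder uses [attn_out; dec_state] to predict the next word.
    return attn_out, alpha
```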

Summary

SMT decomposes translation as $\arg\max_y P(x \mid y)\,P(y)$, with a translation model learned from parallel data (via latent alignments) and a language model learned from monolingual data. NMT replaces that whole pipeline with a single seq2seq network trained end-to-end; beam search makes decoding tractable; and attention removes the information bottleneck by letting the decoder look directly at every encoder hidden state.

-------------End of this passage-------------