[Deep Learning] SMT, Seq2seq, Attention
http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture08-nmt.pdf
SMT (Statistical Machine Translation)
We want to find $\operatorname{argmax}_y P(y \mid x)$, where $x$ is a source-language sentence and $y$ is a target-language sentence.
By Bayes' rule, this is equivalent to $\operatorname{argmax}_y P(x \mid y)\,P(y)$, where $P(x \mid y)$ is the translation model and $P(y)$ is the language model.
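Spelling out the Bayes step (the denominator drops out because it does not depend on $y$):

$$
\operatorname{argmax}_y P(y \mid x)
= \operatorname{argmax}_y \frac{P(x \mid y)\,P(y)}{P(x)}
= \operatorname{argmax}_y P(x \mid y)\,P(y)
$$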
Question: how do we learn the translation model $P(x \mid y)$ from the parallel corpus?
Answer: break it down further by introducing a latent variable $a$ and learning $P(x, a \mid y)$, where $a$ is the alignment: the word-level correspondence between the source sentence $x$ and the target sentence $y$.
Learning alignment for SMT
Alignments are latent variables: they are not annotated in the parallel corpus, so the parameters are learned with algorithms for latent-variable models such as Expectation-Maximization (EM).
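The slides leave the learning algorithm abstract. As one concrete instance, here is a minimal sketch of IBM Model 1 (a classic member of this family of alignment models), fitting word-translation probabilities $t(x \mid y)$ by EM with the alignment summed out in the E-step. The function name and the `<NULL>` token are illustrative choices, not anything from the lecture.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1: learn word-translation probabilities t(x|y)
    from sentence pairs, treating alignments as latent variables.

    pairs: list of (source_tokens, target_tokens) tuples.
    """
    src_vocab = {w for src, _ in pairs for w in src}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform init of t[(x, y)]
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts of (x, y) links
        total = defaultdict(float)
        for src, tgt in pairs:
            tgt = ["<NULL>"] + tgt   # lets a source word align to nothing
            for x in src:
                norm = sum(t[(x, y)] for y in tgt)
                for y in tgt:
                    c = t[(x, y)] / norm      # E-step: posterior link prob
                    count[(x, y)] += c
                    total[y] += c
        for (x, y) in count:                  # M-step: renormalize
            t[(x, y)] = count[(x, y)] / total[y]
    return t
```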
Decoding for SMT
Enumerating every possible $y$ is intractable, so decoding uses heuristic search, discarding hypotheses that are too improbable as the translation is built up.
Neural Machine Translation (NMT)
seq2seq
It involves two RNNs: one RNN (the encoder) is connected to another RNN (the decoder).
(In the decoder RNN, the output at each time step becomes the input at the next time step.)
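A minimal sketch of those two connected RNNs, assuming PyTorch (the class and method names here are mine, not from the lecture). `greedy_decode` shows the decoder feeding each step's output back in as the next input:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """One RNN (encoder) connected to another RNN (decoder)."""
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encoder compresses the whole source sentence into its final state.
        _, h = self.encoder(self.src_emb(src))
        # Training uses teacher forcing: the gold previous token is the input.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

    @torch.no_grad()
    def greedy_decode(self, src, bos_id, max_len=50):
        # At test time, each step's most probable output token is fed
        # back in as the input of the next time step.
        _, h = self.encoder(self.src_emb(src))
        tok = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
        out = []
        for _ in range(max_len):
            dec, h = self.decoder(self.tgt_emb(tok), h)
            tok = self.out(dec).argmax(dim=-1)  # (batch, 1)
            out.append(tok)
        return torch.cat(out, dim=1)
```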
Besides machine translation, seq2seq can solve many other NLP tasks:
- Summarization
- Dialogue
- Parsing
- Code generation
- etc.
Beam search
At each decoding step, keep the $k$ most probable partial hypotheses, where $k$ is the beam size.
With beam size = 2, both hypotheses can split into their candidate continuations,
and then the two most probable paths are selected to keep expanding.
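A minimal, framework-free sketch of that procedure; `step_logprobs` is a hypothetical stand-in for the decoder that, given a prefix, returns the log-probability of each candidate next token:

```python
def beam_search(step_logprobs, bos, eos, beam_size=2, max_len=20):
    """Keep the `beam_size` most probable partial hypotheses at each step."""
    beams = [([bos], 0.0)]            # (token sequence, summed log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:        # finished hypotheses stop expanding
                candidates.append((seq, score))
                continue
            for tok, lp in step_logprobs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # With beam_size=2 this is exactly the picture above: both branches
        # split, then only the two most probable paths survive.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(seq[-1] == eos for seq, _ in beams):
            break
    return max(beams, key=lambda c: c[1])[0]
```

Note that summed log-probabilities favor shorter outputs, so real decoders usually length-normalize the score before picking the final hypothesis.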
Attention
Vanilla seq2seq has an information bottleneck problem: the encoder must squeeze the entire source sentence into a single fixed-size vector, and that one vector is all the decoder ever sees. Attention solves this by letting the decoder connect directly to the encoder and focus on different parts of the source at each decoding step.
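A minimal sketch of dot-product attention (one common scoring choice; the function name and tensor shapes are mine), assuming PyTorch as above. Each decoder step builds a fresh context vector from all encoder states instead of relying on the single bottleneck vector:

```python
import torch
import torch.nn.functional as F

def dot_product_attention(dec_hidden, enc_outputs):
    """dec_hidden:  (batch, hidden)           current decoder state
    enc_outputs: (batch, src_len, hidden)  all encoder hidden states
    """
    # Score each source position by its dot product with the decoder state.
    scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)
    weights = F.softmax(scores, dim=1)            # attention distribution
    # Context vector: attention-weighted sum of the encoder states.
    context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
    return context, weights        # (batch, hidden), (batch, src_len)
```

The context vector is typically concatenated with the decoder state to predict the next token, and the attention weights double as a soft alignment between source and target.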