CS224N (NLP) Assignment: Individual Solutions
Assignment 1:
Q1: Softmax
1.
Prove that $softmax(x)=softmax(x+c)$, where $x$ is a vector and $c$ is a constant.
Proof:
Let $x=(x_1,x_2,…,x_n)$
and then we know that $softmax(x)=(\frac{e^{x_1}}{\sum_j e^{x_j}},\frac{e^{x_2}}{\sum_j e^{x_j}},…,\frac{e^{x_n}}{\sum_j e^{x_j}})$
and $softmax(x+c)=(\frac{e^{x_1+c}}{\sum_j e^{x_j+c}},\frac{e^{x_2+c}}{\sum_j e^{x_j+c}},\dots,\frac{e^{x_n+c}}{\sum_j e^{x_j+c}})=(\frac{e^{x_1}e^c}{e^c\sum_j e^{x_j}},\frac{e^{x_2}e^c}{e^c\sum_j e^{x_j}},\dots,\frac{e^{x_n}e^c}{e^c\sum_j e^{x_j}})=(\frac{e^{x_1}}{\sum_j e^{x_j}},\frac{e^{x_2}}{\sum_j e^{x_j}},\dots,\frac{e^{x_n}}{\sum_j e^{x_j}})=softmax(x)$
This completes the proof. In practice, this property lets us subtract $\max_j x_j$ from every component before exponentiating, which avoids overflow without changing the result; the implementation below does exactly that.
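A minimal numerical check of this shift invariance (just a sketch; naive_softmax is an illustrative helper, not part of the assignment code):

import numpy as np

def naive_softmax(v):
    # direct definition, no stability trick, to check the identity as stated
    e = np.exp(v)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
c = 5.0
print(np.allclose(naive_softmax(x), naive_softmax(x + c)))  # True, up to floating-point error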
2. softmax implementation:
import numpy as np

def softmax_inrow(m):
    # softmax applied along each row of m
    rm = np.max(m, axis=1, keepdims=True)   # per-row max, kept as a column vector
    e1 = np.exp(m - rm)                     # subtract the max for numerical stability (Q1.1)
    s = np.sum(e1, axis=1, keepdims=True)   # per-row normalizing constant
    return e1 / s

def softmax_incoloum(m):
    # softmax applied along each column of m
    rm = np.max(m, axis=0)                  # per-column max
    e1 = np.exp(m - rm)                     # subtract the max for numerical stability (Q1.1)
    s = np.sum(e1, axis=0)                  # per-column normalizing constant
    return e1 / s

N = int(input())
D = int(input())
matrix = np.random.rand(N, D)
print(softmax_incoloum(matrix))
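A quick sanity check of the two functions above (a sketch reusing softmax_inrow and softmax_incoloum, with numpy already imported): every row of the row-wise output and every column of the column-wise output should sum to 1.

test = np.random.rand(3, 4)
row_out = softmax_inrow(test)
col_out = softmax_incoloum(test)
print(np.allclose(row_out.sum(axis=1), 1.0))  # True: each row is a probability distribution
print(np.allclose(col_out.sum(axis=0), 1.0))  # True: each column is a probability distribution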
Q2:
2.1: Differentiate the sigmoid function $\sigma$ and express its derivative in terms of $\sigma$ itself.
As $\sigma(x)=\frac{1}{1+e^{-x}}$,
$\sigma'(x)=\frac{e^{-x}}{(1+e^{-x})^2}=\sigma(x)^2e^{-x}$, and $e^{-x}=\frac{1}{\sigma(x)}-1$,
so $\sigma'(x)=\sigma(x)^2\left(\frac{1}{\sigma(x)}-1\right)=\sigma(x)(1-\sigma(x))$
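A small finite-difference check of this identity (a sketch; sigmoid below is an illustrative helper):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.7
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central-difference approximation of sigma'(x)
analytic = sigmoid(x) * (1 - sigmoid(x))                # the closed form derived above
print(np.isclose(numeric, analytic))                    # True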
2.2: Derive the gradient, with respect to $\theta$, of the cross-entropy between the one-hot label $y$ and $softmax(\theta)$.
solution:
We know that $CE(y,\hat y)=-\sum_i y_i\log(\hat y_i)$.
Let $\hat y=softmax(\theta)=(s_1,s_2,\dots,s_n)$
and $\theta=(d_1,d_2,\dots,d_n)$, so that $s_k=\frac{e^{d_k}}{\sum_j e^{d_j}}$.
Then, by the chain rule, $\frac{\partial CE}{\partial d_i}=\sum_k\frac{\partial CE}{\partial s_k}\frac{\partial s_k}{\partial d_i}$
First we consider $\frac{\partial CE}{\partial s_k}$
$\frac{\partial CE}{\partial s_k}=\frac{\partial}{\partial s_k}\left(-\sum_j y_j\log(s_j)\right)=-\frac{y_k}{s_k}$
and then we consider $\frac{\partial s_k}{\partial d_i}$
there are two cases to consider:
when $k=i$
$\frac{\partial s_i}{\partial d_i}=\frac{\partial}{\partial d_i}\left(\frac{e^{d_i}}{\sum_j e^{d_j}}\right)=s_i-s_i^2=s_i(1-s_i)$
when $k\neq i$
$\frac{\partial s_k}{\partial d_i}=\frac{\partial}{\partial d_i}\left(\frac{e^{d_k}}{\sum_j e^{d_j}}\right)=-s_ks_i$
so $\frac{\partial CE}{\partial d_i}=\sum_k\frac{\partial CE}{\partial s_k}\frac{\partial s_k}{\partial d_i}=-\frac{y_i}{s_i}s_i(1-s_i)+\sum_{k\neq i}\frac{y_k}{s_k}s_ks_i=-y_i+y_is_i+s_i\sum_{k\neq i}y_k=s_i\sum_k y_k-y_i=s_i-y_i$
(using $\sum_k y_k=1$, since $y$ is one-hot); this equals $s_i-1$ when $i$ is the correct label and $s_i$ otherwise.
i.e. in vector form, $\frac{\partial CE}{\partial\theta}=softmax(\theta)-y$: take $softmax(\theta)$ and subtract 1 from the element at the index of the correct label, and that is the gradient.
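A gradient check of this result (a sketch with made-up numbers; softmax, cross_entropy and the finite-difference loop below are illustrative helpers, not the assignment's required code):

import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))        # shift by the max (allowed by Q1.1) for stability
    return e / e.sum()

def cross_entropy(theta, y):
    return -np.sum(y * np.log(softmax(theta)))

theta = np.array([0.2, -1.0, 0.5])
y = np.array([0.0, 1.0, 0.0])        # one-hot label: correct class is index 1

analytic = softmax(theta) - y        # the gradient derived above

numeric = np.zeros_like(theta)       # central-difference numerical gradient
h = 1e-6
for i in range(len(theta)):
    step = np.zeros_like(theta)
    step[i] = h
    numeric[i] = (cross_entropy(theta + step, y) - cross_entropy(theta - step, y)) / (2 * h)

print(np.allclose(analytic, numeric))  # True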