CS224N (NLP) Assignment: Individual Solutions
Assignment 1:
Q1: Softmax
1.
Prove that $softmax(x)=softmax(x+c)$, where $x$ is a vector and $c$ is a constant.
Proof:
Let $x=(x_1,x_2,…,x_n)$
and then we know that $softmax(x)=(\frac{e^{x_1}}{\sum_j e^{x_j}},\frac{e^{x_2}}{\sum_j e^{x_j}},…,\frac{e^{x_n}}{\sum_j e^{x_j}})$
and $softmax(x+c)=(\frac{e^{x_1+c}}{\sum_j e^{x_j+c}},\frac{e^{x_2+c}}{\sum_j e^{x_j+c}},\dots,\frac{e^{x_n+c}}{\sum_j e^{x_j+c}})=(\frac{e^{x_1}e^c}{e^c\sum_j e^{x_j}},\frac{e^{x_2}e^c}{e^c\sum_j e^{x_j}},\dots,\frac{e^{x_n}e^c}{e^c\sum_j e^{x_j}})=(\frac{e^{x_1}}{\sum_j e^{x_j}},\frac{e^{x_2}}{\sum_j e^{x_j}},\dots,\frac{e^{x_n}}{\sum_j e^{x_j}})=softmax(x)$
This completes the proof. In practice, this property lets us subtract $\max_j x_j$ from every component before exponentiating, which avoids overflow without changing the result; the implementation below does exactly that.
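A minimal numerical check of this shift invariance (just a sketch; naive_softmax is an illustrative helper, not part of the assignment code):

import numpy as np

def naive_softmax(v):
    # direct definition, no stability trick, to check the identity as stated
    e = np.exp(v)
    return e / e.sum()

x = np.array([1.0, 2.0, 3.0])
c = 5.0
print(np.allclose(naive_softmax(x), naive_softmax(x + c)))  # True, up to floating-point error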
2. softmax implementation:
import numpy as np

def softmax_inrow(m):
    # softmax applied along each row of m
    rm = np.max(m, axis=1, keepdims=True)   # per-row max, kept as a column vector
    e1 = np.exp(m - rm)                     # subtract the max for numerical stability (Q1.1)
    s = np.sum(e1, axis=1, keepdims=True)   # per-row normalizing constant
    return e1 / s

def softmax_incoloum(m):
    # softmax applied along each column of m
    rm = np.max(m, axis=0)                  # per-column max
    e1 = np.exp(m - rm)                     # subtract the max for numerical stability (Q1.1)
    s = np.sum(e1, axis=0)                  # per-column normalizing constant
    return e1 / s

N = int(input())
D = int(input())
matrix = np.random.rand(N, D)
print(softmax_incoloum(matrix))
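A quick sanity check of the two functions above (a sketch reusing softmax_inrow and softmax_incoloum, with numpy already imported): every row of the row-wise output and every column of the column-wise output should sum to 1.

test = np.random.rand(3, 4)
row_out = softmax_inrow(test)
col_out = softmax_incoloum(test)
print(np.allclose(row_out.sum(axis=1), 1.0))  # True: each row is a probability distribution
print(np.allclose(col_out.sum(axis=0), 1.0))  # True: each column is a probability distribution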
Q2:
2.1: Differentiate the sigmoid function $\sigma$ and express its derivative in terms of $\sigma$ itself.
As $\sigma(x)=\frac{1}{1+e^{-x}}$,
$\sigma'(x)=\frac{e^{-x}}{(1+e^{-x})^2}=\sigma(x)^2e^{-x}$, and $e^{-x}=\frac{1}{\sigma(x)}-1$,
so $\sigma'(x)=\sigma(x)^2\left(\frac{1}{\sigma(x)}-1\right)=\sigma(x)(1-\sigma(x))$
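A small finite-difference check of this identity (a sketch; sigmoid below is an illustrative helper):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.7
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central-difference approximation of sigma'(x)
analytic = sigmoid(x) * (1 - sigmoid(x))                # the closed form derived above
print(np.isclose(numeric, analytic))                    # True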
2.2: Derive the gradient, with respect to $\theta$, of the cross-entropy between the one-hot label $y$ and $softmax(\theta)$.
solution:
We know that $CE(y,\hat y)=-\sum_i y_i\log(\hat y_i)$.
Let $\hat y=softmax(\theta)=(s_1,s_2,\dots,s_n)$
and $\theta=(d_1,d_2,\dots,d_n)$, so that $s_k=\frac{e^{d_k}}{\sum_j e^{d_j}}$.
Then, by the chain rule, $\frac{\partial CE}{\partial d_i}=\sum_k\frac{\partial CE}{\partial s_k}\frac{\partial s_k}{\partial d_i}$
First we consider $\frac{\partial CE}{\partial s_k}$
$\frac{\partial CE}{\partial s_k}=\frac{\partial}{\partial s_k}\left(-\sum_j y_j\log(s_j)\right)=-\frac{y_k}{s_k}$
and then we consider $\frac{\partial s_k}{\partial d_i}$
there are two cases to consider:
when $k=i$
$\frac{\partial s_i}{\partial d_i}=\frac{\partial}{\partial d_i}\left(\frac{e^{d_i}}{\sum_j e^{d_j}}\right)=s_i-s_i^2=s_i(1-s_i)$
when $k\neq i$
$\frac{\partial s_k}{\partial d_i}=\frac{\partial}{\partial d_i}\left(\frac{e^{d_k}}{\sum_j e^{d_j}}\right)=-s_ks_i$
so $\frac{\partial CE}{\partial d_i}=\sum_k\frac{\partial CE}{\partial s_k}\frac{\partial s_k}{\partial d_i}=-\frac{y_i}{s_i}s_i(1-s_i)+\sum_{k\neq i}\frac{y_k}{s_k}s_ks_i=-y_i+y_is_i+s_i\sum_{k\neq i}y_k=s_i\sum_k y_k-y_i=s_i-y_i$
(using $\sum_k y_k=1$, since $y$ is one-hot); this equals $s_i-1$ when $i$ is the correct label and $s_i$ otherwise.
i.e. in vector form, $\frac{\partial CE}{\partial\theta}=softmax(\theta)-y$: take $softmax(\theta)$ and subtract 1 from the element at the index of the correct label, and that is the gradient.
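A gradient check of this result (a sketch with made-up numbers; softmax, cross_entropy and the finite-difference loop below are illustrative helpers, not the assignment's required code):

import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))        # shift by the max (allowed by Q1.1) for stability
    return e / e.sum()

def cross_entropy(theta, y):
    return -np.sum(y * np.log(softmax(theta)))

theta = np.array([0.2, -1.0, 0.5])
y = np.array([0.0, 1.0, 0.0])        # one-hot label: correct class is index 1

analytic = softmax(theta) - y        # the gradient derived above

numeric = np.zeros_like(theta)       # central-difference numerical gradient
h = 1e-6
for i in range(len(theta)):
    step = np.zeros_like(theta)
    step[i] = h
    numeric[i] = (cross_entropy(theta + step, y) - cross_entropy(theta - step, y)) / (2 * h)

print(np.allclose(analytic, numeric))  # True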