Introduction:
Strictly speaking, gradient descent (Gradient Descent) is not a machine learning algorithm in its own right but an optimization algorithm: it searches for the parameter values that minimize a loss function and thereby finds the optimal solution. Because it is simple and works well, it is used widely inside machine learning algorithms.
How it works:
Suppose we plot a loss function J(θ) against a single parameter θ in a two-dimensional plane, giving a curve whose lowest point we want to reach. We can define an update rule θ = θ - η·J'(θ), where J'(θ) is the derivative of the loss at θ. Repeatedly applying this update moves the point downhill; as it approaches the bottom the derivative shrinks in magnitude, and at the lowest point the derivative is 0, θ stops changing, and we have found the minimum.
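To make the update rule concrete, here is a minimal sketch (mine, not from the original post) that applies it to the illustrative loss J(θ) = (θ - 2)², whose lowest point is at θ = 2:

# Minimal 1-D gradient descent on the illustrative loss J(theta) = (theta - 2) ** 2.
# The loss, the starting point and the learning rate are assumptions chosen for this demo.
def dJ(theta):
    return 2 * (theta - 2)        # derivative of (theta - 2) ** 2

eta = 0.1                         # learning rate
theta = 0.0                       # arbitrary starting point
for _ in range(10000):
    step = eta * dJ(theta)
    theta = theta - step          # move against the derivative
    if abs(step) < 1e-8:          # derivative is (almost) 0: we are at the bottom
        break

print(theta)                      # prints a value very close to 2.0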
Here η is called the learning rate. Clearly, the smaller η is, the more times we have to apply the update and the longer it takes to reach a result. But η cannot be too large either, otherwise the very first step may overshoot the lowest point and each subsequent step may drift further and further away from it.
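For example, in the sketch above the distance to the minimum is multiplied by (1 - 2η) at every step, so with η = 0.1 it shrinks steadily, while with something like η = 1.5 its magnitude doubles each time and the iteration runs away from the minimum.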
Gradient descent is not limited to two-dimensional data; it works just as well in higher dimensions. To generalize it, the single derivative is replaced by the gradient, the vector of partial derivatives of the loss with respect to each parameter.
In the multi-dimensional case we can use the multiple linear regression algorithm covered earlier to compute the result: write the loss of the linear regression in matrix form, and we can then implement the algorithm above in code.
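Concretely, if we write $X_b$ for the training matrix with an extra leading column of ones and $\theta$ for the parameter vector (the intercept followed by the coefficients), then the mean squared error loss and its gradient, which the code below computes, are

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\bigl(y^{(i)} - X_b^{(i)}\theta\bigr)^2, \qquad \nabla J(\theta) = \frac{2}{m}\,X_b^{T}\bigl(X_b\theta - y\bigr).$$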
Let me also introduce stochastic gradient descent as an extension (the method above is usually called batch gradient descent); its purpose is to speed up the computation. With the method above, every step of the descent has to be computed over all of the data, which takes a lot of time. Experiments show that even if we randomly pick a single sample for each update, we still end up close to the minimum, so we are essentially trading a little precision for time.
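As a minimal sketch of this idea (mine; the variable names are illustrative, though the decaying learning rate t0 / (t + t1) is the same schedule used in the class below), each update uses the gradient computed on a single randomly chosen sample:

import numpy as np

# Stochastic gradient descent sketch: one random sample per parameter update.
# x_b is the training matrix with a leading column of ones, y the target vector.
def sgd(x_b, y, theta, n_iters=5, t0=5, t1=50):
    m = len(x_b)
    for cur_iter in range(n_iters):
        for t in range(m):
            i = np.random.randint(m)                          # pick one sample at random
            gradient = 2 * x_b[i] * (x_b[i].dot(theta) - y[i])
            theta = theta - t0 / (cur_iter * m + t + t1) * gradient
    return theta

The class below implements the same idea slightly differently: instead of drawing samples with replacement, it shuffles the whole training set once per pass and then sweeps the samples in order.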
That is the general idea. Next, let's implement the algorithm above in Python:
Code:
1. Wrapping the algorithm in a class
# LR_GD_class.py: multiple linear regression solved by gradient descent (batch and stochastic)
import numpy as np
from sklearn.metrics import r2_score


class LinearRegression:

    def __init__(self):
        self.coef_ = None
        self.interception_ = None
        self._theta = None

    def fit_normal(self, x_train, y_train):
        """Fit with the closed-form normal equation."""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        x_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        self._theta = np.linalg.inv(x_b.T.dot(x_b)).dot(x_b.T).dot(y_train)
        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def fit_gd(self, x_train, y_train, eta=0.01, n_iters=1e4):
        """Fit with batch gradient descent."""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"

        def lose(theta, x_b, y):
            # mean squared error loss
            try:
                return np.sum((y - x_b.dot(theta)) ** 2) / len(x_b)
            except Exception:
                return float("inf")

        def Derivative(theta, x_b, y):
            # gradient of the loss, computed over the whole training set
            return x_b.T.dot(x_b.dot(theta) - y) * 2 / len(x_b)

        def gradient_descent(x_b, y, init_theta, eta, epsilon=1e-8):
            theta = init_theta
            i_iters = 0
            while i_iters < n_iters:
                gradient = Derivative(theta, x_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                # stop once the loss barely changes between two consecutive steps
                if abs(lose(theta, x_b, y) - lose(last_theta, x_b, y)) < epsilon:
                    break
                i_iters += 1
            return theta

        x_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        initial_theta = np.zeros(x_b.shape[1])
        self._theta = gradient_descent(x_b, y_train, initial_theta, eta, epsilon=1e-8)
        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def fit_random_gd(self, x_train, y_train, n_iters=5, t0=5, t1=50):
        """Fit with stochastic gradient descent; n_iters is the number of passes over the data."""
        assert x_train.shape[0] == y_train.shape[0], \
            "the size of x_train must be equal to the size of y_train"
        assert n_iters >= 1

        def Derivative(theta, x_b_i, y_i):
            # gradient of the loss on a single sample
            return x_b_i * (x_b_i.dot(theta) - y_i) * 2

        def random_gradient_descent(x_b, y, initial_theta):

            def learning_rate(t):
                # decaying learning rate so later steps get smaller
                return t0 / (t + t1)

            theta = initial_theta
            m = len(x_b)
            for cur_iter in range(n_iters):
                # shuffle the samples, then sweep them one by one
                indexes = np.random.permutation(m)
                x_b_new = x_b[indexes]
                y_new = y[indexes]
                for i in range(m):
                    gradient = Derivative(theta, x_b_new[i], y_new[i])
                    theta = theta - learning_rate(cur_iter * m + i) * gradient
            return theta

        x_b = np.hstack([np.ones((len(x_train), 1)), x_train])
        initial_theta = np.zeros(x_b.shape[1])
        self._theta = random_gradient_descent(x_b, y_train, initial_theta)
        self.interception_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def predict(self, x_predict):
        assert self.interception_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        assert x_predict.shape[1] == len(self.coef_), \
            "the feature number of x_predict must be equal to that of the training data"
        x_b = np.hstack([np.ones((len(x_predict), 1)), x_predict])
        return x_b.dot(self._theta)

    def score(self, x_test, y_test):
        y_predict = self.predict(x_test)
        return r2_score(y_test, y_predict)

    def __repr__(self):
        return "LinearRegression()"
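One design note on the two fit methods: fit_gd stops early once the loss barely changes between consecutive steps, while fit_random_gd has no such stopping test; the loss seen through single random samples fluctuates too much for that criterion to be reliable, so it simply runs a fixed number of passes with a decaying learning rate.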
2. Main program:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from LR_GD_class import LinearRegression
from sklearn.preprocessing import StandardScaler

# Note: load_boston was removed in scikit-learn 1.2; on a recent version, substitute
# another built-in regression dataset such as datasets.fetch_california_housing()
# (and drop the y < 50.0 filter below, which only applies to the capped Boston prices).
boston = datasets.load_boston()
x = boston.data
y = boston.target
x = x[y < 50.0]
y = y[y < 50.0]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=666)

# standardize the features so that a single learning rate suits all of them
standardScaler = StandardScaler()
standardScaler.fit(x_train)
x_train_stand = standardScaler.transform(x_train)
x_test_stand = standardScaler.transform(x_test)

# batch gradient descent
lin_reg = LinearRegression()
lin_reg.fit_gd(x_train_stand, y_train)
print(lin_reg.score(x_test_stand, y_test))

# stochastic gradient descent
lin_reg1 = LinearRegression()
lin_reg1.fit_random_gd(x_train_stand, y_train, n_iters=50)
print(lin_reg1.score(x_test_stand, y_test))

Original article: https://juejin.cn/post/7096315783583809550