[Machine learning][Part4] 多维矩阵下的梯度下降线性预测模型的实现

news/2024/6/17 0:37:05 标签: 机器学习, 矩阵, 人工智能

目录

模型初始化信息:

模型实现:

多变量损失函数:

多变量梯度下降实现:

多变量梯度实现:

多变量梯度下降实现:


之前部分实现的梯度下降线性预测模型中的training example只有一个特征属性:房屋面积,这显然是不符合实际情况的,这里增加特征属性的数量再实现一次梯度下降线性预测模型。

这里回顾一下梯度下降线性模型的实现方法:

  1. 实现线性模型:f = w*x + b,模型参数w,b待定
  2. 寻找最优的w,b组合:

             (1)引入衡量模型优劣的cost function:J(w,b) ——损失函数或者代价函数

             (2)损失函数值最小的时候,模型最接近实际情况:通过梯度下降法来寻找最优w,b组合

模型初始化信息:

  • 新的房子的特征有:房子面积、卧室数、楼层数、房龄共4个特征属性。
Size (sqft)Number of BedroomsNumber of floorsAge of HomePrice (1000s dollars)
21045145460
14163240232
852213517

 上面表中的训练样本有3个,输入特征矩阵模型为:

具体代码实现为,X_train是输入矩阵,y_train是输出矩阵

X_train = np.array([[2104, 5, 1, 45], 
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

模型参数w,b矩阵

代码实现:w中的每一个元素对应房屋的一个特征属性

b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])

模型实现:

def predict(x, w, b): 
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters   
      b (scalar):             model parameter 
      
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b     
    return p   

多变量损失函数:

J(w,b)为:

代码实现为:

def compute_cost(X, y, w, b): 
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar    
    return cost

多变量梯度下降实现:

多变量梯度实现:

def compute_gradient(X, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw

多变量梯度下降实现:

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn theta. Updates theta by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient and update the parameters
        dj_db,dj_dw = gradient_function(X, y, w, b)   ##None

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw               ##None
        b = b - alpha * dj_db               ##None
      
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(X, y, w, b))

        # Print cost every at intervals 10 times or as many iterations if < 10
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing

梯度下降算法测试:

# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")


# plot cost versus iteration  
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist)
ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
ax1.set_title("Cost vs. iteration");  ax2.set_title("Cost vs. iteration (tail)")
ax1.set_ylabel('Cost')             ;  ax2.set_ylabel('Cost') 
ax1.set_xlabel('iteration step')   ;  ax2.set_xlabel('iteration step') 
plt.show()

结果为:

可以看到,右图中损失函数在traning次数结束之后还一直在下降,没有找到最佳的w,b组合。具体解决方法,后面会有更新。

完整的代码为:

import copy, math
import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])


def predict(x, w, b):
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters
      b (scalar):             model parameter

    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b
    return p


def compute_cost(X, y, w, b):
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b  # (n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i]) ** 2  # scalar
    cost = cost / (2 * m)  # scalar
    return cost


def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter

    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape  # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_db, dj_dw


def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn theta. Updates theta by taking
    num_iters gradient steps with learning rate alpha

    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent

    Returns:
      w (ndarray (n,)) : Updated values of parameters
      b (scalar)       : Updated value of parameter
      """

    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying global w within function
    b = b_in

    for i in range(num_iters):

        # Calculate the gradient and update the parameters
        dj_db, dj_dw = gradient_function(X, y, w, b)  ##None

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw  ##None
        b = b - alpha * dj_db  ##None

        # Save cost J at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(X, y, w, b))

        # Print cost every at intervals 10 times or as many iterations if < 10
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")

    return w, b, J_history  # return final w,b and J history for graphing

# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                    compute_cost, compute_gradient,
                                                    alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

# plot cost versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist)
ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
ax1.set_title("Cost vs. iteration");  ax2.set_title("Cost vs. iteration (tail)")
ax1.set_ylabel('Cost')             ;  ax2.set_ylabel('Cost')
ax1.set_xlabel('iteration step')   ;  ax2.set_xlabel('iteration step')
plt.show()


http://www.niftyadmin.cn/n/5078203.html

相关文章

《机器学习》- 第3章 线性模型

文章目录 3.1 基本形式3.2 线性回归一元线性回归多元线性回归对数模型 3.3 对数几率回归&#xff08;逻辑回归&#xff09;问题描述二分类任务 3.4 线性判别分析3.5 多分类学习3.6 类别不平衡问题 3.1 基本形式 问题描述&#xff1a; 函数形式&#xff1a; 向量形式&#xf…

基于工业5G网关的工业机器人监测控制方案

随着智能制造、自动化生产的发展进步&#xff0c;工业机器人的身影越来越多地出现在工厂现场&#xff0c;成为新型无人化、智能化生产制造的中坚力量。 工业机器人的运行伴生着海量的数据采集、传输、分析和反馈执行&#xff0c;因此也需要高速、低延迟的5G网络&#xff0c;支撑…

YOLOv7改进策略:SCConv空间和通道重建卷积,即插即用,助力检测 | CVPR2023 SCConv

💡💡💡本文改进:SCConv(空间和通道重建卷积),一个即插即用的架构单元,可以可以直接用来替代各种卷积神经网络中的标准卷积。 SCConv | 亲测在多个数据集能够实现涨点 收录: YOLOv7高阶自研专栏介绍: http://t.csdnimg.cn/tYI0c ✨✨✨前沿最新计算机顶会复现…

适用于嵌入式单片机的差分升级通用库

转至&#xff1a;痞子衡嵌入式半月刊&#xff1a;第 81 期 1、mcu_bsdiff_upgrade - 适用于嵌入式单片机的差分升级通用库 mcu_bsdiff_upgrade 是一款适用于嵌入式单片机的差分升级库&#xff0c;通用所有单片机&#xff0c;如stm32、华大、复旦微、瑞萨等。适合嵌入式的差分升…

爬虫基础知识点快速入门

以下是一个包含注释的Python示例&#xff0c;演示了基本的网页爬取过程&#xff0c;以及一些常见的爬虫知识点&#xff1a; # 导入必要的库 import requests # 用于发送HTTP请求 from bs4 import BeautifulSoup # 用于解析HTML import csv # 用于数据存储# 1. 指定目标网站…

practical on mifare

抽象的。 mifare Classic 是市场上使用最广泛的非接触式智能卡。 其设计和实施细节由制造商保密。 本文研究了该卡的体系结构以及卡与读卡器之间的通信协议。 然后&#xff0c;它提供了一种实用的、低成本的攻击&#xff0c;可以从卡的内存中恢复秘密信息。 由于伪随机生成器的…

linux_三剑客(grep,sed,awk)

前言&#xff1a; Bash:是一个命令处理器&#xff0c;运行在文本窗口中&#xff0c;并能执行用户直接输入的命令。 Bash还能从文件中读取Linux命令&#xff0c;称之为脚本。 Bash支持通配符&#xff0c;管道&#xff0c;命令替换&#xff0c;条件判断等逻辑控制语句。 快捷…

数学术语之源——单射(injection),满射(surjection),双射(bijection)

1. 单射或入射(injection) 1.1 injection的词源 词义为“a forcing of a fluid into a body (with a syringe, etc.)(迫使液体进入体内(使用注射器等))”(始于15世纪早期)&#xff0c;来自古法语“iniection ”(14世纪)或者直接来自拉词语“iniectionem (主格‘iniectio’)”&…