
# COMP5328 Combining PCA with Linear Regression

## Combining PCA with Linear Regression

One important use case of PCA is as a pre-processing step within an overall ML problem such as linear regression (see Sect. 3.1). As discussed in Chap. 7, linear regression methods are prone to overfitting whenever the length $d$ of the feature vectors exceeds the number $m$ of labeled data points used for training. One simple but powerful strategy to avoid overfitting is to treat the original feature vectors as raw data points $\mathbf{z}^{(i)} \in \mathbb{R}^d$ and apply PCA to them in order to obtain shorter feature vectors $\mathbf{x}^{(i)} \in \mathbb{R}^n$ with $n<m$.
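The pre-processing strategy described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data is synthetic, the helper name `pca_directions` is hypothetical, and PCA is computed without centering from the eigenvectors of $\mathbf{Q} = (1/m)\mathbf{Z}^T\mathbf{Z}$. The raw feature vectors have length $d=50$, but only $m=20$ labeled data points are available, so the data is first compressed to $n=5<m$ features before fitting a linear model by least squares.

```python
import numpy as np

def pca_directions(Z, n):
    """Rows of the returned (n, d) matrix are the n leading principal directions.

    Assumption of this sketch: no centering, so the directions are the top
    eigenvectors of Q = (1/m) Z^T Z.
    """
    m = Z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / m)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n]              # indices of the n largest
    return eigvecs[:, top].T

rng = np.random.default_rng(0)
m, d, n = 20, 50, 5          # d > m: plain linear regression on Z would overfit

# Synthetic raw data close to an n-dimensional linear subspace of R^d.
latent = rng.standard_normal((m, n))
Z = latent @ rng.standard_normal((n, d)) + 0.01 * rng.standard_normal((m, d))
y = latent @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

W = pca_directions(Z, n)     # learn the compression from the raw data
X = Z @ W.T                  # new feature vectors x^(i) in R^n, with n < m

# Ordinary least squares on the compressed features.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
train_mse = float(np.mean((X @ w_hat - y) ** 2))
print(X.shape, train_mse)
```

To predict the label of a new raw data point `z_new`, the same matrix `W` would be reused: `(W @ z_new) @ w_hat`.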

### How to Choose the Number of PCs?

Several considerations can guide the choice of the number $n$ of PCs to be used as features:

- Data visualization: use either $n=2$ or $n=3$.
- Computational budget: choose $n$ sufficiently small so that the computational complexity of the overall ML method does not exceed the available computational resources.
- Statistical budget: when PCA is used as a pre-processing step within a linear regression problem (see Sect. 3.1), the PCA outputs $\mathbf{x}^{(i)}$ serve as the feature vectors for linear regression. To avoid overfitting, we should choose $n<m$ (see Chap. 7).
- Elbow method: choose $n$ large enough that the approximation error $\widehat{L}^{(\mathrm{PCA})}$ is reasonably small (see Fig. 9.2).
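The elbow method and the statistical budget can be combined in a small selection routine. The sketch below is one possible way to do this, not a prescribed procedure: the function names `pca_errors` and `choose_n` and the relative tolerance `rel_tol` are assumptions made for illustration. It relies on the standard fact that, for uncentered PCA, the minimum average squared reconstruction error with $n$ PCs equals the sum of the $d-n$ smallest eigenvalues of $\mathbf{Q} = (1/m)\mathbf{Z}^T\mathbf{Z}$.

```python
import numpy as np

def pca_errors(Z):
    """Approximation error of (uncentered) PCA for every n = 0, 1, ..., d."""
    m = Z.shape[0]
    eigvals = np.linalg.eigvalsh(Z.T @ Z / m)[::-1]   # descending order
    eigvals = np.clip(eigvals, 0.0, None)             # guard against round-off
    tail = np.cumsum(eigvals[::-1])[::-1]             # tail[n] = sum of eigvals[n:]
    return np.append(tail, 0.0)                       # error is 0 when n = d

def choose_n(errors, m, rel_tol=0.01):
    """Smallest n whose error is at most rel_tol times the error for n = 0,
    capped at m - 1 to respect the statistical budget n < m."""
    small_enough = np.nonzero(errors <= rel_tol * errors[0])[0]
    n = int(small_enough[0])      # errors[d] = 0 always qualifies
    return min(n, m - 1)

rng = np.random.default_rng(1)
m, d = 30, 10

# Synthetic data near a 3-dimensional subspace with orthonormal basis rows.
basis = np.linalg.qr(rng.standard_normal((d, 3)))[0].T
latent = rng.standard_normal((m, 3)) * np.array([3.0, 2.0, 1.0])
Z = latent @ basis + 0.01 * rng.standard_normal((m, d))

errors = pca_errors(Z)
n_chosen = choose_n(errors, m)
print(np.round(errors, 4), n_chosen)
```

Plotting `errors` against $n$ gives the kind of curve in which an elbow can be read off by eye; the threshold rule is just a simple automatic stand-in for that visual inspection.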

## Extensions of PCA

Several extensions of the basic PCA method have been proposed:

Kernel PCA [3, Chap. 14.5.4]: The PCA method is most effective if the raw feature vectors of the data points lie close to a low-dimensional linear subspace of $\mathbb{R}^d$. Kernel PCA extends PCA to data points that are located near a low-dimensional manifold, which might be highly non-linear. This is achieved by applying PCA to transformed feature vectors instead of the original feature vectors: kernel PCA first applies a (typically non-linear) feature map to the original feature vectors $\mathbf{x}^{(i)}$, resulting in new feature vectors $\mathbf{z}^{(i)}$ (see Sect. 3.9), and then applies PCA to the transformed feature vectors $\mathbf{z}^{(i)}$, for $i=1, \ldots, m$.
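The two-step idea (feature map first, then PCA) can be illustrated with a toy example. This sketch makes several simplifying assumptions: it uses an explicit, hand-picked feature map `feature_map` rather than a kernel function, the data lies exactly on the parabola $x_2 = x_1^2$, and PCA is computed without centering. In the original coordinates no single direction captures the curve, whereas the transformed points lie on a line through the origin, so a single PC reconstructs them almost perfectly.

```python
import numpy as np

def pca_reconstruction_error(V, n):
    """Average squared reconstruction error of uncentered PCA with n PCs."""
    m = V.shape[0]
    _, eigvecs = np.linalg.eigh(V.T @ V / m)
    W = eigvecs[:, -n:].T              # eigh sorts ascending: take the last n
    V_hat = (V @ W.T) @ W              # compress, then reconstruct
    return float(np.mean(np.sum((V - V_hat) ** 2, axis=1)))

def feature_map(X):
    """Hypothetical non-linear map (x1, x2) -> (x1^2, x2), chosen for this data."""
    return np.column_stack([X[:, 0] ** 2, X[:, 1]])

# Points on the parabola x2 = x1^2: a one-dimensional, non-linear manifold.
x1 = np.linspace(-2.0, 2.0, 41)
X = np.column_stack([x1, x1 ** 2])

Z = feature_map(X)                     # transformed points lie on the line z2 = z1

err_original = pca_reconstruction_error(X, n=1)
err_transformed = pca_reconstruction_error(Z, n=1)
print(err_original, err_transformed)
```

In practice the feature map is usually defined implicitly via a kernel, and the computation is carried out on the $m \times m$ kernel matrix instead of on explicit transformed vectors.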

Robust PCA [4]: In its basic form, PCA is sensitive to outliers, that is, a small number of data points with fundamentally different statistical properties than the bulk of the data points. This sensitivity can be attributed to the squared Euclidean norm (9.3), which PCA uses to measure the reconstruction error (9.1). We have seen in Chap. 3 that linear regression (see Sects. 3.1 and 3.3) can be made robust against outliers by replacing the squared error loss with another loss function. In a similar spirit, robust PCA replaces the squared Euclidean norm with another norm that is less sensitive to very large reconstruction errors (9.1) for a small number of data points (the outliers).
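The effect of the loss function on outlier sensitivity can be seen in a two-dimensional toy example. The sketch below is not the algorithm of [4]; it is a brute-force illustration that searches over a grid of candidate directions and compares two ways of aggregating the per-point reconstruction errors: the squared Euclidean norm (as in basic PCA) and the unsquared Euclidean norm, used here as one example of a loss that grows more slowly for large errors. The data set and the helper names are assumptions made for illustration.

```python
import numpy as np

def residual_norms(Z, theta):
    """Euclidean norms of the residuals after projecting each row of Z onto
    the line spanned by w = (cos theta, sin theta)."""
    w = np.array([np.cos(theta), np.sin(theta)])
    residuals = Z - np.outer(Z @ w, w)
    return np.linalg.norm(residuals, axis=1)

def best_direction(Z, loss, thetas):
    """Direction on the grid that minimizes the summed loss of the residual norms."""
    totals = [loss(residual_norms(Z, t)).sum() for t in thetas]
    t_best = thetas[int(np.argmin(totals))]
    return np.array([np.cos(t_best), np.sin(t_best)])

# 50 inliers along the first axis, plus 2 far-away outliers on the second axis.
inliers = np.column_stack([np.linspace(-1.0, 1.0, 50), np.zeros(50)])
outliers = np.array([[0.0, 10.0], [0.0, -10.0]])
Z = np.vstack([inliers, outliers])

thetas = np.linspace(0.0, np.pi, 180, endpoint=False)   # 1-degree grid

w_squared = best_direction(Z, lambda r: r ** 2, thetas)  # basic PCA criterion
w_robust = best_direction(Z, lambda r: r, thetas)        # slower-growing loss
print(w_squared, w_robust)
```

With the squared norm, the two outliers dominate and the selected direction points at them; with the unsquared norm, the selected direction follows the bulk of the data. A grid search only works in very low dimensions, so practical robust PCA methods rely on dedicated optimization algorithms.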
