
If you had to put a number (say, between 0 and 1) on the strength of the linear association between house prices and sizes in Figure 4.1, what would it be? Your measure shouldn’t depend on the choice of units for the variables. Zillow could have reported the house sizes in square meters and the price in thousands of dollars, but regardless of the units, the scatterplot would look the same. When we change units, the direction, form, and strength won’t change, so neither should our measure of the association’s (linear) strength.

We saw a way to remove the units in the previous chapter. We can standardize each of the variables, finding $z_{x}=\frac{x-\bar{x}}{s_{x}}$ and $z_{y}=\frac{y-\bar{y}}{s_{y}}$. With these, we can compute a measure of strength that you’ve probably heard of, the correlation coefficient:
$$r=\frac{\sum z_{x} z_{y}}{n-1}$$
Keep in mind that the $x$’s and $y$’s are paired. For each house we have a price and a living area. To find the correlation, we multiply each standardized value by the standardized value it is paired with and add up those cross products. We divide the total by the number of pairs minus one, $n-1$.
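The calculation just described can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own code: the function name `correlation` and the five house sizes and prices are made up for the example, and only the standard library is assumed.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sum the products of paired z-scores, then divide by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("x and y must be paired, so they need equal lengths")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)


# Hypothetical data: living area (square feet) and price (dollars).
sizes_sqft = [1100, 1500, 1800, 2400, 3000]
prices_usd = [150000, 190000, 260000, 310000, 420000]

r = correlation(sizes_sqft, prices_usd)
print(round(r, 3))
```

Because each $z$-score is unitless, the result carries no units either.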

There are alternative formulas for the correlation in terms of the variables $x$ and $y$. Here are two of the more common:
$$r=\frac{\sum(x-\bar{x})(y-\bar{y})}{\sqrt{\sum(x-\bar{x})^{2} \sum(y-\bar{y})^{2}}}=\frac{\sum(x-\bar{x})(y-\bar{y})}{(n-1) s_{x} s_{y}}$$
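As a quick check, the two forms above can be computed side by side. The sketch below also converts the units to confirm that the value does not change. The function names `r_from_sums` and `r_from_sds` and the data are hypothetical, chosen only for illustration.

```python
from math import isclose, sqrt
from statistics import mean, stdev


def r_from_sums(xs, ys):
    """First form: cross products over the root of the two sums of squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_from_sds(xs, ys):
    """Second form: cross products over (n - 1) * s_x * s_y."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / ((len(xs) - 1) * stdev(xs) * stdev(ys))


# Hypothetical data: living area (square feet) and price (dollars).
sizes_sqft = [1100, 1500, 1800, 2400, 3000]
prices_usd = [150000, 190000, 260000, 310000, 420000]

# The same houses in square meters and thousands of dollars.
sizes_sqm = [s * 0.092903 for s in sizes_sqft]
prices_k = [p / 1000 for p in prices_usd]

r_original = r_from_sums(sizes_sqft, prices_usd)
r_rescaled = r_from_sums(sizes_sqm, prices_k)

print(isclose(r_original, r_from_sds(sizes_sqft, prices_usd), rel_tol=1e-9))
print(isclose(r_original, r_rescaled, rel_tol=1e-9))
```

Both comparisons should agree up to floating-point rounding, which is why the sketch uses `isclose` rather than `==`.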

Correlation measures the strength of the linear association between two quantitative variables. Before you use correlation, you must check three conditions:

• Quantitative Variables Condition: Correlation applies only to quantitative variables. Don’t apply correlation to categorical data masquerading as quantitative. Check that you know the variables’ units and what they measure.
• Linearity Condition: Sure, you can calculate a correlation coefficient for any pair of variables. But correlation measures the strength only of the linear association and will be misleading if the relationship is not straight enough. What is “straight enough”? This question may sound too informal for a statistical condition, but that’s really the point. We can’t verify whether a relationship is linear or not. Very few relationships between variables are perfectly linear, even in theory, and scatterplots of real data are never perfectly straight. How nonlinear looking would the scatterplot have to be to fail the condition? This is a judgment call that you just have to think about. Do you think that the underlying relationship is curved? If so, then summarizing its strength with a correlation would be misleading.
• Outlier Condition: Unusual observations can distort the correlation and can make an otherwise small correlation look big or, on the other hand, hide a large correlation. It can even give an otherwise positive association a negative correlation coefficient (and vice versa). When you see an outlier, it’s often a good idea to report the correlation both with and without the point.
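Two small made-up data sets illustrate why the Linearity and Outlier Conditions matter. In the first, $y$ is determined exactly by $x$ but the relationship is curved, so the correlation says nothing useful about it. In the second, a single unusual point turns a near-zero correlation into a strong one. The helper `correlation` and all values are assumptions for this sketch.

```python
from math import sqrt
from statistics import mean


def correlation(xs, ys):
    """Correlation from deviations about the means."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Linearity: a perfect but curved relationship (y = x squared).
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)

# Outlier: six points with almost no linear association...
base_x = [1, 2, 3, 4, 5, 6]
base_y = [3, 1, 4, 2, 3, 2]
r_without = correlation(base_x, base_y)

# ...and the same points plus one unusual observation at (20, 20).
r_with = correlation(base_x + [20], base_y + [20])

print(f"curved relationship: r = {r_curve:.3f}")
print(f"without the outlier: r = {r_without:.3f}")
print(f"with the outlier:    r = {r_with:.3f}")
```

Reporting `r_without` and `r_with` together follows the advice above to give the correlation both with and without the unusual point.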

Each of these conditions is easy to check with a scatterplot. Many correlations are reported without supporting data or plots. You should still think about the conditions. You should be cautious in interpreting (or accepting others’ interpretations of) the correlation when you can’t check the conditions for yourself.

Throughout this course, you’ll see that doing statistics right means selecting the proper methods. That means you have to think about the situation at hand. An important first step is to check that the type of analysis you plan is appropriate. These conditions are just the first of many such checks.

