## Main Effects of Categorical Variables

Categorical variables, also known as nominal variables, have values that you can put into a countable number of distinct groups based on a characteristic. For categorical variables, you have the variable name and the levels of that variable. The following table shows examples of several categorical variables and their levels.

With continuous variables, you can plot them on a scatterplot and see how one variable changes as you increase the value of the other variable. However, with categorical variables, you’re dealing with groups in your data that you cannot incrementally increase. Consequently, you interpret categorical variables differently in regression analysis. The levels of categorical variables represent groups in your data, and you can plot them using a boxplot, as shown below. Regression analysis estimates the mean differences between these groups and determines whether they are statistically significant.

These effects are main effects, which means that the effect sizes do not change based on the values of the other variables in the model. Including categorical variables in a regression model allows you to determine whether the differences in this type of graph are statistically significant while controlling for other variables in the model. Later in this section, we’ll analyze the data that this boxplot represents to determine whether the differences between the mean incomes of these groups are statistically significant.
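To make the mean-difference interpretation concrete, here is a minimal sketch using made-up income figures for two hypothetical groups, `A` and `B` (these are not the data behind the boxplot). It fits a least-squares line to a single 0/1 group indicator. Under these assumptions, the intercept comes out as the mean of the reference group and the slope as the difference between the two group means.

```python
from statistics import mean

# Hypothetical incomes (in thousands) for two made-up groups; not data from the text.
incomes = {"A": [40.0, 50.0, 60.0], "B": [55.0, 65.0, 75.0]}

# Response y and a 0/1 indicator x (1 = group B, 0 = reference group A).
y = incomes["A"] + incomes["B"]
x = [0.0] * len(incomes["A"]) + [1.0] * len(incomes["B"])

# Simple least-squares fit of y = intercept + slope * x.
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

mean_a = mean(incomes["A"])
mean_b = mean(incomes["B"])

print("intercept:", intercept, "| mean of A:", mean_a)
print("slope:", slope, "| mean of B minus mean of A:", mean_b - mean_a)
```

This sketch only reproduces the estimated difference. Judging whether that difference is statistically significant also requires a standard error and a test, which statistical software reports alongside the coefficient.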

## Coding Categorical Variables

Statistical software can’t take a categorical variable and directly analyze it. Instead, it converts categorical variables into indicator variables using a $(0,1)$ coding scheme. Indicator variables, also known as dummy variables, are columns of 1s and 0s that indicate the presence or absence of a characteristic. A 1 indicates the presence of a feature while a 0 represents its absence. The number of indicator variables depends on the number of categorical levels. To show you how this works, I’ll start with gender.

In the table, the Gender column represents the categorical data that you enter into the worksheet. The value depends on the gender of the subject to which the row corresponds. The Male and Female columns are the indicator variables based on the Gender column. The Male column contains 1s for observations that correspond to males and 0s for non-males. The opposite pattern applies to the Female column.
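The coding described above can be sketched in a few lines of plain Python. The `gender` values are hypothetical and only illustrate the pattern. Statistical packages perform this conversion internally, and their exact procedure may differ.

```python
# Hypothetical Gender column; the values are illustrative, not from the text's table.
gender = ["Male", "Female", "Female", "Male"]

# One 0/1 indicator column per level: 1 marks presence, 0 marks absence.
levels = sorted(set(gender))
indicators = {
    level: [1 if g == level else 0 for g in gender]
    for level in levels
}

for level, column in indicators.items():
    print(level, column)
```

Each row has a 1 in exactly one of the indicator columns, which is why the Male and Female columns mirror each other.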

Notice how these two columns supply completely redundant information? One column predicts the other column perfectly. Statisticians refer to this as perfect multicollinearity, which creates an error if you include both in a regression model. For a categorical variable, you must omit one of the underlying indicator variables from the model, which becomes the reference level.
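One way to see the redundancy numerically is to check the rank of the design matrix. The sketch below uses NumPy with a hypothetical five-row sample and assumes the model includes a constant (intercept) column. With both indicators, the matrix has three columns but only rank 2. Dropping the Female indicator, which makes Female the reference level, leaves a full-rank matrix.

```python
import numpy as np

# Hypothetical sample: 1 = male, 0 = female (illustrative values only).
male = np.array([1, 0, 0, 1, 1], dtype=float)
female = 1.0 - male                 # exact mirror image of the Male column
intercept = np.ones_like(male)      # assumed constant term in the model

# Design matrix with both indicators: intercept = male + female for every row.
X_both = np.column_stack([intercept, male, female])

# Design matrix with Female omitted, so Female becomes the reference level.
X_ref = np.column_stack([intercept, male])

rank_both = np.linalg.matrix_rank(X_both)
rank_ref = np.linalg.matrix_rank(X_ref)

print("both indicators:", X_both.shape[1], "columns, rank", rank_both)
print("one omitted:    ", X_ref.shape[1], "columns, rank", rank_ref)
```

When the rank is lower than the number of columns, the least-squares coefficients are not uniquely determined, so software either reports an error or drops a column automatically.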

