
Description of regression analysis in Excel. Mathematical Methods in Psychology

Regression in Excel

Statistical data processing can also be carried out with the Analysis ToolPak add-in, which is reached from the Tools menu. In Excel 2003, if the Data Analysis command is missing from the Tools menu, open Tools, click Add-Ins, and tick the check box next to Analysis ToolPak (Fig. 17).

Fig. 17. The Add-Ins window

After that, the Data Analysis command appears in the Tools menu.

In Excel 2007, to install the Analysis ToolPak, click the Office button in the upper left corner of the window (Fig. 18a), then click Excel Options. In the Excel Options window, click Add-Ins on the left and select Analysis ToolPak in the list on the right.

Fig. 18. Installing the Analysis ToolPak in Excel 2007 (Office button and Excel Options)

Then click the Go button at the bottom of the open window. In the window that appears (Fig. 12), check the box next to Analysis ToolPak and click OK. The Data Analysis button will appear on the Data tab (Fig. 19).

Click Data Analysis, select Regression from the list of tools, and click OK.

The window shown in Fig. 21 appears.

The Regression analysis tool uses the least squares method to fit a line through a set of observations. Regression is used to analyze how a single dependent variable is affected by the values of one or more independent variables. For example, an athlete's performance is influenced by several factors, including age, height, and weight. You can estimate the contribution of each of these three factors to performance from a set of data, and then use the results to predict the performance of another athlete.

The Regression tool uses the LINEST function.

Regression dialog box

Labels. Select this check box if the first row or first column of the input range contains labels. Clear it if there are no labels; suitable labels for the output table will then be generated automatically.

Confidence Level. Select this check box to include an additional confidence level in the summary output table. In the box, enter the confidence level to apply in addition to the default 95% level.

Constant is Zero. Select this check box to force the regression line to pass through the origin.

Output Range. Enter a reference to the upper-left cell of the output range. Allow at least seven columns for the summary output table, which includes the analysis of variance results, coefficients, standard error of the y estimate, standard deviations, number of observations, and standard errors of the coefficients.

New Worksheet. Select this option to insert a new worksheet in the workbook and paste the analysis results starting at cell A1. If necessary, type a name for the new sheet in the box next to the option.

New Workbook. Select this option to create a new workbook and paste the results on a new sheet in it.

Residuals. Select this check box to include residuals in the output table.

Standardized Residuals. Select this check box to include standardized residuals in the output table.

Residual Plots. Select this check box to plot the residuals against each independent variable.

Line Fit Plots. Select this check box to plot the predicted values against the observed values.

Normal Probability Plots. Select this check box to generate a normal probability plot.

The LINEST function

To enter a function, select the cell that should display the result and press the = key. Then choose the required function, for example AVERAGE, in the Name box (Fig. 22).

Fig. 22. Finding functions in Excel 2003

If the function name does not appear in the Name box, click the triangle next to the box to open a list of functions. If the function is not in the list, click More Functions. The Function Wizard dialog box opens, where you can scroll to the required function, select it, and click OK (Fig. 23).

Fig. 23. Function Wizard

In Excel 2007, a function can be found from any tab: select the cell that should display the result and press the = key, then choose the function, for example AVERAGE, in the Name box. The window for calculating the function is similar to the one in Excel 2003.

Alternatively, open the Formulas tab and click Insert Function (Fig. 24). The Function Wizard appears, and it looks like the one in Excel 2003. On the same tab you can also go straight to a function category (recently used, financial, logical, text, date and time, math, more functions) and look for the required function there.

Fig. 24. Function selection in Excel 2007

The LINEST function calculates the statistics for a line by using the least squares method to find the straight line that best fits the data, and then returns an array that describes the line. You can also combine LINEST with other functions to calculate the statistics for other types of models that are linear in the unknown parameters, including polynomial, logarithmic, exponential, and power series. Because the function returns an array of values, it must be entered as an array formula.

The equation for a straight line is:

y = mx + b

or

y = m1x1 + m2x2 + ... + b (in case of multiple ranges of x values),

where the dependent value y is a function of the independent values x, the m values are the coefficients corresponding to each independent variable x, and b is a constant. Note that y, x, and m can be vectors. LINEST returns the array {mn;mn-1;...;m1;b}. LINEST can also return additional regression statistics.

LINEST(known_y's; known_x's; const; statistics)

Known_y's is the set of y values that are already known for the relationship y = mx + b.

If the known_y's array has one column, then each column of the known_x's array is interpreted as a separate variable.

If the known_y's array has one row, then each row of the known_x's array is interpreted as a separate variable.

Known_x's is an optional set of x values that are already known for the relationship y = mx + b.

The known_x's array can contain one or more sets of variables. If only one variable is used, known_y's and known_x's can be ranges of any shape, as long as they have the same dimensions. If more than one variable is used, known_y's must be a vector (that is, a range one row high or one column wide).

If known_x's is omitted, it is assumed to be the array {1;2;3;...} of the same size as known_y's.

Const is a logical value that specifies whether the constant b is forced to equal 0.

If const is TRUE or omitted, b is calculated normally.

If const is FALSE, b is set to 0 and the m values are adjusted to fit y = mx.

Statistics is a logical value that specifies whether additional regression statistics are returned.

If statistics is TRUE, LINEST returns the additional regression statistics, so the returned array is {mn;mn-1;...;m1;b:sen;sen-1;...;se1;seb:r2;sey:F;df:ssreg;ssresid}.

If statistics is FALSE or omitted, LINEST returns only the m coefficients and the constant b.

Additional regression statistics

se1, se2, ..., sen - the standard error values for the coefficients m1, m2, ..., mn.

seb - the standard error value for the constant b (seb = #N/A when const is FALSE).

r2 - the coefficient of determination. It compares the actual y values with the values estimated from the line equation and ranges from 0 to 1. If it is 1, there is a perfect correlation with the model, that is, there is no difference between the actual and estimated y values. At the other extreme, if the coefficient of determination is 0, the regression equation is of no help in predicting y values. For more information on how r2 is calculated, see "Remarks" at the end of this section.

sey - the standard error for the y estimate.

F - the F statistic, or the F-observed value. Use the F statistic to determine whether the observed relationship between the dependent and independent variables occurs by chance.

df - the degrees of freedom. Use the degrees of freedom to find F-critical values in a statistical table. To determine a confidence level for the model, compare the values in the table with the F statistic returned by LINEST. See "Remarks" at the end of this section for more information about calculating df. Example 4 below shows the use of F and df.

ssreg - the regression sum of squares.

ssresid - the residual sum of squares. For more information about calculating ssreg and ssresid, see "Remarks" at the end of this section.

The figure below shows the order in which additional regression statistics are returned.

Remarks:

Any straight line can be described by its slope and its y-intercept:

Slope (m): To find the slope of a line, usually written as m, take two points on the line, (x1, y1) and (x2, y2); the slope equals (y2 - y1)/(x2 - x1).

Y-intercept (b): The y-intercept of a line, usually written as b, is the value of y at the point where the line crosses the y-axis.

The equation of a straight line is y = mx + b. Once the values of m and b are known, any point on the line can be calculated by substituting the y or x value into the equation. You can also use the TREND function.

If there is only one independent variable x, you can get the slope and y-intercept directly using the following formulas:

Slope: INDEX(LINEST(known_y's, known_x's), 1)

Y-intercept: INDEX(LINEST(known_y's, known_x's), 2)

The accuracy of the line calculated by the LINEST function depends on the degree of scatter in the data. The closer the data is to a straight line, the more accurate the LINEST model. LINEST uses the least squares method to determine the best fit to the data. When there is only one independent variable x, m and b are calculated using the following formulas:

m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²

b = ȳ - m·x̄

where x̄ and ȳ are the sample means, that is, x̄ = AVERAGE(known_x's) and ȳ = AVERAGE(known_y's).
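As a rough illustration of the single-variable least-squares calculation described above, here is a minimal Python sketch. The function name `fit_line` and the sample data are assumptions made for this example; this is not how Excel implements LINEST internally.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n  # same role as AVERAGE(known_x's)
    y_mean = sum(ys) / n  # same role as AVERAGE(known_y's)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


# Hypothetical data that lies exactly on the line y = 2x + 1.
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)
```

Because the sample points lie exactly on a line, the fitted slope and intercept reproduce that line; with scattered data the same function returns the best-fitting line in the least-squares sense.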

The line- and curve-fitting functions LINEST and LOGEST can calculate the straight line or exponential curve that best fits the data. However, they do not tell you which of the two results suits your problem better. You can also calculate TREND(known_y's; known_x's) for a straight line, or GROWTH(known_y's; known_x's) for an exponential curve. When the new_x's argument is omitted, these functions return an array of the y values predicted along that line or curve at the actual x values. You can then compare the predicted values with the actual values, or chart both for a visual comparison.

When performing a regression analysis, Microsoft Excel calculates, for each point, the squared difference between the predicted y value and the actual y value. The sum of these squared differences is called the residual sum of squares (ssresid). Excel then calculates the total sum of squares (sstotal). If const = TRUE or is omitted, the total sum of squares is the sum of the squared differences between the actual y values and the mean of the y values. If const = FALSE, the total sum of squares is the sum of the squares of the actual y values (without subtracting the mean y value from each individual y value). The regression sum of squares can then be found as ssreg = sstotal - ssresid. The smaller the residual sum of squares is compared with the total sum of squares, the larger the coefficient of determination r2, which indicates how well the equation obtained from the regression analysis explains the relationship among the variables. The coefficient r2 equals ssreg/sstotal.
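The sums of squares described above can be sketched in a few lines of Python. The function name `sums_of_squares` and the data are illustrative assumptions; the predicted values are those of the least-squares line through the hypothetical points (1, 1), (2, 2), (3, 4).

```python
def sums_of_squares(ys, predicted, const=True):
    """Return (ssreg, ssresid, sstotal, r2) following the rules described above."""
    ssresid = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    if const:
        # With an intercept, deviations are measured from the mean of y.
        y_mean = sum(ys) / len(ys)
        sstotal = sum((y - y_mean) ** 2 for y in ys)
    else:
        # Without an intercept, the raw y values are squared.
        sstotal = sum(y ** 2 for y in ys)
    ssreg = sstotal - ssresid
    return ssreg, ssresid, sstotal, ssreg / sstotal


# Hypothetical observations and the values predicted by y = 1.5x - 2/3.
ys = [1.0, 2.0, 4.0]
predicted = [5 / 6, 7 / 3, 23 / 6]
ssreg, ssresid, sstotal, r2 = sums_of_squares(ys, predicted)
print(ssreg, ssresid, sstotal, r2)
```

Passing `const=False` only makes sense when the predicted values themselves come from a model fitted without an intercept.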

In some cases, one or more of the X columns (assuming that the Y and X values are in columns) may have no additional predictive value in the presence of the other X columns. In other words, removing one or more X columns might give predicted Y values that are equally accurate. In that case, the redundant X columns are excluded from the regression model. This phenomenon is called "collinearity" because any redundant X column can be expressed as a sum of multiples of the non-redundant X columns. LINEST checks for collinearity and removes any redundant X columns from the regression model when it finds them. Removed X columns can be recognized in the LINEST output by a coefficient of 0 and an se value of 0. Removing one or more columns as redundant changes the value of df, because df depends on the number of X columns actually used for prediction. See Example 4 below for more details on calculating df. When df changes because redundant columns are removed, the values of sey and F also change. Collinearity should be relatively rare in practice. One case where it is more likely to arise is when some X columns contain only 0 and 1 values as indicators of whether a subject of the experiment belongs to a particular group. If const = TRUE or is omitted, LINEST effectively inserts an additional X column of all 1 values to model the intercept. If there is a column with 1 for males and 0 for females, and another column with 1 for females and 0 for males, then the latter column is removed as redundant, because its values can be obtained by subtracting the "male indicator" column from the additional column of all 1 values.

When no X columns are removed from the model due to collinearity, df is calculated as follows: if there are k columns of known_x's and const = TRUE or is omitted, then df = n - k - 1. If const = FALSE, then df = n - k. In both cases, each X column removed due to collinearity increases the value of df by 1.
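The df rule above is simple enough to express as a small helper. This is only a sketch: the function name `degrees_of_freedom` and the `removed` parameter (the number of X columns dropped for collinearity) are names introduced for this illustration.

```python
def degrees_of_freedom(n, k, const=True, removed=0):
    """Residual degrees of freedom for n observations and k X columns.

    removed is the number of X columns dropped because of collinearity;
    each dropped column adds 1 to df.
    """
    base = n - k - 1 if const else n - k
    return base + removed


# Nine observations and one X column, with and without an intercept.
print(degrees_of_freedom(9, 1))
print(degrees_of_freedom(9, 1, const=False))
```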

Formulas that return arrays must be entered as array formulas.

When entering an array constant as an argument, for example as known_x's, use a semicolon to separate values in the same row and a colon to separate rows. Separator characters may differ depending on the regional and language settings in Control Panel.

Note that the y values ​​predicted by the regression equation may not be correct if they are outside the range of y values ​​that were used to define the equation.

The underlying algorithm of the LINEST function differs from the algorithm used by the SLOPE and INTERCEPT functions. The difference can lead to different results when the data is undetermined or collinear. For example, if the data points of the known_y's argument are 0 and the data points of the known_x's argument are 1, then:

LINEST returns a value of 0. The LINEST algorithm is designed to return reasonable results for collinear data, and in this case at least one answer can be found.

SLOPE and INTERCEPT return the #DIV/0! error. The algorithm of these functions is designed to look for one and only one answer, and in this case there can be more than one.

In addition to using LOGEST to calculate statistics for other types of regression, you can use LINEST to calculate a range of other regression types by entering functions of the x and y variables as the x and y series for LINEST. For example, the following formula:

LINEST(y-values, x-values^COLUMN($A:$C))

works with one column of Y values and one column of X values to calculate the cubic approximation (polynomial of order 3) of the following form:

y = m1x + m2x^2 + m3x^3 + b

The formula can be modified to calculate other types of regression, but in some cases the output values and other statistics need to be adjusted.
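To make the x-values^COLUMN($A:$C) trick more concrete: COLUMN($A:$C) evaluates to the column numbers 1, 2, 3, so raising the x column to those powers produces three predictor columns. The Python sketch below only reproduces that expansion; the function name `power_columns` and the sample values are assumptions for illustration.

```python
def power_columns(xs, powers=(1, 2, 3)):
    """Return one row per x value holding x**p for each power p."""
    return [[x ** p for p in powers] for x in xs]


# Each row contains x, x squared, and x cubed for one observation.
rows = power_columns([2, 3])
print(rows)
```

Feeding these three columns to a multiple linear regression as separate X variables is what turns a straight-line fit into a cubic fit.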

Regression analysis shows how certain independent variables influence a dependent variable. For example, how the size of the economically active population depends on the number of enterprises, wages, and other parameters. Or how foreign investment, energy prices, and so on affect the level of GDP.

The result of the analysis lets you set priorities and, based on the main factors, forecast, plan the development of priority areas, and make management decisions.

Types of regression:

linear (y = a + bx);

parabolic (y = a + bx + cx^2);

exponential (y = a * exp(bx));

power (y = a * x^b);

hyperbolic (y = b/x + a);

logarithmic (y = b * ln(x) + a);

exponential with an arbitrary base (y = a * b^x).
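Several of the model types listed above can be reduced to linear regression by transforming the variables. The sketch below shows this for the exponential and power forms by taking logarithms. The function names and the data are assumptions for illustration, and all values passed to a logarithm must be positive.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) through ln(y) = ln(a) + b*x."""
    slope, intercept = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope


def fit_power(xs, ys):
    """Fit y = a * x**b through ln(y) = ln(a) + b*ln(x)."""
    slope, intercept = fit_line([math.log(x) for x in xs],
                                [math.log(y) for y in ys])
    return math.exp(intercept), slope


# Hypothetical data generated exactly from y = 2*exp(0.5x) and y = 3*x^2.
a_exp, b_exp = fit_exponential([0, 1, 2, 3],
                               [2 * math.exp(0.5 * x) for x in [0, 1, 2, 3]])
a_pow, b_pow = fit_power([1, 2, 4], [3, 12, 48])
print(a_exp, b_exp, a_pow, b_pow)
```

Note that a fit on the logarithmic scale minimizes squared errors in ln(y) rather than in y, so with noisy data it can differ from a direct nonlinear fit.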

Consider an example of building a regression model in Excel and interpreting the results. We take the linear type of regression.

Task. At 6 enterprises, the average monthly salary and the number of employees who quit were analyzed. Determine how the number of employees who quit depends on the average salary.

The linear regression model has the following form:

Y = a0 + a1x1 + ... + akxk,

where a are the regression coefficients, x are the influencing variables, and k is the number of factors.

In our example, Y is the number of employees who quit, and the influencing factor is the salary (x).

Excel has built-in functions that can calculate the parameters of a linear regression model, but the Analysis ToolPak add-in does it faster.

To activate this analysis tool:

1. Click the Office button and open Excel Options, then go to the Add-Ins section.

2. At the bottom, the Manage drop-down list should read "Excel Add-ins" (if it does not, open the list and select it). Click the Go button next to it.

3. A list of available add-ins opens. Select Analysis ToolPak and click OK.

Once activated, the add-in is available on the Data tab.

Now let us turn to the regression analysis itself.

1. Open the Data Analysis tool and select Regression.

2. A dialog box opens for choosing the input values and output options (where to display the result). In the input fields, specify the range of the dependent variable (Y) and of the factor influencing it (X). The remaining fields are optional.

3. After you click OK, the program displays the calculations on a new sheet (you can instead choose a range on the current sheet or send the output to a new workbook).

First of all, look at R-square and the coefficients.

R-square is the coefficient of determination. In our example it is 0.755, or 75.5%. This means that the fitted model explains 75.5% of the variation in the dependent variable. The higher the coefficient of determination, the better the model: above 0.8 is good, below 0.5 is poor (such an analysis can hardly be considered reasonable). In our example the result is "not bad".

The coefficient 64.1428 (the intercept) shows what Y would be if all the variables in the model were equal to 0. In other words, the value of the analyzed parameter is also affected by factors that are not described in the model.

The coefficient -0.16285 shows the weight of the variable X on Y: within this model, each additional unit of average monthly salary changes the expected number of employees who quit by -0.16285 (a small degree of influence). The minus sign indicates a negative effect: the higher the salary, the fewer people quit, which is reasonable.
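To see how the two coefficients quoted above are used, the following Python sketch plugs them into the fitted line Y = 64.1428 - 0.16285x. The function name `predicted_quits` and the salary value of 300 are hypothetical and chosen only for illustration; the original data table is not reproduced here.

```python
INTERCEPT = 64.1428  # coefficient reported for the intercept
SLOPE = -0.16285     # coefficient reported for the salary variable X


def predicted_quits(salary):
    """Predicted number of employees who quit at a given average salary."""
    return INTERCEPT + SLOPE * salary


# Hypothetical salary value, used only to show the arithmetic.
estimate = predicted_quits(300)
# A salary increase of 100 units changes the prediction by 100 * SLOPE.
change = predicted_quits(400) - predicted_quits(300)
print(round(estimate, 4), round(change, 3))
```

Predictions of this kind are only as reliable as the model; values far outside the salary range of the original six enterprises should be treated with caution.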

Building a linear regression and estimating its parameters and their significance is much faster with the Regression tool of the Excel Analysis ToolPak. Let us consider how to interpret the results in the general case (k explanatory variables), using Example 3.6.

The Regression Statistics table gives the following values:

Multiple R - the coefficient of multiple correlation;

R Square - the coefficient of determination R²;

Adjusted R Square - R² adjusted for the number of degrees of freedom;

Standard Error - the standard error of the regression S;

Observations - the number of observations n.

The ANOVA (analysis of variance) table gives:

1. Column df - the number of degrees of freedom, equal to

for the Regression row, df = k;

for the Residual row, df = n – k – 1;

for the Total row, df = n – 1.

2. Column SS - the sum of squared deviations, equal to

for the Regression row, Σ(ŷi – ȳ)²;

for the Residual row, Σ(yi – ŷi)²;

for the Total row, Σ(yi – ȳ)².

3. Column MS - the variances, calculated by the formula MS = SS/df:

for the Regression row, the factor (explained) variance;

for the Residual row, the residual variance.

4. Column F - the calculated value of the F-test statistic:

F = MS(regression) / MS(residual).

5. Column Significance F - the significance level corresponding to the calculated F-statistic:

Significance F = FDIST(F-statistic, df(regression), df(residual)).

If Significance F is less than the standard significance level, then R² is statistically significant.
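The df, SS, MS, and F columns described above follow from two sums of squares and the sample size. The sketch below assembles them in Python. The function name `anova_table` and the input numbers are hypothetical, and Significance F is left out because it needs the F distribution, which the Python standard library does not provide.

```python
def anova_table(ss_regression, ss_residual, n, k):
    """Build the df, SS, MS and F entries for n observations and k regressors."""
    df_regression = k
    df_residual = n - k - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "df": (df_regression, df_residual, n - 1),
        "SS": (ss_regression, ss_residual, ss_regression + ss_residual),
        "MS": (ms_regression, ms_residual),
        "F": ms_regression / ms_residual,
    }


# Hypothetical sums of squares for 12 observations and one regressor.
table = anova_table(ss_regression=90.0, ss_residual=10.0, n=12, k=1)
print(table)
```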

               Coefficients   Standard error   t-statistic   p-value   Lower 95%   Upper 95%
Y-intercept    65.92          11.74            5.61          0.00080   38.16       93.68
X              0.107          0.014            7.32          0.00016   0.0728      0.142

This table shows:

1. Coefficients - the values of the coefficients a and b.

2. Standard error - the standard errors of the regression coefficients, Sa and Sb.

3. t-statistic - the calculated values of the t-test statistic:

t-statistic = Coefficient / Standard error.

4. p-value (significance of t) - the significance level corresponding to the calculated t-statistic:

p-value = TDIST(|t-statistic|, df(residual), 2).

If the p-value is less than the standard significance level, the corresponding coefficient is statistically significant.

5. Lower 95% and Upper 95% - the lower and upper bounds of the 95% confidence intervals for the coefficients of the theoretical linear regression equation.
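The t-statistic and the 95% bounds for the intercept in the table above can be checked by hand. The Python sketch below assumes 7 residual degrees of freedom (nine observations, one regressor) and uses the tabulated two-sided 95% critical value of Student's t for 7 degrees of freedom, about 2.3646. The function names are introduced for this illustration.

```python
T_CRIT_95_DF7 = 2.3646  # tabulated two-sided 95% Student's t value, 7 df (assumed)


def t_statistic(coefficient, std_error):
    """Ratio of a coefficient to its standard error."""
    return coefficient / std_error


def confidence_interval(coefficient, std_error, t_critical):
    """Lower and upper bounds: coefficient -/+ t_critical * std_error."""
    margin = t_critical * std_error
    return coefficient - margin, coefficient + margin


# Intercept row of the coefficients table: 65.92 with standard error 11.74.
t_a = t_statistic(65.92, 11.74)
low_a, high_a = confidence_interval(65.92, 11.74, T_CRIT_95_DF7)
print(round(t_a, 2), round(low_a, 2), round(high_a, 2))
```

Repeating the calculation for the slope gives bounds that differ slightly from the table, because the standard error printed there is rounded to three decimal places.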

RESIDUAL OUTPUT
Observation   Predicted y   Residuals e
1             72.70         -29.70
2             82.91         -20.91
3             94.53         -4.53
4             105.72        5.27
5             117.56        12.44
6             129.70        19.29
7             144.22        20.77
8             166.49        24.50
9             268.13        -27.13

The RESIDUAL OUTPUT table shows:

in the Observation column, the observation number;

in the Predicted y column, the calculated values of the dependent variable;

in the Residuals e column, the difference between the observed and calculated values of the dependent variable.
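Since each residual is the observed value minus the predicted value, the residual table above is enough to recover the observed y values and to recompute the coefficient of determination and the standard error of the regression. The Python sketch below does this with the nine rows of the table, assuming a single regressor (k = 1); the variable names are chosen for this illustration.

```python
import math

# (predicted y, residual) pairs copied from the residual table above.
rows = [
    (72.70, -29.70), (82.91, -20.91), (94.53, -4.53),
    (105.72, 5.27), (117.56, 12.44), (129.70, 19.29),
    (144.22, 20.77), (166.49, 24.50), (268.13, -27.13),
]

# residual = observed - predicted, so observed = predicted + residual.
observed = [pred + e for pred, e in rows]
n = len(rows)
k = 1  # one explanatory variable (assumption for this example)

y_mean = sum(observed) / n
ss_resid = sum(e ** 2 for _, e in rows)
ss_total = sum((y - y_mean) ** 2 for y in observed)
r2 = 1 - ss_resid / ss_total
std_error = math.sqrt(ss_resid / (n - k - 1))
print(r2, std_error)
```

The recomputed coefficient of determination agrees, up to rounding of the table entries, with the R² of 0.884 quoted for this example.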

Example 3.6. Data (in arbitrary units) are available on food expenditures y and per capita income x for nine groups of families:

x
y

Using the results of the Regression tool in the Excel Analysis ToolPak, we analyze how food expenditures depend on per capita income.

The results of the regression analysis are usually written as:

ŷ = 65.92 + 0.107x
    (11.74)  (0.014)

where the standard errors of the regression coefficients are given in parentheses.

The regression coefficients are a = 65.92 and b = 0.107. The direction of the relationship between y and x is determined by the sign of the regression coefficient b = 0.107, that is, the relationship is direct and positive. The coefficient b = 0.107 shows that when per capita income increases by 1 arb. unit, food expenditures increase by 0.107 arb. units.

Let us assess the significance of the coefficients of the fitted model. The significance of the coefficients (a, b) is checked with the t-test:

p-value (a) = 0.00080 < 0.01 < 0.05

p-value (b) = 0.00016 < 0.01 < 0.05,

hence the coefficients (a, b) are significant at the 1% level, and therefore also at the 5% level. Thus the regression coefficients are significant and the model is adequate to the original data.

The regression estimates are compatible not only with the point values of the regression coefficients obtained, but also with a range of values around them (the confidence interval). With a probability of 95%, the confidence intervals for the coefficients are (38.16, 93.68) for a and (0.0728, 0.142) for b.

The quality of the model is assessed by the coefficient of determination R².

The value R² = 0.884 means that the per capita income factor explains 88.4% of the variation (scatter) in food expenditures.

The significance of R² is checked with the F-test: Significance F = 0.00016 < 0.01 < 0.05, hence R² is significant at the 1% level, and therefore also at the 5% level.

In the case of simple (pairwise) linear regression, the correlation coefficient can be found as r = √R² = √0.884 ≈ 0.94, taking the sign of b. The obtained value of the correlation coefficient indicates that the relationship between food expenditures and per capita income is very close.

Regression analysis is one of the most widely used methods of statistical research. It can be used to determine how strongly independent variables influence a dependent variable. Microsoft Excel has tools designed for this type of analysis. Let us look at what they are and how to use them.

Enabling the Analysis ToolPak

To use the regression analysis tool, you first need to activate the Analysis ToolPak. Only then do the tools needed for this procedure appear on the Excel ribbon.

  1. Go to the "File" tab.
  2. Go to the "Options" section.
  3. The Excel Options window opens. Go to the "Add-Ins" subsection.
  4. At the very bottom of the window, set the "Manage" drop-down list to "Excel Add-ins" if it shows something else. Click the "Go" button.
  5. The Excel add-ins window opens. Check the box next to "Analysis ToolPak". Click the "OK" button.

Now, on the "Data" tab, the "Analysis" group of the ribbon contains a new button, "Data Analysis".

Types of regression analysis

There are several types of regressions:

  • parabolic;
  • power;
  • logarithmic;
  • exponential;
  • exponential with an arbitrary base;
  • hyperbolic;
  • linear.

The rest of this section looks in more detail at performing the last type, linear regression, in Excel.

Linear Regression in Excel

Below, as an example, is a table showing the average daily outdoor air temperature and the number of store customers on the corresponding working day. Let us use regression analysis to find out exactly how weather conditions, in the form of air temperature, affect the number of customers visiting a retail outlet.

The general linear regression equation is Y = a0 + a1x1 + ... + akxk. In this formula, Y is the variable whose dependence on the factors we are studying; in our case, it is the number of customers. The x values are the factors that affect this variable. The parameters a are the regression coefficients, which determine the weight of each factor. The index k is the total number of factors.


Interpreting the analysis results

The results of the regression analysis are displayed as a table in the location specified in the settings.

One of the main indicators is R-square, which indicates the quality of the model. In our case this coefficient is 0.705, or about 70.5%. This is an acceptable level of quality. A value below 0.5 is considered poor.

Another important indicator is in the cell at the intersection of the "Intercept" row and the "Coefficients" column. It shows the value of Y, in our case the number of customers, when all other factors are equal to zero. In this table the value is 58.04.

The value at the intersection of the "X Variable 1" row and the "Coefficients" column shows how strongly Y depends on X; in our case, how strongly the number of store customers depends on temperature. The coefficient of 1.31 means that each additional degree of temperature is associated with about 1.31 more customers, which is considered a fairly strong influence.

As you can see, it is quite easy to produce a regression analysis table in Microsoft Excel. However, only a trained person can work with the output data and understand what it means.


The linear regression method lets us describe the straight line that best fits a series of ordered pairs (x, y). The equation of a straight line, known as the linear equation, is given below:

ŷ = a + bx,

where

ŷ is the expected value of y for a given value of x,

x is the independent variable,

a is the y-intercept of the line,

b is the slope of the line.

The figure below represents this concept graphically:

The figure above shows the line described by the equation ŷ = 2 + 0.5x. The y-intercept is the point where the line crosses the y-axis; in our case, a = 2. The slope of the line, b, is the ratio of the rise of the line to its run, and has a value of 0.5. A positive slope means that the line rises from left to right. If b = 0, the line is horizontal, which means that there is no relationship between the dependent and independent variables. In other words, changing the value of x does not affect the value of y.

ŷ and y are often confused. The graph shows 6 ordered pairs of points and the line given by the equation ŷ = 2 + 0.5x.

This figure shows the point corresponding to the ordered pair x = 2 and y = 4. Note that the expected value of y according to the line at x = 2 is ŷ = 3. We can confirm this with the following equation:

ŷ = 2 + 0.5x = 2 + 0.5(2) = 3.

The y value is the actual point, and the ŷ value is the expected value of y obtained from the linear equation for a given value of x.
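The difference between y and ŷ can be made concrete with a few lines of Python, using the line ŷ = 2 + 0.5x and the point (2, 4) from the text. The function name `predict` is an assumption for this sketch.

```python
def predict(x, a=2.0, b=0.5):
    """Expected value of y on the line y_hat = a + b*x."""
    return a + b * x


x_obs, y_obs = 2, 4          # the actual observed point
y_hat = predict(x_obs)       # value on the line at the same x
residual = y_obs - y_hat     # how far the observation lies above the line
print(y_hat, residual)
```

The gap between the observed y and the value on the line is the residual, and least-squares regression chooses a and b so that the sum of the squared residuals over all points is as small as possible.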

The next step is to determine the linear equation that best fits the set of ordered pairs. We covered this in the previous article, where we determined the form of the equation using the least squares method.

Using Excel to Define Linear Regression

To use the regression analysis tool built into Excel, you need to activate the Analysis ToolPak add-in. Click the File tab –> Options (2007+), and in the Excel Options dialog box go to the Add-Ins tab. In the Manage box, choose Excel Add-ins and click Go. In the window that appears, check the box next to Analysis ToolPak and click OK.

A new Data Analysis button will appear on the Data tab, in the Analysis group.

To demonstrate how the add-in works, let us use the data from the previous article, where a guy and a girl share a table in the bathroom. Enter the data for our bathroom example in columns A and B of a blank sheet.

Go to the Data tab and, in the Analysis group, click Data Analysis. In the Data Analysis window that appears, select Regression as shown in the figure and click OK.

Set the required regression parameters in the Regression window, as shown in the figure:

Click OK. The figure below shows the results obtained:

These results are consistent with those we obtained by manual calculation in the previous article.

Regression analysis is a statistical research method that allows you to show the dependence of a parameter on one or more independent variables. In the pre-computer era, its use was quite difficult, especially when it came to large amounts of data. Today, having learned how to build a regression in Excel, you can solve complex statistical problems in just a couple of minutes. Below are specific examples from the field of economics.

Types of regression

The concept itself was introduced into mathematics by Francis Galton in 1886. Regression can be:

  • linear;
  • parabolic;
  • power;
  • exponential;
  • hyperbolic;
  • exponential with an arbitrary base;
  • logarithmic.

Example 1

Consider the problem of determining how the number of employees who left depends on the average salary at 6 industrial enterprises.

Task. At six enterprises, the average monthly salary and the number of employees who left of their own accord were analyzed. In tabular form we have:

For this problem, the regression model has the form Y = a0 + a1x1 + ... + akxk, where xi are the influencing variables, ai are the regression coefficients, and k is the number of factors.

Here Y is the number of employees who left, and the influencing factor is the salary, which we denote by X.

Using Excel's capabilities

Regression analysis in Excel can be carried out by applying built-in functions to the available tabular data. However, it is more convenient to use the "Analysis ToolPak" add-in. To activate it:

  • on the "File" tab, go to the "Options" section;
  • in the window that opens, select "Add-ins";
  • click the "Go" button at the bottom, to the right of the "Manage" box;
  • check the box next to "Analysis ToolPak" and confirm by clicking "OK".

If everything is done correctly, the "Data Analysis" button will appear on the right side of the Data tab, above the Excel worksheet.

Linear Regression in Excel

Now that all the necessary tools for econometric calculations are at hand, we can start solving our problem. To do this:

  • click the "Data Analysis" button;
  • in the window that opens, select "Regression" and click "OK";
  • in the dialog that appears, enter the ranges of values for Y (the number of employees who quit) and for X (their salaries);
  • confirm by clicking "OK".

As a result, the program automatically fills a new worksheet with the regression analysis output. Note: Excel also lets you choose the output location manually. For example, it can be the same sheet that holds the Y and X values, or even a new workbook created specifically for this output.

Analyzing the regression results: R-squared

For the example under consideration, the Excel output looks like this:

First of all, pay attention to the R-squared value, which is the coefficient of determination. In this example, R-squared = 0.755 (75.5%), i.e. the fitted model explains 75.5% of the variation in Y. The higher the coefficient of determination, the better the chosen model suits the task. A model is considered to describe the real situation correctly when R-squared is above 0.8.

If the t-statistic exceeds the critical value tcr, the hypothesis that the free term (intercept) of the linear equation is insignificant is rejected.
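
The coefficient of determination can be sketched in a few lines of Python as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample numbers are illustrative assumptions, not the article's data.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot,
    where y are observed values and y_hat are model predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observed values and fitted values.
    observed = [1.0, 2.0, 3.0]
    fitted = [1.5, 2.0, 2.5]
    print(r_squared(observed, fitted))
```

A value of 1 means the fitted values reproduce the observations exactly, while a value of 0 means the model does no better than predicting the mean of Y.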

In the problem under consideration, Excel gives t = 169.20903 and p = 2.89E-12 for the free term (intercept), i.e. the probability of wrongly rejecting a true hypothesis that the intercept is insignificant is practically zero. For the coefficient of X, t = 5.79405 and p = 0.001158. In other words, the probability of wrongly rejecting a true hypothesis that the coefficient of X is insignificant is about 0.12%.
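
As a rough illustration of where such t values come from, the sketch below computes the t-statistics of the intercept and the slope for a one-factor model and compares them with a critical value. The function name `t_statistics` and the data are hypothetical. The constant 2.776 is the standard two-sided 5% critical value of Student's t distribution with 4 degrees of freedom (6 observations minus 2 estimated coefficients). The p-values that Excel reports are not computed here.

```python
import math


def t_statistics(x, y):
    """Return (t_intercept, t_slope) for the model y = a0 + a1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    # Residual variance with n - 2 degrees of freedom.
    ss_res = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = ss_res / (n - 2)
    # Standard errors of the slope and the intercept.
    se_a1 = math.sqrt(s2 / sxx)
    se_a0 = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return a0 / se_a0, a1 / se_a1


# Two-sided 5% critical value of Student's t for 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776

if __name__ == "__main__":
    # Hypothetical data for six enterprises (NOT the article's table).
    salary = [10, 12, 14, 16, 18, 20]
    quits = [60, 55, 52, 48, 43, 40]
    t0, t1 = t_statistics(salary, quits)
    for name, t in (("intercept", t0), ("slope", t1)):
        verdict = "significant" if abs(t) > T_CRITICAL_DF4 else "not significant"
        print(f"{name}: t = {t:.3f} ({verdict})")
```

The comparison uses the absolute value of t, because a coefficient can be significant with either sign.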

Thus, it can be argued that the resulting linear regression equation is adequate.

The problem of whether to buy a block of shares

Multiple regression in Excel is performed using the same Data Analysis tool. Consider a specific applied problem.

The management of NNN must decide whether to purchase a 20% stake in JSC MMM. The cost of the package (SP) is 70 million US dollars. NNN specialists collected data on similar transactions. It was decided to estimate the value of the block of shares from the following parameters, expressed in millions of US dollars:

  • accounts payable (VK);
  • annual turnover (VO);
  • accounts receivable (VD);
  • cost of fixed assets (SOF).

In addition, the enterprise's payroll arrears (VZP), in thousands of US dollars, are used as a parameter.

Solution using Excel

First of all, create a table of the initial data. Then:

  • open the "Data Analysis" window;
  • select "Regression";
  • in the "Input Y Range" box, enter the range of values of the dependent variable from column G;
  • click the icon with a red arrow to the right of the "Input X Range" box and select the range of all values from columns B, C, D, F on the sheet.

Select "New Worksheet" and click "OK".

Excel then produces the regression analysis for the given problem.
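
For readers who want to see what a multiple regression computes, here is a minimal pure-Python sketch that fits y = a0 + a1*x1 + ... + ak*xk by solving the normal equations. It is an illustration under simplifying assumptions (a small, well-conditioned data set), not a description of Excel's internal algorithm. The function names and the sample data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_ols(rows, y):
    """Return [a0, a1, ..., ak] for y = a0 + a1*x1 + ... + ak*xk.
    rows is a list of (x1, ..., xk) tuples, one per observation."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
    rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
    y = [9, 8, 22, 18, 35]
    print(fit_multiple_ols(rows, y))
```

The first returned value plays the role of the intercept in Excel's output, and the remaining values are the coefficients of the X columns in the order they were supplied.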

Examination of the results and conclusions

From the rounded values shown above on the Excel sheet, we assemble the regression equation:

SP = 0.103*SOF + 0.541*VO - 0.031*VK + 0.405*VD + 0.691*VZP - 265.844.

In a more familiar mathematical form, it can be written as:

y = 0.103*x1 + 0.541*x2 - 0.031*x3 + 0.405*x4 + 0.691*x5 - 265.844

Data for JSC "MMM" are presented in the table:

Substituting these values into the regression equation gives 64.72 million US dollars. This means that the shares of JSC MMM should not be purchased, since the asking price of 70 million US dollars is overstated.
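
The substitution step can be sketched in Python as follows. The coefficients and the figures 64.72 and 70 are taken from the text above; the helper names `predict_sp` and `should_buy` and the sample input values are hypothetical, because the table with the JSC MMM data is not reproduced here.

```python
# Rounded coefficients of the regression equation from the text.
COEFFS = {"SOF": 0.103, "VO": 0.541, "VK": -0.031, "VD": 0.405, "VZP": 0.691}
INTERCEPT = -265.844


def predict_sp(values):
    """Estimated package value (SP) for a dict with the keys
    SOF, VO, VK, VD and VZP."""
    return INTERCEPT + sum(COEFFS[name] * values[name] for name in COEFFS)


def should_buy(predicted_value, asking_price):
    """Decision rule used in the text: buy only if the model's estimate
    is not below the asking price."""
    return predicted_value >= asking_price


if __name__ == "__main__":
    # Hypothetical inputs, NOT the JSC MMM figures from the missing table.
    sample = {"SOF": 100.0, "VO": 100.0, "VK": 100.0, "VD": 100.0, "VZP": 100.0}
    print(predict_sp(sample))
    # The article's own numbers: estimate 64.72 against an asking price of 70.
    print(should_buy(64.72, 70.0))
```

Keeping the coefficients in a dictionary keyed by the article's abbreviations makes it harder to mix up the order of the factors when substituting.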

As you can see, the use of the Excel spreadsheet and the regression equation made it possible to make an informed decision regarding the feasibility of a very specific transaction.

Now you know what regression is. The examples in Excel discussed above will help you solve practical problems from the field of econometrics.
