Design Resources Server

	Analysis of Data from Designed Experiments
	Correlation and Regression

IASRI


Analysis Using SAS

Questions 1 and 2 can be answered using PROC CORR in SAS. The scatter plot can be drawn using PROC PLOT, and questions 4 to 8 can be answered using PROC REG. We begin with questions 1 and 2; the SAS statements for them are given below.

Data Input:

To perform the analysis, enter the data in the following format.

Here the serial number is coded as SN, plant population as PP, average plant height as PH, average number of green leaves as NGL and yield as YLD. One may, however, retain the original variable names or code them in any other fashion.

Prepare a SAS data file using:

Data corr; /*one can enter any other name for data*/

input sn pp ph ngl yld;

cards;

1 142.00 0.525 8.2 2.47

2 143.00 0.64 9.5 4.76

3 107.00 0.66 9.3 3.31

4 78.00 0.66 7.5 1.97

5 100.00 0.46 5.9 1.34

6 86.50 0.345 6.4 1.14

7 103.50 0.86 6.4 1.5

8 155.99 0.33 7.5 2.03

9 80.88 0.285 8.4 2.54

10 109.77 0.59 10.6 4.9

11 61.77 0.265 8.3 2.91

12 79.11 0.66 11.6 2.76

13 155.99 0.42 8.1 0.59

14 61.81 0.34 9.4 0.84

15 74.50 0.63 8.4 3.87

16 97.00 0.705 7.2 4.47

17 93.14 0.68 6.4 3.31

18 37.43 0.665 8.4 1.57

19 36.44 0.275 7.4 0.53

20 51.00 0.28 7.4 1.15

21 104.00 0.28 9.8 1.08

22 49.00 0.49 4.8 1.83

23 54.66 0.385 5.5 0.76

24 55.55 0.265 5.0 0.43

25 88.44 0.98 5.0 4.08

26 99.55 0.645 9.6 2.83

27 63.99 0.635 5.6 2.57

28 101.77 0.29 8.2 7.42

29 138.66 0.72 9.9 2.62

30 90.22 0.63 8.4 2.00

31 76.92 1.25 7.3 1.99

32 126.22 0.58 6.9 1.36

33 80.36 0.605 6.8 0.68

34 150.23 1.19 8.8 5.36

35 56.50 0.355 9.7 2.12

36 136.00 0.59 10.2 4.16

37 144.50 0.61 9.8 3.12

38 157.33 0.605 8.8 2.07

39 91.99 0.38 7.7 1.17

40 121.50 0.55 7.7 3.62

41 64.50 0.32 5.7 0.67

42 116.00 0.455 6.8 3.05

43 77.50 0.72 11.8 1.70

44 70.43 0.625 10.0 1.55

45 133.77 0.535 9.3 3.28

46 89.99 0.49 9.8 2.69

;
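Before running any analysis, it can help to confirm that the data were read as intended. The following is a minimal sketch, assuming the data set is named corr as above; it lists the first five records and prints summary statistics for the four analysis variables. Since the data above contain 46 records, N should be 46 for each variable. Neither step creates a new data set, so later steps that omit DATA= still use corr.

```sas
/* List the first five observations to check the variable order */
proc print data=corr(obs=5);
run;

/* Summary statistics for the analysis variables */
proc means data=corr n mean std min max;
  var pp ph ngl yld;
run;
```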

/* Obtain correlation coefficient between each pair of the variables PP, PH, NGL and yield using the following SAS statements*/

proc corr;

var pp ph ngl yld;

run;

/* Obtain partial correlation between NGL and yield after removing the linear effect of PP and PH by using the following SAS statements*/

proc corr;

var ngl yld;

partial pp ph;

run;
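The PARTIAL statement can be read as correlating what is left of NGL and yield after each has been regressed on PP and PH. The sketch below illustrates this under stated assumptions: the data set name resid and the variable names r_ngl and r_yld are hypothetical and chosen only for this illustration. The correlation coefficient between the two residuals should agree with the partial correlation reported above, but the p-value from this second step does not adjust the degrees of freedom for the two removed variables, so the PARTIAL statement remains the proper way to test significance.

```sas
/* Regress NGL and YLD on PP and PH and save both sets of residuals */
proc reg data=corr noprint;
  model ngl yld = pp ph;
  output out=resid r=r_ngl r_yld;  /* hypothetical names for the residuals */
run;
quit;

/* Correlate the residuals; the coefficient equals the partial correlation */
proc corr data=resid;
  var r_ngl r_yld;
run;
```

The data set resid also keeps all the original variables, so later steps that omit DATA= and therefore pick up the most recently created data set give the same results as with corr.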

/* Obtain the scatter plot using the following SAS statements */

proc plot;

plot pp*yld = '*';

/*pp=VERTICAL AXIS yld = HORIZONTAL AXIS.*/

run;
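PROC PLOT produces a character-based plot. Where ODS Graphics is available (an assumption: SAS 9.2 or later), a higher-resolution scatter plot with the same axis assignment can be sketched with PROC SGPLOT:

```sas
/* Same orientation as above: pp on the vertical axis, yld on the horizontal axis */
proc sgplot data=corr;
  scatter x=yld y=pp;
run;
```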

/* Fit a multiple linear regression equation by taking yield as dependent variable and biometrical characters as explanatory variables. Print the matrices used in the regression computations using the following SAS statements*/

proc reg;

model yld= pp ph ngl/p r influence vif collin xpx i;

/* testing the significance of regression coefficients. This is also done by default in regression fitting*/

test1: test pp=0;

test2: test ph=0;

test3: test ngl=0;

*testing the equality of two regression coefficients;

test4: test pp-ph=0;

test4a: test pp=0, ph=0;

/* test4 tests the equality of the regression coefficients of pp and ph, whereas test4a tests jointly whether the regression coefficients of pp and ph differ significantly from zero */

test5: test ph-ngl=0;

test5a: test ph=0, ngl=0;

run;

/* Options used in the MODEL statement:

p: Calculates predicted values from the input data and the estimated model. The display includes the observation number, the ID variable (if one is specified), the actual and predicted values, and the residual. If the CLI, CLM, or R option is specified, the P option is unnecessary.

r: Requests an analysis of the residuals. The results include everything requested by the p option plus the standard errors of the mean predicted and residual values, the studentized residual, and Cook's D statistic to measure the influence of each observation on the parameter estimates.

influence: Computes influence statistics.

vif: Produces variance inflation factors with the parameter estimates. Variance inflation is the reciprocal of tolerance.

collin: Requests a detailed analysis of collinearity among the regressors. This includes eigenvalues, condition indices, and decomposition of the variances of the estimates with respect to each eigenvalue.

xpx: Displays the X'X crossproducts matrix for the model. The crossproducts matrix is bordered by the X'Y and Y'Y matrices.

i: Displays the (X'X)^-1 matrix. The inverse of the crossproducts matrix is bordered by the parameter estimates and SSE matrices. */
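The relation between the variance inflation factor and tolerance noted above can be checked by requesting both statistics together. This is a minimal sketch, assuming the data set corr created earlier; the TOL option of the MODEL statement prints the tolerance of each regressor next to its VIF, and each VIF should equal 1 divided by the corresponding tolerance.

```sas
/* Print tolerance and VIF side by side for each regressor */
proc reg data=corr;
  model yld = pp ph ngl / tol vif;
run;
quit;
```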

/* A regression model without an intercept can be fitted by either of the following two methods */

proc reg;

model yld=pp ph ngl;

restrict intercept=0; /* A RESTRICT statement is used to place restrictions on the parameter estimates in the MODEL preceding it. */

run;

proc reg;

model yld=pp ph ngl/noint; /* Use the NOINT option to fit a model without an intercept term */

run;
