Design Resources Server

	Analysis of Data from Designed Experiments
	Groups of Experiments

IASRI

Analysis Using SAS

The analysis may be carried out using the following steps:

Data Input:
Input the data in the following format. {Here the locations are termed loc, the replications rep, the strain codes (treatments) varn and the seed yield syield. These names may be retained, or the variables may be coded in any other fashion.}
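As a quick cross-check of the layout described above, here is a minimal Python sketch (it is not part of the SAS program, and the function names parse_records and cell_means are hypothetical). It reads a few records in the loc rep varn syield order, copied from the data listing, and averages seed yield over replications for each location and strain.

```python
from collections import defaultdict

# A few records copied from the data listing (Bathinda, strains 1 and 2).
RECORDS = """\
Bathinda 1 1 1794
Bathinda 1 2 1134
Bathinda 2 1 2014
Bathinda 2 2 1736
Bathinda 3 1 2581
Bathinda 3 2 1898
"""


def parse_records(text):
    """Split each non-empty line into (loc, rep, varn, syield)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        loc, rep, varn, syield = line.split()
        rows.append((loc, int(rep), int(varn), float(syield)))
    return rows


def cell_means(rows):
    """Average syield over replications for each (loc, varn) cell."""
    totals = defaultdict(lambda: [0.0, 0])
    for loc, _rep, varn, syield in rows:
        cell = totals[(loc, varn)]
        cell[0] += syield
        cell[1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


rows = parse_records(RECORDS)
means = cell_means(rows)
for (loc, varn), value in sorted(means.items()):
    print(loc, varn, round(value, 5))
```

For Bathinda, strains 1 and 2, this gives about 2129.67 and 1589.33, the same cell means that appear in the RAW data set used for the SREG biplot.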

/*If varn is replaced by the treatment variable and syield by the character under study, this code can be used for any other situation as well*/

Prepare a SAS data file using

Options nodate nonumber ;

data wholedata;

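input loc $ rep varn syield; /* loc is read with the default character length of 8, so IARINewDelhi and TERINewDelhi are stored as IARINewD and TERINewD, the values tested for later in this program */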

cards;

Bathinda 1 1 1794

Bathinda 1 2 1134

Bathinda 1 3 718

Bathinda 1 4 1852

Bathinda 1 5 2245

Bathinda 1 6 1111

Bathinda 1 7 1181

Bathinda 1 8 1644

Bathinda 1 9 1551

Bathinda 1 10 1968

Bathinda 1 11 2662

Bathinda 1 12 1065

Bathinda 2 1 2014

Bathinda 2 2 1736

Bathinda 2 3 764

Bathinda 2 4 1551

Bathinda 2 5 2361

Bathinda 2 6 1065

Bathinda 2 7 880

Bathinda 2 8 1991

Bathinda 2 9 1435

Bathinda 2 10 1551

Bathinda 2 11 2338

Bathinda 2 12 1227

Bathinda 3 1 2581

Bathinda 3 2 1898

Bathinda 3 3 880

Bathinda 3 4 1887

Bathinda 3 5 2407

Bathinda 3 6 1111

Bathinda 3 7 1528

Bathinda 3 8 2060

Bathinda 3 9 1991

Bathinda 3 10 2569

Bathinda 3 11 3056

Bathinda 3 12 1343

IARINewDelhi 1 1 2600

IARINewDelhi 1 2 3289

IARINewDelhi 1 3 2756

IARINewDelhi 1 4 2600

IARINewDelhi 1 5 2689

IARINewDelhi 1 6 2578

IARINewDelhi 1 7 3178

IARINewDelhi 1 8 3244

IARINewDelhi 1 9 2444

IARINewDelhi 1 10 3156

IARINewDelhi 1 11 2667

IARINewDelhi 1 12 2689

IARINewDelhi 2 1 2444

IARINewDelhi 2 2 2667

IARINewDelhi 2 3 2511

IARINewDelhi 2 4 2444

IARINewDelhi 2 5 2422

IARINewDelhi 2 6 2400

IARINewDelhi 2 7 3044

IARINewDelhi 2 8 2911

IARINewDelhi 2 9 2222

IARINewDelhi 2 10 2978

IARINewDelhi 2 11 2267

IARINewDelhi 2 12 2444

IARINewDelhi 3 1 2711

IARINewDelhi 3 2 2889

IARINewDelhi 3 3 2400

IARINewDelhi 3 4 2222

IARINewDelhi 3 5 2444

IARINewDelhi 3 6 2222

IARINewDelhi 3 7 2889

IARINewDelhi 3 8 3111

IARINewDelhi 3 9 2667

IARINewDelhi 3 10 2756

IARINewDelhi 3 11 2111

IARINewDelhi 3 12 2289

Hisar 1 1 3286

Hisar 1 2 2518

Hisar 1 3 757

Hisar 1 4 2553

Hisar 1 5 2908

Hisar 1 6 1797

Hisar 1 7 1749

Hisar 1 8 1501

Hisar 1 9 1513

Hisar 1 10 2447

Hisar 1 11 2600

Hisar 1 12 1631

Hisar 2 1 2459

Hisar 2 2 2364

Hisar 2 3 993

Hisar 2 4 2388

Hisar 2 5 2482

Hisar 2 6 1560

Hisar 2 7 1537

Hisar 2 8 2317

Hisar 2 9 1608

Hisar 2 10 2459

Hisar 2 11 2884

Hisar 2 12 1466

Hisar 3 1 3286

Hisar 3 2 2364

Hisar 3 3 875

Hisar 3 4 2884

Hisar 3 5 2884

Hisar 3 6 2033

Hisar 3 7 1537

Hisar 3 8 2577

Hisar 3 9 2104

Hisar 3 10 2813

Hisar 3 11 2648

Hisar 3 12 1844

Ludhiana 1 1 1370

Ludhiana 1 2 904

Ludhiana 1 3 858

Ludhiana 1 4 904

Ludhiana 1 5 1438

Ludhiana 1 6 873

Ludhiana 1 7 848

Ludhiana 1 8 1668

Ludhiana 1 9 910

Ludhiana 1 10 1558

Ludhiana 1 11 1508

Ludhiana 1 12 1280

Ludhiana 2 1 1209

Ludhiana 2 2 729

Ludhiana 2 3 942

Ludhiana 2 4 959

Ludhiana 2 5 1456

Ludhiana 2 6 959

Ludhiana 2 7 639

Ludhiana 2 8 1770

Ludhiana 2 9 907

Ludhiana 2 10 1606

Ludhiana 2 11 1389

Ludhiana 2 12 1207

Ludhiana 3 1 1320

Ludhiana 3 2 1007

Ludhiana 3 3 839

Ludhiana 3 4 1155

Ludhiana 3 5 1695

Ludhiana 3 6 946

Ludhiana 3 7 643

Ludhiana 3 8 1607

Ludhiana 3 9 1081

Ludhiana 3 10 1705

Ludhiana 3 11 1447

Ludhiana 3 12 1256

Navgaon 1 1 2233

Navgaon 1 2 2222

Navgaon 1 3 2000

Navgaon 1 4 2667

Navgaon 1 5 2444

Navgaon 1 6 1778

Navgaon 1 7 1778

Navgaon 1 8 3000

Navgaon 1 9 1778

Navgaon 1 10 3778

Navgaon 1 11 3111

Navgaon 1 12 2222

Navgaon 2 1 2222

Navgaon 2 2 2444

Navgaon 2 3 1778

Navgaon 2 4 3289

Navgaon 2 5 2000

Navgaon 2 6 1889

Navgaon 2 7 1722

Navgaon 2 8 2889

Navgaon 2 9 1611

Navgaon 2 10 3667

Navgaon 2 11 3111

Navgaon 2 12 2000

Navgaon 3 1 2222

Navgaon 3 2 2722

Navgaon 3 3 1778

Navgaon 3 4 3333

Navgaon 3 5 2000

Navgaon 3 6 1556

Navgaon 3 7 1722

Navgaon 3 8 3222

Navgaon 3 9 1333

Navgaon 3 10 3556

Navgaon 3 11 3222

Navgaon 3 12 2222

TERINewDelhi 1 1 1666

TERINewDelhi 1 2 1611

TERINewDelhi 1 3 1389

TERINewDelhi 1 4 1511

TERINewDelhi 1 5 1644

TERINewDelhi 1 6 1833

TERINewDelhi 1 7 1788

TERINewDelhi 1 8 1644

TERINewDelhi 1 9 1889

TERINewDelhi 1 10 2000

TERINewDelhi 1 11 944

TERINewDelhi 1 12 1488

TERINewDelhi 2 1 1333

TERINewDelhi 2 2 1389

TERINewDelhi 2 3 1244

TERINewDelhi 2 4 1778

TERINewDelhi 2 5 1622

TERINewDelhi 2 6 1822

TERINewDelhi 2 7 2333

TERINewDelhi 2 8 2220

TERINewDelhi 2 9 1822

TERINewDelhi 2 10 1556

TERINewDelhi 2 11 388

TERINewDelhi 2 12 1400

TERINewDelhi 3 1 2222

TERINewDelhi 3 2 1944

TERINewDelhi 3 3 2056

TERINewDelhi 3 4 1889

TERINewDelhi 3 5 1711

TERINewDelhi 3 6 2111

TERINewDelhi 3 7 1711

TERINewDelhi 3 8 2220

TERINewDelhi 3 9 2444

TERINewDelhi 3 10 1356

TERINewDelhi 3 11 722

TERINewDelhi 3 12 1356

;

proc sort data = wholedata; /* This SAS statement sorts the data with respect to the locations*/

by loc; /* if one has years in place of locations, replace loc by year*/

run;

ods trace on; /*Writes to the SAS log a record of each output object that is created*/

/* The following ODS statements save, for each location, the error degrees of freedom and the error mean square needed for Bartlett's Chi-square test of homogeneity of variances, together with the treatment means*/

*use ods to output the anova table;

ods output overallanova=MSerror;

ods output means=mean;

* 1. To perform the analysis of data for each of the locations separately one can use the following SAS statements. ;

proc glm data = wholedata;

class rep varn;

model syield =rep varn ;

means varn;

means varn/tukey;

by loc;

quit;

ods output close;

ods trace off;

* 2. To test the homogeneity of error variances using Bartlett's Chi-square test one can use the following SAS statements. ;

/* This creates a data set containing MSE for each location with their respective degrees of freedom */

data required;

set MSerror(where=(source='Error') keep=loc source df ms);

run;

/*To check the homogeneity of variances we apply Bartlett's Chi-square test*/

/* SAS Code for testing the homogeneity of variances, when variances and the degrees of freedom are given.

It is useful for testing the homogeneity of error mean squares, when the experiments are conducted over environments.

Code written by Dr Rajender Parsad and Sh. Ajeet Kumar

IASRI, Library Avenue, New Delhi, 110 012, India*/

proc iml;

use required;

read all into a; /* a holds the numeric columns of required: column 1 is the error degrees of freedom and column 2 the error mean square */

v =0;ct = 0;nchi = 0;St = 0;

do i = 1 to nrow(a); /* computing pooled variance */

St = St + a[i,1]*a[i,2]; /* each mean square is weighted by its own error degrees of freedom */

v = v + a[i,1];

ct = ct + 1/a[i,1];

end;

S = St/v;

dchi = (1 + (1/(3*(nrow(a)-1)))*(ct-(1/v))); /*computing denominator of Bartlett's chi-square statistic*/

do i = 1 to nrow(a);

nchi = nchi + a[i,1]*(log(S/a[i,2]));

end;

chi = nchi/dchi;

probability = 1 - probchi(chi,(nrow(a)-1));/*computing chi-value and prob.*/

df = (nrow(a)-1);

print probability chi df S; /* printing probability, chi-square value, degrees of freedom and pooled variance*/

if probability >= 0.05 then Interpretation = "Error variances are homogeneous at 5% level of significance";

else Interpretation = "Error variances are heterogeneous at 5% level of significance";

print Interpretation; /* testing and printing interpretation*/

quit;

* 3. If the error variances are heterogeneous, Aitken's transformation can be applied using the following SAS statements. ;

/* If the error variances are homogeneous, no transformation is needed. If they are heterogeneous, divide each observation by the square root of the MSE of its location. In this example the error variances are heterogeneous, so the data are transformed in this way into a new variable new_var, which is then used in the combined analysis of data*/

/* These SAS statements create a table of the mean square error (MSE) values to be used for transformation of data*/

ods html body = 'mse.xls';

proc print data = required;

var loc ms;

run;

ods html close;

data tranformed; /* This set of SAS statements transforms the data*/

set wholedata;

if loc="Bathinda" then

new_var=syield/sqrt(44677.15);

if loc="Hisar" then

new_var=syield/sqrt(64092.6);

if loc="IARINewD" then

new_var=syield/sqrt(20444.61);

if loc="Ludhiana" then

new_var=syield/sqrt(8441.56);

if loc="Navgaon" then

new_var=syield/sqrt(38391.71);

if loc="TERINewD" then

new_var=syield/sqrt(79710.63);

run;

* 4. To perform the combined analysis of the above data set considering the locations as fixed effects one can use the following SAS statements. ;

*To get the output in a CSV format file remove the star and define the path where to save the file;

*ODS csv FILE ='C:\Documents and Settings\owner\Desktop\new_old.csv';

*ODS SHOW;

proc glm data = tranformed;

class loc rep varn;

model new_var syield =loc rep(loc) varn loc*varn;

means varn /tukey;

run;

*ods csv close;

/*Please note that comparisons should be made using the transformed data (new_var) only. The analysis of syield is useful only for obtaining the means on the original scale*/

* 5. To perform the combined analysis of the above data set considering the locations as random effects one can use the following SAS statements. ;

/* Analysis using PROC GLM*/

proc glm data = wholedata;

class loc rep varn;

model syield =loc rep(loc) varn loc*varn;

random loc loc*varn rep(loc)/test;

lsmeans varn/pdiff;

run;

*To get the output in a CSV format file remove the star and define the path where to save the file;

*ODS csv FILE ='C:\Documents and Settings\owner\Desktop\mixed.csv';

*ODS SHOW;

/*Analysis using Proc Mixed*/

proc mixed ratio covtest data = wholedata;

class loc rep varn;

model syield =varn ;

random loc loc*varn rep(loc)/s;

lsmeans varn/pdiff;

run;

*ods csv close;

* 6. To prepare a Site Regression (SREG) or GGE Biplot. ;

/* this proc print statement prints the data set for means in MS_EXCEL format to be used in SREG biplot*/

ods html body = 'mean.xls';

proc print data = mean;

var loc varn Mean_syield;

run;

ods html close;

/* If loc*varn interaction is significant, then the same genotype cannot be recommended for all locations. In such a situation, one can study the performance of genotypes and the genotype*environment interaction using an SREG biplot*/

/* If an incomplete block design has been used at all the locations, then one should use lsmeans instead of means*/

/* If some of the cells in the Genotype × Environment table are missing, then one can obtain BLUPs and use them in place of lsmeans. A word of caution: no more than 20% of the cells in the Genotype × Environment table should be empty*/

/*For the SREG biplot we create a data file named RAW in which locations are termed ENV (the environment here is the location), treatment numbers GEN and the treatment means GYLD. We are using the program developed by Jose Crossa and his coworkers at CIMMYT, Mexico after some minor modifications.*/

OPTIONS PS = 5000 LS=78 NODATE;

/*after removing * one can get the output as a cgm file directly, which can be imported in PowerPoint or word documents for clarity. */

*FILENAME BIPLOT 'C:\Documents and Settings\owner\Desktop\comana.cgm'; *To have cgm files run it in BATCH;

*GOPTIONS DEVICE=CGMOF97L GSFNAME=BIPLOT GSFMODE=REPLACE;

/*one has to run the program twice, first time to see the portion of variation explained by two components in the output file, then one has to change the value of factor 1 and factor 2 in the file at appropriate place.*/

DATA RAW;

INPUT ENV $ GEN $ GYLD;

YLD=GYLD;

CARDS;

Bathinda 1 2129.66667

Bathinda 2 1589.33333

Bathinda 3 787.33333

Bathinda 4 1763.33333

Bathinda 5 2337.66667

Bathinda 6 1095.66667

Bathinda 7 1196.33333

Bathinda 8 1898.33333

Bathinda 9 1659

Bathinda 10 2029.33333

Bathinda 11 2685.33333

Bathinda 12 1211.66667

Hisar 1 3010.33333

Hisar 2 2415.33333

Hisar 3 875

Hisar 4 2608.33333

Hisar 5 2758

Hisar 6 1796.66667

Hisar 7 1607.66667

Hisar 8 2131.66667

Hisar 9 1741.66667

Hisar 10 2573

Hisar 11 2710.66667

Hisar 12 1647

IARINewD 1 2585

IARINewD 2 2948.33333

IARINewD 3 2555.66667

IARINewD 4 2422

IARINewD 5 2518.33333

IARINewD 6 2400

IARINewD 7 3037

IARINewD 8 3088.66667

IARINewD 9 2444.33333

IARINewD 10 2963.33333

IARINewD 11 2348.33333

IARINewD 12 2474

Ludhiana 1 1299.66667

Ludhiana 2 880

Ludhiana 3 879.66667

Ludhiana 4 1006

Ludhiana 5 1529.66667

Ludhiana 6 926

Ludhiana 7 710

Ludhiana 8 1681.66667

Ludhiana 9 966

Ludhiana 10 1623

Ludhiana 11 1448

Ludhiana 12 1247.66667

Navgaon 1 2225.66667

Navgaon 2 2462.66667

Navgaon 3 1852

Navgaon 4 3096.33333

Navgaon 5 2148

Navgaon 6 1741

Navgaon 7 1740.66667

Navgaon 8 3037

Navgaon 9 1574

Navgaon 10 3667

Navgaon 11 3148

Navgaon 12 2148

TERINewD 1 1740.33333

TERINewD 2 1648

TERINewD 3 1563

TERINewD 4 1726

TERINewD 5 1659

TERINewD 6 1922

TERINewD 7 1944

TERINewD 8 2028

TERINewD 9 2051.66667

TERINewD 10 1637.33333

TERINewD 11 684.66667

TERINewD 12 1414.66667

;

proc glm data=raw outstat=stats ;

class env gen;

model yld = env gen env*gen/ss4;

/*If tests of significance are required, replace MSE by the error mean square from the combined analysis, DFE by the error degrees of freedom from the combined analysis and NREP by the number of replications at each location. */

data stats2;

set stats ;

drop _name_ _type_;

if upcase(_source_) = 'ERROR' then delete; /* OUTSTAT= stores the error line as ERROR in upper case */

mse=42626.4; * mse in combined analysis when locations are random;

dfe=132; * degrees of freedom in combined analysis;

nrep=3; * number of replications at each location;

ss=ss*nrep;

ms=ss/df;

f=ms/mse;

prob=1-probf(f,df,dfe);

proc print data=stats2 noobs;

var _source_ df ss ms f prob;

proc glm data=raw noprint;

class env gen;

model yld = env / ss4 ;

output out=outres r=resid;

proc sort data=outres;

by gen env;

proc transpose data=outres out=outres2;

by gen;

id env;

var resid;

proc iml;

use outres2;

read all into resid;

ngen=nrow(resid);

nenv=ncol(resid);

use stats2;

read var {mse} into msem;

read var {dfe} into dfem;

read var {nrep} into nrep;

call svd (u,l,v,resid);

minimo=min(ngen,nenv);

l=l[1:minimo,];

ss=(l##2)*nrep;

suma=sum(ss);

porcent=((1/suma)#ss)*100;

minimo=min(ngen,nenv);

porcenta=0;

do i = 1 to minimo;

df=(ngen-1)+(nenv-1)-(2*i-1);

dfa=dfa//df;

porceacu=porcent[i,];

porcenta=porcenta+porceacu;

porcenac=porcenac//porcenta;

end;

dfe=j(minimo,1,dfem);

mse=j(minimo,1,msem);

ssdf=ss||porcent||porcenac||dfa||dfe||mse;

l12=l##0.5;

scoreg1=u[,1]#l12[1,];

scoreg2=u[,2]#l12[2,];

scoreg3=u[,3]#l12[3,];

scoree1=v[,1]#l12[1,];

scoree2=v[,2]#l12[2,];

scoree3=v[,3]#l12[3,];

factor1=max(abs(scoreg1||scoreg2));

factor2=max(abs(scoree1||scoree2));

factor=max(factor1,factor2);

scoreg=(scoreg1||scoreg2||scoreg3)*(1/factor);

scoree=(scoree1||scoree2||scoree3)*(1/factor);

scores=scoreg//scoree;

create sumas from ssdf;

append from ssdf;

close sumas;

create scores from scores;

append from scores ;

close scores;

data ss_sreg;

set sumas;

ss_sreg =col1;

porcent =col2;

porcenac=col3;

df_sreg =col4;

dfe =col5;

mse =col6;

drop col1 - col6;

ms_sreg=ss_sreg/df_sreg;

f_sreg=ms_sreg/mse;

probf=1-probf(f_sreg,df_sreg,dfe);

proc print data=ss_sreg noobs;

var ss_sreg porcent porcenac ;

proc sort data=raw;

by gen;

proc means data = raw noprint;

by gen ;

var yld;

output out = mediag mean=yld;

data nameg;

set mediag;

type = 'gen';

name = gen;

keep type name yld;

proc sort data=raw;

by env;

proc means data = raw noprint;

by env ;

var yld;

output out = mediae mean=yld;

data namee;

set mediae;

type = 'env';

name1 = 's'||env;

name = compress(name1);

keep type name yld;

data nametype;

set nameg namee;

data biplot ;

merge nametype scores;

dim1=col1;

dim2=col2;

dim3=col3;

drop col1-col3;

title1 'biplot of grain yield';

proc print data=biplot noobs;

var type name yld dim1 dim2 dim3;

data labels;

set biplot;

retain xsys '2' ysys '2' ;

length function text $8 ;

text = name ;

if upcase(type) = 'GEN' then do; /* type is stored in lower case, so compare without regard to case */

color='red ';

size = 1.0;

style = 'hwcgm001';

x = dim1;

y = dim2;

if dim1 >=0

then position='5';

else position='5';

function = 'LABEL';

output;

end;

if upcase(type) = 'ENV' then DO; /* type is stored in lower case, so compare without regard to case */

color='blue ';

size = 1.0;

style = 'hwcgm001';

x = 0.0;

y = 0.0;

function='MOVE';

output;

x = dim1;

y = dim2;

function='DRAW' ;

output;

if dim1 >=0

then position='6';

else position='4';

function='LABEL';

output;

end;

/*one has to run the program twice, first time to see the portion of variation explained by two components,

then change in the file at appropriate places for the factor 1 and factor 2 */

Proc gplot data=biplot;

Plot dim2*dim1 / Annotate=labels frame

Vref=0.0 Href = 0.0

cvref=black chref=black

lvref=3 lhref=3

vaxis=axis2 haxis=axis1

vminor=1 hminor=1 nolegend;

symbol1 v=none c=black h=0.7 ;

symbol2 v=none c=black h=0.7 ;

axis2

length = 5.0 in

order = (-1 to 1.0 by 0.2)

/*one has to change the value for factor 2(.)*/

label=(f=hwcgm001 c=green h=1.2 a=90 r=0 'Factor 2 (15.25%)') /*please change the percent variation explained as per data*/

offset = (3)

value=(h=1.0)

minor=none;

* length = 7.0 in FOR CGM files;

axis1

length = 7.0 in

order = (-0.8 to 1.0 by 0.2)

/*one has to change the value for factor 1(.)*/

label=(f=hwcgm001 c=green h=1.2 'Factor 1 (66.91%)') /*please change the percent variation explained as per data*/

offset = (3)

value=(h=1.0)

minor=none;

* length = 7.0 in FOR CGM files;

Title1 f=hwcgm001 c=Red h=2.0 'SREG biplot of the Grain Yield of Quality_zone2 at 6 Locations';

/*Give the title as is required in output*/

run;
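Step 2 of the program tests the homogeneity of the error variances with Bartlett's Chi-square test. The following Python sketch works through the same arithmetic outside SAS. It makes several assumptions: the error mean squares are the six values quoted in the transformation step; each has (3-1)*(12-1) = 22 error degrees of freedom, used directly as the weights; the function name bartlett_from_mse is hypothetical; and the result is compared with the tabulated 5% critical value of chi-square with 5 degrees of freedom (11.07) instead of computing a probability.

```python
import math


def bartlett_from_mse(df_list, mse_list):
    """Return (chi-square statistic, its df, pooled variance)."""
    k = len(mse_list)
    total_df = sum(df_list)
    # Pooled variance: mean squares weighted by their degrees of freedom.
    pooled = sum(v * s for v, s in zip(df_list, mse_list)) / total_df
    # Numerator of Bartlett's statistic.
    numerator = sum(v * math.log(pooled / s)
                    for v, s in zip(df_list, mse_list))
    # Correction factor in the denominator.
    correction = 1 + (sum(1 / v for v in df_list) - 1 / total_df) / (3 * (k - 1))
    return numerator / correction, k - 1, pooled


# Error mean squares quoted in the transformation step, in the order
# Bathinda, Hisar, IARINewD, Ludhiana, Navgaon, TERINewD.
mse = [44677.15, 64092.6, 20444.61, 8441.56, 38391.71, 79710.63]
df = [22] * 6  # assumed error df per location: (3-1)*(12-1)

chi, chi_df, pooled = bartlett_from_mse(df, mse)

CRITICAL_5PCT_DF5 = 11.07  # tabulated upper 5% point, chi-square with 5 df
heterogeneous = chi > CRITICAL_5PCT_DF5

print("chi-square:", round(chi, 2), "df:", chi_df, "pooled:", round(pooled, 1))
print("heterogeneous at 5% level:", heterogeneous)
```

With equal degrees of freedom the pooled variance is the simple average of the six mean squares, about 42626.4, which is the MSE value entered in the SREG step. The statistic exceeds the critical value, in line with the statement in step 3 that the error variances in this example are heterogeneous.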


For exposure to SAS, SPSS, MINITAB, SYSTAT and MS-EXCEL for analysis of data from designed experiments, please see Module I of Electronic Book II: Advances in Data Analytical Techniques, available at Design Resource Server (www.iasri.res.in/design).