Java Mailing List Archive

http://www.r-help.com/

Home » Home (12/2007) » R Help for Statistical Computing »

Re: [R] how to get "lsmeans"?

Frank E Harrell Jr

2007-03-21

Replies:

Chuck Cleland wrote:
> Liaw, Andy wrote:
>> I verified the result from the following with output from JMP 6 on the
>> same data (don't have SAS: don't need it):
>>
>> set.seed(631)
>> n <- 100
>> dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
>>             B=factor(sample(1:2, n, replace=TRUE)),
>>             C=factor(sample(1:2, n, replace=TRUE)),
>>             d=rnorm(n))
>> fm <- lm(y ~ A + B + C + d, dat)
>> ## Form a data frame of points to predict: all combinations of the
>> ## three factors and the mean of the covariate.
>> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
>> p[] <- lapply(p, factor)
>> p <- cbind(p, d=mean(dat$d))
>> p <- cbind(yhat=predict(fm, p), p)
>> ## lsmeans for the three factors:
>> with(p, tapply(yhat, A, mean))
>> with(p, tapply(yhat, B, mean))
>> with(p, tapply(yhat, C, mean))

And with the Design package you can do:

f <- ols(y ~ ...)
ds <- gendata(A=c('1','2'),B=c('1','2'),C=c('1','2'))
predict(f, ds)

But this sets the covariate to the median (nothing unreasonable about
that, just will not agree with SAS). To set to mean add d=mean(dat$d)
in gendata.

Frank

>
>  Using Andy's example data, these are the LSMEANS and intervals I get
> from SAS:
>
> A     y LSMEAN    95% Confidence Limits
> 1     -0.071847     -0.387507   0.243813
> 2     -0.029621     -0.342358   0.283117
>
> B     y LSMEAN    95% Confidence Limits
> 1     -0.104859     -0.397935   0.188216
> 2     0.003391     -0.333476   0.340258
>
> C     y LSMEAN    95% Confidence Limits
> 1     -0.084679     -0.392343   0.222986
> 2     -0.016789     -0.336374   0.302795
>
>  One way of reproducing the LSMEANS and intervals from SAS using
> predict() seems to be the following:
>
>> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
>> newdat <- expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
>> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>  A  B  C       d      fit     lwr     upr
> 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
> 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
>
>  However, another possibility seems to be:
>
>> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
>> newdat <-
> expand.grid(A=factor(c(1,2)),B=mean(as.numeric(dat$B)),C=mean(as.numeric(dat$C)),d=mean(dat$d))
>> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
>  A   B   C       d      fit     lwr     upr
> 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
> 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
>
>  The predictions directly above match what effect() in the effects
> package by John Fox returns:
>
> library(effects)
>
>> effect("A", fm, xlevels=list(d = mean(dat$D)))
>
> A effect
> A
>       1       2
> -0.08078243 -0.03855619
>
>  But for some reason the predict() and effect() intervals are a little
> different:
>
>> effect("A", fm, xlevels=list(d = mean(dat$D)))$lower
>       [,1]
> 101 -0.3924451
> 102 -0.3440179
>
>> effect("A", fm, xlevels=list(d = mean(dat$D)))$upper
>       [,1]
> 101 0.2308802
> 102 0.2669055
>
>  I would be interested in any comments on these different approaches
> and on the difference in intervals returned by predict() and effect().
>
> hope this helps,
>
> Chuck
>
>> Andy
>>
>> From: Xingwang Ye
>>> Dear all,
>>>    
>>>   I search the mail list about this topic and learn that no
>>> simple way is available to get "lsmeans" in R as in SAS.
>>>   Dr.John Fox and Dr.Frank E Harrell have given very useful
>>> information about "lsmeans" topic.  
>>>   Dr. Frank E Harrell suggests not to think about lsmeans,
>>> just to think about what predicted values wanted
>>>   and to use the predict function. However, after reading
>>> the R help file for a whole day, I am still unclear how to do it.
>>>   Could some one give me a hand?
>>>
>>> for example:
>>>  
>>> A,B and C are binomial variables(factors); d is a continuous
>>> variable ; The response variable Y is a continuous variable too.
>>>
>>> To get lsmeans of Y according to A,B and C, respectively, in
>>> SAS, I tried proc glm data=a; class A B C; model Y=A B C d;
>>> lsmeans A B C/cl; run;
>>>
>>> In R, I tried this:
>>> library(Design)
>>> ddist<-datadist(a)
>>> options(datadist="ddist")
>>> f<-ols(Y~A+B+C+D,data=a,x=TRUE,y=TRUE,se.fit=TRUE)
>>>
>>> then how to get the "lsmeans" for A, B, and C, respectively
>>> with predict function?
>>>
>>>
>>>
>>> Best wishes
>>> yours, sincerely
>>> Xingwang Ye  
>>> PhD candidate  
>>> Research Group of Nutrition Related Cancers and Other Chronic
>>> Diseases    
>>> Institute for Nutritional Sciences,
>>> Shanghai Institutes of Biological Sciences,  
>>> Chinese Academy of Sciences  
>>> P.O.Box 32  
>>> 294 Taiyuan Road  
>>> Shanghai 200031  
>>> P.R.CHINA
>>>
>>>
>>
>> ------------------------------------------------------------------------------
>> Notice: This e-mail message, together with any attachments,...{{dropped}}
>>
>> ______________________________________________
>> R-help@(protected)
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>


--
Frank E Harrell Jr  Professor and Chair       School of Medicine
              Department of Biostatistics  Vanderbilt University

______________________________________________
R-help@(protected)
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
©2008 r-help.com - Jax Systems, LLC, U.S.A.