Re: [R] Timings of function execution in R [was Re: R in Industry]

Douglas Bates

2007-02-09

On 2/9/07, Prof Brian Ripley <ripley@(protected)> wrote:
> The other reason why pmin/pmax are preferable to your functions is that
> they are fully generic. It is not easy to write C code which takes into
> account that <, [, [<- and is.na are all generic. That is not to say that
> it is not worth having faster restricted alternatives, as indeed we do
> with rep.int and seq.int.
>
> Anything that uses arithmetic is making strong assumptions about the
> inputs. It ought to be possible to write a fast C version that worked for
> atomic vectors (logical, integer, real and character), but is there
> any evidence of profiled real problems where speed is an issue?

Yes. I don't have the profiled timings available now and one would
need to go back to earlier versions of R to reproduce them but I did
encounter a situation where the bottleneck in a practical computation
was pmin/pmax. The binomial and poisson families for generalized
linear models used pmin and pmax to avoid boundary conditions when
evaluating the inverse link and other functions. When I profiled the
execution of some generalized linear model and, more importantly for
me, generalized linear mixed model fits, these calls to pmin and pmax
were the bottleneck. That is why I moved some of the calculations for
the binomial and poisson families in the stats package to compiled
code.
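A minimal sketch of the kind of boundary protection described here, assuming a logit inverse link and a cutoff of `-log(.Machine$double.eps)` on the linear predictor. The function name `clamped_logit_inv` and the cutoff are illustrative assumptions, not the code that was in the stats package.

```r
# Hypothetical inverse logit link: clamp the linear predictor with
# pmin()/pmax() before exponentiating so fitted values stay inside (0, 1)
clamped_logit_inv <- function(eta) {
  thresh <- -log(.Machine$double.eps)   # about 36.04
  eta <- pmin(pmax(eta, -thresh), thresh)
  exp(eta) / (1 + exp(eta))
}

eta <- c(-1000, -2, 0, 2, 1000)
mu <- clamped_logit_inv(eta)
print(mu)

# Without clamping, exp(1000) overflows to Inf and Inf / Inf is NaN
print(exp(1000) / (1 + exp(1000)))
```

In an iterative fit a clamp like this is evaluated at every iteration, so its cost is paid many times over.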

In that case I didn't rewrite the general form of pmin and pmax, I
replaced specific calls in the compiled code.

>
> On Fri, 9 Feb 2007, Martin Maechler wrote:
>
> >>>>>> "Ravi" == Ravi Varadhan <rvaradhan@(protected)>
> >>>>>>   on Thu, 8 Feb 2007 18:41:38 -0500 writes:
> >
> >   Ravi> Hi,
> >   Ravi> "greaterOf" is indeed an interesting function. It is much faster than the
> >   Ravi> equivalent R function, "pmax", because pmax does a lot of checking for
> >   Ravi> missing data and for recycling. Tom Lumley suggested a simple function to
> >   Ravi> replace pmax, without these checks, that is analogous to greaterOf, which I
> >   Ravi> call fast.pmax.
> >
> >   Ravi> fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }
> >
> >   Ravi> Interestingly, greaterOf is even faster than fast.pmax, although you have to
> >   Ravi> be dealing with very large vectors (O(10^6)) to see any real difference.
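The trade-off Ravi describes can be checked directly. In this sketch (the data, seed and repetition count are arbitrary choices) `fast.pmax` matches `pmax()` on equal-length, NA-free vectors, rough timings are printed, and one skipped check is exposed: with a shorter `y`, `y[i]` runs past the end of `y` instead of recycling it.

```r
# fast.pmax as quoted above: no NA handling and no recycling
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }

set.seed(1)
x <- rnorm(1e6)
y <- rnorm(1e6)

# For equal-length, NA-free numeric vectors the results are identical
same <- identical(fast.pmax(x, y), pmax(x, y))
print(same)

# Rough timings; the numbers depend on the machine and R version
print(system.time(for (r in 1:10) pmax(x, y)))
print(system.time(for (r in 1:10) fast.pmax(x, y)))

# pmax() recycles a length-1 argument; fast.pmax() indexes past its end,
# which yields NA: pmax gives c(3, 3, 3, 4), fast.pmax gives c(3, NA, 3, 4)
print(pmax(1:4, 3))
print(fast.pmax(1:4, 3))
```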
> >
> > Yes. Indeed, I have a file, first version dated from 1992
> > where I explore the "slowness" of pmin() and pmax() (in S-plus
> > 3.2 then). I have since added quite a few experiments and versions to that
> > file.
> >
> > As a consequence, in the robustbase CRAN package (which is only a bit
> > more than a year old though), there's a file, available as
> > https://svn.r-project.org/R-packages/robustbase/R/Auxiliaries.R
> > with the very simple content {note line 3 !}:
> >
> > -------------------------------------------------------------------------
> > ### Fast versions of pmin() and pmax() for 2 arguments only:
> >
> > ### FIXME: should rather add these to R
> > pmin2 <- function(k,x) (x+k - abs(x-k))/2
> > pmax2 <- function(k,x) (x+k + abs(x-k))/2
> > -------------------------------------------------------------------------
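These two functions rest on the identities min(x, k) = (x + k - |x - k|)/2 and max(x, k) = (x + k + |x - k|)/2. A quick sketch (random test data chosen arbitrarily) shows they agree with `pmin()`/`pmax()` up to rounding, which is also why they are not bit-for-bit identical; when the arguments differ greatly in magnitude, the rounding can wipe out the answer entirely.

```r
# Arithmetic two-argument versions, as quoted above
pmin2 <- function(k, x) (x + k - abs(x - k)) / 2
pmax2 <- function(k, x) (x + k + abs(x - k)) / 2

set.seed(42)
k <- rnorm(1000)
x <- rnorm(1000)

# Equal to pmin()/pmax() up to floating-point rounding
print(all.equal(pmin2(k, x), pmin(k, x)))
print(all.equal(pmax2(k, x), pmax(k, x)))

# Cancellation: 1e20 + 1 rounds to 1e20 in double precision, so the
# arithmetic minimum of 1 and 1e20 comes out as 0 instead of 1
print(pmin(1, 1e20))
print(pmin2(1, 1e20))
```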
> >
> > {the "funny" argument name 'k' comes from the use of these to
> > compute Huber's psi() fast:
> >
> > psiHuber <- function(x,k) pmin2(k, pmax2(- k, x))
> > curve(psiHuber(x, 1.35), -3,3, asp = 1)
> > }
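Huber's psi simply clips `x` to the interval [-k, k], so the same function can be written with base `pmin()`/`pmax()` for comparison. `psiHuber.base` is a hypothetical name introduced only for this check.

```r
pmin2 <- function(k, x) (x + k - abs(x - k)) / 2
pmax2 <- function(k, x) (x + k + abs(x - k)) / 2

# Huber's psi: clip x to [-k, k]
psiHuber      <- function(x, k) pmin2(k, pmax2(-k, x))
psiHuber.base <- function(x, k) pmin(k, pmax(-k, x))

x <- c(-3, -1, 0, 0.5, 2)
print(psiHuber.base(x, 1.35))   # -1.35 -1.00 0.00 0.50 1.35
print(psiHuber(x, 1.35))        # the same values, up to rounding
```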
> >
> > One point *is* that I think proper function names would be pmin2() and
> > pmax2() since they work with exactly 2 arguments,
> > whereas IIRC the feature to work with '...' is exactly the
> > reason that pmax() and pmin() are so much slower.
> >
> > I haven't checked if Gabor's
> >   pmax2.G <- function(x,y) {z <- x > y; z * (x-y) + y}
> > is even faster than the abs() using one.
> > It may have the advantage of giving *identical* results (to the
> > last bit!) to pmax() which my version does not --- IIRC the
> > only reason I did not follow my own 'FIXME' above.
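On ordinary finite input Gabor's version reproduces `pmax()` exactly, as in the sketch below (the vectors are arbitrary). It does make one arithmetic assumption that `pmax()` does not: when `x > y` is FALSE it still computes `0 * (x - y)`, which is NaN if `x - y` is infinite.

```r
# Gabor's two-argument maximum, as quoted above
pmax2.G <- function(x, y) { z <- x > y; z * (x - y) + y }

# Agrees with pmax() on these finite inputs
x <- c(1, 5, -2, 7)
y <- c(3, 2, -2, 10)
print(pmax2.G(x, y))
print(pmax(x, y))

# With an infinite argument, FALSE * (1 - Inf) is 0 * -Inf = NaN,
# so the result is NaN where pmax() returns Inf
print(pmax(1, Inf))
print(pmax2.G(1, Inf))
```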
> >
> > I had then planned to implement pmin2() and pmax2() in C code, trivially,
> > and hence get identical (to the last bit!) behavior as
> > pmin()/pmax(); but I now tend to think that the proper approach is to
> > code pmin() and pmax() via .Internal() and hence C code ...
> >
> > [Not before DSC and my vacations though!!]
> >
> > Martin Maechler, ETH Zurich
> >
> > ______________________________________________
> > R-help@(protected)
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Brian D. Ripley,            ripley@(protected)
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,         Tel: +44 1865 272861 (self)
> 1 South Parks Road,              +44 1865 272866 (PA)
> Oxford OX1 3TG, UK           Fax: +44 1865 272595
>
