Java Mailing List Archive

http://www.r-help.com/

Home » R Help for Statistical Computing »

Re: [R] lokern package

Martin Maechler

2009-07-02

Replies: Find Java Web Hosting

Author LoginPost Reply
>>>>> "RV" == Ravi Varadhan <RVaradhan@(protected)>
>>>>>   on Thu, 2 Jul 2009 10:51:03 -0400 writes:

  RV> Dear Martin, I have been playing a lot with the
  RV> glkerns() function in the "lokern" package for
  RV> "automatic" smoothing of time-series data. This kernel
  RV> smoothing approach of Gasser and Mueller seems to
  RV> perform quite well for estimating the function and its
  RV> derivatives (first and second derivatives). In fact,
  RV> this is one of the best methods based on my simulation
  RV> studies for comparing a number of "automatic" bandwidth
  RV> selection methods.

That's quite interesting!
What methods did you use in your comparison?
[I assume you did use smooth.spline() as well]

  RV> I am interested in applying this to
  RV> automatic smoothing and feature extraction for a "large"
  RV> number (thousands) of time series, with hundreds of
  RV> points per time series. This is where I am interested
  RV> in seeing if the efficiency of "glkerns" can be
  RV> improved.

  RV> Here follows my specific question:

  RV> You have to call glkerns() separately each time to
  RV> compute the function and its derivatives, ie. if I want
  RV> the function and its first 2 derivatives, I have to make
  RV> 3 calls to glkerns().

Yes. Note however that each of these calls chooses a different
kernel order ('korder') by default { korder <- deriv + 2 },
and uses *different* automatic bandwidths depending on both
deriv and korder.

Consequently, the result of     glkerns(x,y, deriv=1) is
*not* the derivative of the estimate glkerns(x,y, deriv=0)
but rather an independent estimate of the derivative of the
unknown g(). The same applies for deriv=2 etc.

But the problem is even "worse": Let's assume you get the "optimal" global
bandwidth estimate 'bandwidth' = h from the deriv=0 case.
Then you could set this bandwidth explicitly for the deriv=1 and
deriv=2 calls {and you'd gain quite a bit of execution time !}.
*HOWEVER*, as the internal codes absolutely require

  k_{ord} - nu == korder - nu to be an *even* number,

you can *not* keep 'korder' fixed together with 'bandwidth'.
But if you change 'korder', the kernels change and the meaning
of the bandwidth has changed too.

I have just uploaded version 1.0-8 of package 'lokern' to CRAN,
and this now contains a demo("gkl-derivs") which defines a utility function
to play a bit with this (keeping bandwidth fixed).

  RV> This seems to me to be inefficient, especially for large
  RV> time series. In smooth.spline(), for example, you can
  RV> call it once to get the fit, and then use the fitted
  RV> object to compute the derivatives using predict().

Note that smooth.spline() is very different:
If you use deriv=1 or deriv=2 in the predict() call,
you get exactly the 1st or 2nd derivative of the same smoothing
spline function.

  RV> Is it possible to have a similar feature in glkerns?

As I have explained, this is not easily possible currently
more for conceptual reasons than others.

In principle it should be possible however,
but that would both need some "theoretical" work
{maybe just collecting the papers and formulae used} and
then some Fortran code digging and shuffling.

Would be a nice "semester work" for a student...
Martin

--
Martin <Maechler@(protected)
Seminar für Statistik, ETH Zürich HG G 16    Rämistrasse 101
CH-8092 Zurich, SWITZERLAND
phone: +41-44-632-3408     fax: ...-1228    <><


  RV> Thanks for any suggestions.

  RV> Ravi.
  RV> ----------------------------------------------------------------------------
  RV> -------

  RV> Ravi Varadhan, Ph.D.

  RV> Assistant Professor, The Center on Aging and Health

  RV> Division of Geriatric Medicine and Gerontology

  RV> Johns Hopkins University

  RV> Ph: (410) 502-2619

  RV> Fax: (410) 614-9625

  RV> Email: rvaradhan@(protected)

  RV> Webpage:
  RV> http://www.jhsph.edu/agingandhealth/People/Faculty_personal_pages/Varadhan.h
  RV> tml



  RV> ----------------------------------------------------------------------------
  RV> --------



  RV>  [[alternative HTML version deleted]]

  RV> ______________________________________________
  RV> R-help@(protected)
  RV> https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
  RV> read the posting guide
  RV> http://www.R-project.org/posting-guide.html and provide
  RV> commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@(protected)
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
©2008 r-help.com - Jax Systems, LLC, U.S.A.