Re: [R] read.fwf and header

Marc Schwartz

2006-10-30

On Mon, 2006-10-30 at 19:51 +0100, Gregor Gorjanc wrote:
> Hi!
>
> I have data (also in attached file) in the following form:
>
> num1 num2 num3 int1 fac1 fac2 cha1 cha2 Date POSIXt
> 1           1  f q  1900-01-01 1900-01-01 01:01:01
> 2 1.0 1316666.5 2 a g r z        1900-01-01 01:01:01
> 3 1.5 1188830.5 3 b h s y 1900-01-01 1900-01-01 01:01:01
> 4 2.0 1271846.3 4 c i t x 1900-01-01 1900-01-01 01:01:01
> 5 2.5 829737.4   d j u w 1900-01-01
> 6 3.0 1240967.3 5 e k v v 1900-01-01 1900-01-01 01:01:01
> 7 3.5 919684.4 6 f l w u 1900-01-01 1900-01-01 01:01:01
> 8 4.0 968214.6 7 g m x t 1900-01-01 1900-01-01 01:01:01
> 9 4.5 1232076.4 8 h n y s 1900-01-01 1900-01-01 01:01:01
> 10 5.0 1141273.4 9 i o z r 1900-01-01 1900-01-01 01:01:01
>   5.5 988481.4 10 j   q 1900-01-01 1900-01-01 01:01:01
>
> This is a FWF (fixed width format) file. I cannot use read.table here
> because of the missing values. I have tried the following:
>
> > read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
> header=TRUE)
>
> Error in read.table(file = FILE, header = header, sep = sep, as.is =
> as.is, :
>  more columns than column names
>
> I could use:
>
> > read.fwf(file="test.txt", widths=c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
> header=FALSE, skip=1)
>   V1 V2     V3 V4 V5 V6 V7 V8       V9           V10
> 1  1 NA     NA 1   f q   1900-01-01 1900-01-01 01:01:01
> 2  2 1.0 1316666.5 2 a g r z         1900-01-01 01:01:01
> 3  3 1.5 1188830.5 3 b h s y 1900-01-01 1900-01-01 01:01:01
> 4  4 2.0 1271846.3 4 c i t x 1900-01-01 1900-01-01 01:01:01
> 5  5 2.5 829737.4 NA d j u w 1900-01-01
> 6  6 3.0 1240967.3 5 e k v v 1900-01-01 1900-01-01 01:01:01
> 7  7 3.5 919684.4 6 f l w u 1900-01-01 1900-01-01 01:01:01
> 8  8 4.0 968214.6 7 g m x t 1900-01-01 1900-01-01 01:01:01
> 9  9 4.5 1232076.4 8 h n y s 1900-01-01 1900-01-01 01:01:01
> 10 10 5.0 1141273.4 9 i o z r 1900-01-01 1900-01-01 01:01:01
> 11 NA 5.5 988481.4 10 j     q 1900-01-01 1900-01-01 01:01:01
>
> Does anyone have a clue how to get the above result with a header?
>
> Thanks!

The attachment did not come through. Perhaps it was too large?

Not sure if this is the most efficient way, but how about this:

# Read the header line on its own, then pass the names to read.fwf()
DF <- read.fwf("test.txt",
          widths = c(3, 4, 10, 3, 2, 2, 2, 2, 11, 20),
          skip = 1, strip.white = TRUE,
          col.names = unlist(read.table("test.txt", nrows = 1,
                                        as.is = TRUE),
                             use.names = FALSE))


> DF
 num1 num2    num3 int1 fac1 fac2 cha1 cha2     Date
1   1  NA     NA   1      f   q    1900-01-01
2   2 1.0 1316666.5   2   a   g   r   z      
3   3 1.5 1188830.5   3   b   h   s   y 1900-01-01
4   4 2.0 1271846.3   4   c   i   t   x 1900-01-01
5   5 2.5 829737.4  NA   d   j   u   w 1900-01-01
6   6 3.0 1240967.3   5   e   k   v   v 1900-01-01
7   7 3.5 919684.4   6   f   l   w   u 1900-01-01
8   8 4.0 968214.6   7   g   m   x   t 1900-01-01
9   9 4.5 1232076.4   8   h   n   y   s 1900-01-01
10  10 5.0 1141273.4   9   i   o   z   r 1900-01-01
11  NA 5.5 988481.4  10   j         q 1900-01-01
          POSIXt
1 1900-01-01 01:01:01
2 1900-01-01 01:01:01
3 1900-01-01 01:01:01
4 1900-01-01 01:01:01
5           <NA>
6 1900-01-01 01:01:01
7 1900-01-01 01:01:01
8 1900-01-01 01:01:01
9 1900-01-01 01:01:01
10 1900-01-01 01:01:01
11 1900-01-01 01:01:01
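
For anyone who wants to try the two-step idea without test.txt, here is a minimal self-contained sketch. The three-column file, its widths (3, 4, 3) and the names id, val and grp are made up for illustration, and scan() is swapped in for read.table() to split the header line. As far as I can tell from ?read.fwf, header = TRUE expects the names in the first line to be delimited by sep, which defaults to a tab, so a space-delimited header has to be read separately.

```r
# Hypothetical fixed-width file: 3 columns with widths 3, 4 and 3.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id val grp",
             "1  1.5 a  ",
             "2      b  ",    # val is blank (missing)
             "   3.0 c  "),   # id is blank (missing)
           tmp)

# Step 1: the header line is whitespace-delimited, so scan() can split it.
hdr <- scan(tmp, what = "", nlines = 1, quiet = TRUE)

# Step 2: skip the header line and read the body by column width.
DF <- read.fwf(tmp, widths = c(3, 4, 3), skip = 1,
               strip.white = TRUE, col.names = hdr)

str(DF)
```

Blank numeric fields should come back as NA, while blank character fields stay as empty strings unless na.strings is changed.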


Of course, with the limited number of columns, you can always just set

colnames(DF) <- c("num1", "num2", "num3", "int1", "fac1",
           "fac2", "cha1", "cha2", "Date", "POSIXt")

as a post-import step.
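
A small sketch of that post-import route, again on a made-up three-column file (the widths and the names id, val and grp are assumptions for illustration only). The length check is an optional extra guard against a miscounted name vector.

```r
# Hypothetical fixed-width file with a space-delimited header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id val grp",
             "1  1.5 a  ",
             "2      b  "),
           tmp)

# Skip the header; read.fwf() then assigns default names V1, V2, V3.
DF <- read.fwf(tmp, widths = c(3, 4, 3), skip = 1, strip.white = TRUE)
default_names <- colnames(DF)

# Set the real names afterwards, checking the count first.
new_names <- c("id", "val", "grp")
stopifnot(length(new_names) == ncol(DF))
colnames(DF) <- new_names
```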

HTH,

Marc Schwartz
