[R] read large amount of data

WeiWei Shi

2005-07-18

Hi,
I have a bar-delimited dataset of 2194651 rows x 135 columns, in which
every value is 0, 1, or 2.

I used the following approach which can handle 100,000 lines:
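# read the first 100,000 lines as one long numeric vector
t <- scan('fv', sep='|', nlines=100000)
# scan() returns values record by record, so each column here holds one record
t1 <- matrix(t, nrow=135, ncol=100000)
# transpose to one row per record, then convert to a data frame
t2 <- t(t1)
t3 <- as.data.frame(t2)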
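
One way to extend the approach above beyond a fixed number of lines is to read the file in chunks through an open connection. The following is only a minimal sketch under stated assumptions: the temporary file, the 3-field records, and the chunk size of 4 lines are hypothetical stand-ins for 'fv', its 135 fields, and a realistic chunk size. It also reads the values as integers, which is a change from the default numeric type used above.

```r
# Hypothetical stand-in for 'fv': 10 bar-delimited records of 3 fields each.
set.seed(1)
n_fields <- 3
path <- tempfile()
recs <- replicate(10, paste(sample(0:2, n_fields, replace = TRUE),
                            collapse = "|"))
writeLines(recs, path)

# An open connection keeps its position, so each scan() call
# continues where the previous one stopped.
con <- file(path, open = "r")
chunks <- list()
repeat {
  x <- scan(con, what = integer(), sep = "|", nlines = 4, quiet = TRUE)
  if (length(x) == 0) break          # nothing left to read
  # byrow = TRUE gives one row per record without a separate transpose
  chunks[[length(chunks) + 1]] <- matrix(x, ncol = n_fields, byrow = TRUE)
}
close(con)

m <- do.call(rbind, chunks)
dim(m)
unlink(path)
```

In practice each chunk could be filtered or sampled before it is stored, so that the full table never has to sit in memory at once.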

I have changed my plan to use stratified sampling with replacement
(column 2 is my class variable: 1 or 2). The class distribution is:
awk -F\| '{print $2}' fv | sort | uniq -c
2162792 1
31859 2
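
For reference, stratified sampling with replacement on a data frame that is already in memory could look roughly like the sketch below. The toy data frame, the helper name stratified_sample, and the per-class sample size of 100 are all hypothetical; only the idea of using column 2 as the class label comes from the description above.

```r
# Hypothetical toy data: V2 is the class label (1 or 2), imbalanced
# like the real file but at a much smaller scale.
set.seed(1)
d <- data.frame(V1 = sample(0:2, 1000, replace = TRUE),
                V2 = c(rep(1L, 985), rep(2L, 15)))

# Draw n_per_class rows from each class, with replacement.
stratified_sample <- function(df, class_col, n_per_class) {
  # row indices grouped by class label
  idx_by_class <- split(seq_len(nrow(df)), df[[class_col]])
  picked <- lapply(idx_by_class, function(idx) {
    # sample positions within idx, so a class with a single row is safe
    idx[sample.int(length(idx), n_per_class, replace = TRUE)]
  })
  df[unlist(picked, use.names = FALSE), , drop = FALSE]
}

s <- stratified_sample(d, "V2", 100)
table(s$V2)
```

Because sampling is done with replacement, the minority class can supply as many rows as the majority class even though it has far fewer distinct records.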

Is it possible to use R to read the whole dataset and do the
stratified sampling? Does that really depend on my memory size?
Mem:  3111736k total, 1023040k used, 2088696k free,  150160k buffers
Swap: 4008208k total,   19040k used, 3989168k free,  668892k cached
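
As a rough check on the memory question, the size of the full table can be estimated from its dimensions. This is a back-of-the-envelope sketch that assumes 8 bytes per value when stored as double (the type scan() returns by default) and 4 bytes per value when stored as integer. It ignores R's per-object overhead and the extra copies made by matrix(), t(), and as.data.frame().

```r
rows  <- 2194651
cols  <- 135
cells <- rows * cols               # total number of values in the table
gib   <- 1024^3                    # bytes per GiB

size_double_gib <- cells * 8 / gib # stored as double
size_int_gib    <- cells * 4 / gib # stored as integer

round(c(double = size_double_gib, integer = size_int_gib), 2)
```

Under these assumptions a single copy comes to about 2.2 GiB as double and about 1.1 GiB as integer, compared with roughly 2.0 GiB reported free above.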


Thanks,

weiwei

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

______________________________________________
R-help@(protected)
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html