[R] integer codes of factors

Mike R

2005-07-14

U = c("b", "b", "b", "c", "d", "e", "e")

F1 = factor( U, levels=c("a", "b", "c", "d", "e") )

as.numeric(F1)
[1] 2 2 2 3 4 5 5

Here, the integer code of "b" in F1 is 2.

K = factor( levels(F1) )
as.numeric(K)
[1] 1 2 3 4 5
K
[1] a b c d e
Levels: a b c d e

And again, the integer code of "b" in K is 2. Great!

I am wondering how to modify that usage so that the correspondence between
the two numeric vectors can be trusted. For example, the correspondence
can be corrupted by placing the "a" at the end:

F2 = factor( U, levels=c("b", "c", "d", "e", "a") )

as.numeric(F2)
[1] 1 1 1 2 3 4 4

Placing the "a" at the end changed the integer code of "b" in F2 to 1, which is
not a problem. However:

K = factor( levels(F2) )
as.numeric( K )
[1] 2 3 4 5 1
K
[1] b c d e a
Levels: a b c d e

But the integer code of "b" in K is now 2, which does not correspond to its code
in F2.
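The mismatch above appears to come from how factor() picks its levels when no levels argument is supplied: it uses the sorted unique values of its input, so the custom order stored in F2 is lost. A minimal sketch of that behaviour follows; the names default_levels and custom_levels are hypothetical, introduced only for illustration.

```r
U <- c("b", "b", "b", "c", "d", "e", "e")
F2 <- factor(U, levels = c("b", "c", "d", "e", "a"))

# The level order we asked for explicitly
custom_levels <- levels(F2)

# Re-wrapping the levels in factor() without levels= re-derives the
# levels from the data, sorted, so the custom order is discarded
default_levels <- levels(factor(custom_levels))

print(custom_levels)
print(default_levels)
```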

One would think that ordered=TRUE ought to avoid the corruption, but it does not
seem to accomplish that:

K = factor( levels(F2), ordered=TRUE )
as.numeric(K)
[1] 2 3 4 5 1
K
[1] b c d e a
Levels: a < b < c < d < e

But the integer code of "b" in K is still 2.
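As far as I can tell, ordered=TRUE only marks the factor as ordered; it does not change how the levels themselves are chosen, so the default sorted levels are still used. A small sketch comparing the two calls (K_plain and K_ord are hypothetical names used only here):

```r
x <- c("b", "c", "d", "e", "a")

K_plain <- factor(x)
K_ord   <- factor(x, ordered = TRUE)

# Same levels and same integer codes in both cases
same_levels <- identical(levels(K_plain), levels(K_ord))
same_codes  <- identical(as.integer(K_plain), as.integer(K_ord))

# Only the "ordered" status differs
print(c(same_levels, same_codes, is.ordered(K_plain), is.ordered(K_ord)))
```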

However, corruption can be avoided with this idiom:

K = factor( levels(F2), levels=levels(F2) )
as.numeric(K)
[1] 1 2 3 4 5
K
[1] b c d e a
Levels: b c d e a

Now the integer code of "b" in K is 1, which, as desired, is in
correspondence with its code in F2.
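If the goal is only to look up the integer code of a given level, one possible alternative to building a second factor is to index the levels directly. This is a sketch under that assumption, not necessarily the preferred approach; codes is a hypothetical helper name.

```r
U <- c("b", "b", "b", "c", "d", "e", "e")
F2 <- factor(U, levels = c("b", "c", "d", "e", "a"))

# The idiom from the post: codes of K line up with the levels of F2
K <- factor(levels(F2), levels = levels(F2))

# Alternative: a named lookup vector from level label to integer code
codes <- setNames(seq_along(levels(F2)), levels(F2))

# Position of "b" among the levels of F2, i.e. its integer code
b_code <- match("b", levels(F2))

print(codes)
print(b_code)
```

Either way, the code of each label is read from levels(F2) itself, so it stays in step with as.integer(F2) whatever order the levels were given in.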
