[R] parse an HTML page with verbose error message (using XML)

Yihui Xie

2010-03-11

I'm using the function htmlParse() in the XML package, and I need a
little help with error handling while parsing an HTML page. So far I
can use either the default approach:

# error = xmlErrorCumulator(), by default
library(XML)
doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/")
# the error message is:
# htmlParseStartTag: invalid element name

or the tryCatch() approach:

# error = NULL, errors to be caught by tryCatch()
tryCatch({
  doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/",
    error = NULL)
}, XMLError = function(e) {
  cat("There was an error in the XML at line", e$line, "column",
    e$col, "\n", e$message, "\n")
})
# verbose error message as:
# There was an error in the XML at line 90 column 2
# htmlParseStartTag: invalid element name

I would like to get the verbose error messages (with line and column)
without actually stopping the parsing process. The first approach does
not return the detailed messages, while the second one raises an R
error that aborts the parse, so no document is returned.
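
One direction that might give both is to pass a custom function as the
error argument, so that each parser error is recorded rather than raised.
The sketch below rests on assumptions: make_error_collector() is a
hypothetical helper (not part of the XML package), and the handler is
assumed to receive the same fields that the tryCatch() version exposes
(message, line, column, and so on), plus possibly one final call with an
empty message. It parses a small malformed string rather than the URL
above so that it does not need network access.

```r
# Hypothetical helper (not part of the XML package): builds a handler
# that stores each parser error instead of stopping on it.
make_error_collector = function() {
  errors = list()
  handler = function(msg, code, domain, line, col, level, filename, ...) {
    # Assumption: the parser may call the handler once more with an
    # empty message when it is done; nothing to record in that case.
    if (missing(msg) || length(msg) == 0) return(invisible(NULL))
    errors[[length(errors) + 1]] <<- list(
      line = line, col = col, message = sub("\n$", "", msg))
    invisible(NULL)
  }
  list(handler = handler, errors = function() errors)
}

if (requireNamespace("XML", quietly = TRUE)) {
  collector = make_error_collector()
  # malformed snippet (stray "<") used only for illustration
  doc = XML::htmlParse("<html><body><p>text</p><</body></html>",
    asText = TRUE, error = collector$handler)
  # report whatever was collected; parsing was not interrupted
  for (e in collector$errors()) {
    cat("line", e$line, "column", e$col, ":", e$message, "\n")
  }
}
```

If the assumptions hold, doc is still returned and the collected list
can be inspected after parsing.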

Thanks!

Regards,
Yihui
--
Yihui Xie <xieyihui@(protected)>
Phone: 515-294-6609 Web: http://yihui.name
Department of Statistics, Iowa State University
3211 Snedecor Hall, Ames, IA

______________________________________________
R-help@(protected)
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.