Pages

2008-03-26

Perl - HTML Processing with Tokens

"Regular expressions are powerful, but they're a painfully low-level way of dealing with HTML. You're forced to worry about spaces and newlines, single and double quotes, HTML comments, and a lot more. The next step up from a regular expression is an HTML tokenizer. In this chapter, we'll use HTML::TokeParser to extract information from HTML files. Using these techniques, you can extract information from any HTML file, and never again have to worry about character-level trivia of HTML markup."

http://lwp.interglacial.com/ch07_01.htm

No comments: