Options for HTML scraping? [closed]
"Closed. This question needs to be more focused. It is not currently accepting answers. Closed 8 years ago.
I'm thinking of trying Beautiful Soup, a Python package for HTML scraping. Are there any other HTML scraping packages I should be looking at? Python is not a requirement; I'm interested in hearing about other languages as well. The story so far:
Python
Beautiful Soup, lxml, HTQL, Scrapy, Mechanize
Ruby
Nokogiri, Hpricot, Mechanize, scrAPI, scRUBYt!, wombat, Watir
.NET
Html Agility Pack, WatiN
Perl
WWW::Mechanize, Web-Scraper
Java
Tag Soup, HtmlUnit, Web-Harvest, jARVEST, jsoup, Jericho HTML Parser
JavaScript
request, cheerio, artoo, node-horseman, phantomjs
PHP
Goutte, htmlSQL, PHP Simple HTML DOM Parser, PHP Scraping with CURL, ScarletsQuery
Go
goquery, Dataflow kit
Most of them
Screen-Scraper"