
Save parts of a website as pure text


I hope I may ask this here.

Asked by: Guest
Guest

Your best bet would be to build your own toolchain for this:

Use a tool such as wget to recursively download the HTML files that contain the content you need. Pay special attention to the -r option, which enables recursive downloading, and the -l option, which sets the maximum recursion depth. wget saves the pages exactly as the server sends them: they are text files, but they still contain all of the HTML markup.
Use a tool such as grep to filter out everything except the line(s) containing the <DIV> you need. Pay special attention to the -r option, which searches directories recursively, and the -e option, which specifies the pattern (a regular expression) to match. Redirect grep's output to a file of your choice. Note that grep prints the matching lines unchanged, so any markup on those lines stays in the output.
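If you would rather script the filtering step than call grep directly, here is a minimal Python sketch of roughly what `grep -r -h --include='*.html' -e PATTERN DIR > OUT` does. It assumes the site has already been mirrored into a local directory (for example with something like `wget -r -l 2` pointed at the site). The function name `grep_tree` is hypothetical, and only files ending in `.html` are searched.

```python
import re
from pathlib import Path


def grep_tree(root, pattern, out_path):
    """Write every line matching `pattern` from every *.html file under `root`
    to `out_path` (hypothetical helper, roughly like a recursive grep).
    Returns the number of matching lines written."""
    regex = re.compile(pattern)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # rglob walks the directory tree recursively, like grep -r
        for path in sorted(Path(root).rglob("*.html")):
            # errors="replace" keeps an oddly encoded page from aborting the run
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    if regex.search(line):
                        # keep the whole line, markup included, as grep does
                        out.write(line if line.endswith("\n") else line + "\n")
                        count += 1
    return count
```

For example, `grep_tree("example.com", r'<div class="article">', "article.txt")` would collect every line containing that opening tag; the directory name and class name are placeholders. Like grep, it keeps whole lines, so the markup on matched lines is not stripped.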

Hint: It may be simpler to run grep several times, filtering out things in smaller chunks with each pass. Whether that works depends entirely on how similar the various pages are and how clean their code is.

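Edit: Then again, a regex is probably not a good way to parse HTML: nested tags, or a <DIV> whose content spans several lines, can easily break a line-based grep filter, so an HTML parser may be more robust.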
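Parsing the HTML instead of matching lines copes better with nesting and line breaks. The following minimal sketch swaps in Python's standard-library `html.parser` in place of grep: it collects the text inside every `<div>` whose attribute exactly matches a given value. The names `DivTextExtractor` and `div_text`, and the default `id="content"`, are hypothetical; adjust them to the markup of the pages you downloaded.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect text inside every <div> whose attribute `attr` equals `value`."""

    def __init__(self, attr, value):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.attr, self.value = attr, value
        self.depth = 0    # open <div>s counted from (and including) the matching one
        self.chunks = []  # non-empty text fragments found inside matching divs

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a match: count it so we close correctly
        elif dict(attrs).get(self.attr) == self.value:
            self.depth = 1   # entering a matching div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def div_text(html, attr="id", value="content"):
    """Return the text of matching divs as plain text, one fragment per line."""
    parser = DivTextExtractor(attr, value)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Because it counts nested `<div>` tags, text in inner divs is kept and extraction stops at the matching closing tag. The match is exact, so `class="post main"` will not match `value="post"`, and any `<script>` or `<style>` text inside the div would also be collected.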
Guest

I'm lazy. In the time it would take you to research and set up a special-purpose tool, surely you can just highlight the required text with a mouse, copy it, and paste it into a text editor?