Parsing text files

I ran into a situation tonight where I wanted to parse a text file. I had a very long list of English words, one per line, and I wanted to get rid of every word (or line) that was longer than 7 characters. This would be simple in Linux, but I can't seem to find a simple solution in Windows XP. I tried the regular expression search in Notepad++, but that was a huge failure: the expression .{6,} found no matches at all. I'm really at a loss, because I thought this sort of thing would be extremely easy and that there would be tons of tools for a task like this. It seems like Notepad++ supports every feature in the world except the basic ones that seem the most obvious.

Asked by: Guest | Views: 55
Total answers/comments: 5
Guest [Entry]

To add the SQL text, you could try this command prompt one-liner:

(for /f %i in (words.txt) do @echo INSERT INTO Words ^(word^) VALUES ^('%i'^)) > words.sql

To filter out lines in a text file longer than 7 characters, you could use another command-line tool, findstr:

findstr /v /r "^.........*$" words.txt > shorter-words.txt

The /r option specifies that you want regex matching, and the /v option tells findstr to print only the lines that do not match. The pattern is quoted so that the command prompt does not treat the leading ^ as its escape character. (Since it appears that findstr doesn't allow you to specify a character count range, I faked it with the "8 or more" pattern and the "do not match" option.)
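
One thing the for /f loop does not handle is a word containing an apostrophe (o'clock, say), which would end the SQL string early. A rough Python sketch of the same conversion that doubles single quotes is below. The table and column names come from the one-liner; the function names and the trailing semicolon are my own additions, so treat this as an illustration rather than a drop-in replacement.

```python
def to_insert(word):
    """Build one INSERT statement; table and column names follow the one-liner."""
    escaped = word.replace("'", "''")  # SQL string literals escape ' by doubling it
    # The trailing semicolon is an addition so the output can run as a script.
    return f"INSERT INTO Words (word) VALUES ('{escaped}');"


def words_to_sql(lines):
    """Turn an iterable of lines into INSERT statements, skipping blank lines."""
    return [to_insert(line.strip()) for line in lines if line.strip()]


def convert_file(src_path, dst_path):
    """Read one word per line from src_path and write the statements to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        statements = words_to_sql(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(statements) + "\n")
```

Calling convert_file("words.txt", "words.sql") would then play the role of the redirect to words.sql.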
Guest [Entry]

You can get the GnuWin32 build of sed for Windows XP.
AWK and Perl are available for Windows as well.
That is, if you are used to Unix scripting (in which case also consider Cygwin).

Otherwise, there is also PowerShell.
Guest [Entry]

Maybe this is better suited for StackOverflow, because the best advice I can give you is to learn one of the scripting languages to make such tasks easier. It's much better to know one powerful tool than dozens of little ones, IMHO, and it's an investment that pays off.

Downloading Python and going through the tutorial will take a few hours, but afterwards such tasks will seem very easy to you. Better yet, you will learn to recognize tasks "looking for some programming" in other fields as well, and it will increase your productivity tenfold.
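
For the word list in the question, a Python version might look roughly like this. It is only a sketch: the names keep_short and filter_file are invented here, and it assumes the file is UTF-8 with one word per line.

```python
MAX_LEN = 7  # the question wants words longer than 7 characters removed


def keep_short(lines, max_len=MAX_LEN):
    """Yield only the words that are at most max_len characters long."""
    for line in lines:
        word = line.rstrip("\r\n")  # measure the word, not its line ending
        if len(word) <= max_len:
            yield word


def filter_file(src_path, dst_path, max_len=MAX_LEN):
    """Copy src_path to dst_path, dropping every line longer than max_len."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for word in keep_short(src, max_len):
            dst.write(word + "\n")
```

Something like filter_file("words.txt", "shorter-words.txt") would then do the whole job, with no regular expressions involved.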
Guest [Entry]

I would use TextPad for this.

I've used it extensively for regular expressions in the past.

I'd try finding something like:

^[[:alpha:]]{8,}\n

And replacing with nothing.
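
The same find-and-replace-with-nothing idea can be tried outside an editor. Below is a small Python sketch of it, with two assumptions of mine: Python's re module has no POSIX classes like [[:alpha:]], so [A-Za-z] stands in for it, and the count is {8,} so that only words longer than 7 letters are removed. The function name is made up.

```python
import re

# [A-Za-z] replaces the POSIX class [[:alpha:]], which Python's re lacks.
# {8,} means "8 or more letters", i.e. longer than 7.
# re.MULTILINE makes ^ match at the start of every line, not just the text.
LONG_WORD_LINE = re.compile(r"^[A-Za-z]{8,}\r?\n", re.MULTILINE)


def drop_long_words(text):
    """Delete every whole line that consists of a word of 8 or more letters."""
    return LONG_WORD_LINE.sub("", text)
```

Note that a long word on the very last line is only removed if the file ends with a newline, since the pattern includes the line break.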
Guest [Entry]

Your expression is wrong. You want this:

^.{0,7}$

This matches the lines you want to keep, those of at most 7 characters.
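
To see what the anchors change, here is a quick Python check. It is a sketch with names I made up, and it uses an upper bound of 7 to match the limit in the question. Without ^ and $, a pattern like .{6,} is satisfied by any run of 6 or more characters anywhere in the line, so it cannot express "the whole line is short".

```python
import re

KEEP = re.compile(r"^.{0,7}$")     # the whole line is 0 to 7 characters long
UNANCHORED = re.compile(r".{6,}")  # any run of 6 or more characters, anywhere


def is_short(word):
    """Return True if word (without its line ending) has at most 7 characters."""
    return KEEP.match(word) is not None
```

With these, is_short("seventy") is true and is_short("elephant") is false, while UNANCHORED.search finds a match in both words.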