grepping large amounts of text

I have a few gigabytes of source code.
Asked by: Guest | Views: 144

Total answers/comments: 3

Guest [Entry]

The only way you'll get a significant improvement over grep is to use an indexed search system like Strigi. The filesystem makes very little difference unless you have a huge number of very small files.

Guest [Entry]

Here is what I understand:

You are searching source code for a term.
You'd like to see which source files use that term.
You probably have thousands of files (adding up to GBs).

Do you want to know all the occurrences of 'term' within each file, or just a yes/no indication of whether it's been used in a file? (grep's -l flag gives the latter: it lists only the names of matching files.)

You can divide and conquer: partition your files into several sets and run multiple greps in parallel.

I'm not sure whether your need is a one-off or something you'll repeat.
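The divide-and-conquer idea in this answer could be sketched roughly as follows. This is only one way to do it, not necessarily what the poster had in mind: it assumes a find and xargs that support -print0, -0 and -P (GNU and BSD versions do, POSIX does not require them). The function name parallel_grep_l, the batch size of 200 and the default of 4 jobs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the names of files under DIR that contain TERM.
#   $1 = term (treated as a fixed string, not a regex)
#   $2 = directory to search
#   $3 = number of grep processes to run at once (assumed default: 4)
parallel_grep_l() {
    term=$1
    dir=$2
    njobs=${3:-4}
    # find emits NUL-separated names so odd filenames survive;
    # xargs hands them to grep in batches of 200, njobs batches at a time.
    # grep -l prints each matching file name once and stops reading that file.
    find "$dir" -type f -print0 |
        xargs -0 -n 200 -P "$njobs" grep -l -F -e "$term" --
}
```

Calling parallel_grep_l term path/to/source 8 would run up to eight greps at once. The order of the output is not stable, since batches finish at different times, and xargs reports a non-zero exit status if any batch had no match. Whether this helps at all depends on whether the search is limited by CPU or by disk.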

Guest [Entry]

If you only need to grep a subset of files, use find first. For example, to grep only .h header files:

find path/to/source -name '*.h' -print0 | xargs -0 grep pattern

The quotes around '*.h' stop the shell from expanding the pattern before find sees it. This will be faster because you are mostly reading filenames rather than file contents, which means far fewer disk accesses.
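Where -print0 and xargs -0 are not available, the same find-first approach can be written with find's own -exec ... {} + form, which is in POSIX. The sketch below is an illustration only: the function name grep_headers and the extra .hpp extension are assumptions, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper: grep only header files under DIR for PATTERN.
#   $1 = pattern (a basic regular expression)
#   $2 = directory to search
grep_headers() {
    pattern=$1
    dir=$2
    # -exec ... {} + batches many file names into each grep call, like xargs.
    # /dev/null is an extra, always-empty file argument so grep prints the
    # file name even when a batch happens to contain a single file.
    find "$dir" -type f \( -name '*.h' -o -name '*.hpp' \) \
        -exec grep -n -e "$pattern" /dev/null {} +
}
```

For example, grep_headers 'foo_bar' path/to/source prints each match as file:line:text, and files with other extensions are never opened.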