How do I search for a string in a PHP file using `grep`?

I am searching for a class declaration on a site with hundreds of PHP files. How can I do this in the current folder and subfolders using grep?

Asked by: Guest | Views: 469
Total answers/comments: 4
bert [Entry]

If you use -r, you probably want to recursively search just the current directory:

grep -r 'class MyClass' .

Note the period on the end of the above.

What you told grep was that you wanted it to recursively search every *.php file or directory in the current directory, but you likely didn't have any directories whose names end in .php, so no subfolders were searched. The above is the simplest possible way to do the search, but it also searches files of other types, which is especially problematic when you have .svn directories everywhere. If you don't have a lot of other file types, the above is generally sufficient and works well.
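To see the difference concretely, here is a small sketch that builds a throwaway directory tree (the file names are hypothetical) and compares a recursive grep over the glob *.php with one started from the current directory. It assumes a POSIX shell with default glob settings and GNU or BSD grep.

```shell
#!/bin/sh
# Throwaway tree with hypothetical names: one nested class file, one top-level file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/lib"
printf 'class MyClass {}\n' > "$tmp/lib/MyClass.php"
printf '<?php echo 1;\n' > "$tmp/index.php"
cd "$tmp" || exit 1

# The shell expands *.php to "index.php" only, so grep never visits lib/.
from_glob=$(grep -r 'class MyClass' *.php || true)

# Starting from "." lets -r descend into lib/ and find the declaration.
from_dot=$(grep -r 'class MyClass' .)

echo "from *.php: [$from_glob]"
echo "from .:     [$from_dot]"
```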

Many versions of grep have no way to restrict the search to particular file extensions (GNU grep's --include is an exception), so you generally use find in conjunction with it:

find <startDir> -iname '*.php' -exec grep 'class[[:space:]]*MyClass' {} \;

-iname tells find that you want a case-insensitive filename match, but some non-GNU variants of find don't support it (it isn't part of POSIX), so you could use -name instead.

The -exec ... \; form above has the side effect of invoking grep once for every file, which is fairly costly. It also means each grep sees only one file, so it prints the matching line without the file name unless you add -H (supported by GNU and BSD grep).
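A minimal sketch of the one-file-per-grep behaviour, using a throwaway file with a hypothetical name: with -exec ... \; each grep receives a single file and so prints only the matching line, while -H (available in GNU and BSD grep) puts the file name prefix back.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'class MyClass {}\n' > "$tmp/a.php"

# One grep per file: grep sees a single file name, so it omits the prefix.
plain=$(find "$tmp" -name '*.php' -exec grep 'class[[:space:]]*MyClass' {} \;)

# -H makes grep print the file name even when it is given only one file.
named=$(find "$tmp" -name '*.php' -exec grep -H 'class[[:space:]]*MyClass' {} \;)

echo "$plain"
echo "$named"
```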

Newer versions of find (the form is specified by POSIX) also accept + instead of \; to end -exec, which tells find to append the file names together and so reduce the number of grep invocations:

find <startDir> -iname '*.php' -exec grep 'class[[:space:]]*MyClass' {} +
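The effect of + can be checked without grep at all. In this sketch (hypothetical file names), echo stands in for grep so that each invocation prints exactly one line, and the lines are counted for each terminator.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/a.php" "$tmp/b.php" "$tmp/c.php"

# With \; the command runs once per file: three lines of output.
per_file=$(find "$tmp" -name '*.php' -exec echo run {} \; | wc -l | tr -d ' ')

# With + find batches the names into as few invocations as it can: one line here.
batched=$(find "$tmp" -name '*.php' -exec echo run {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file"
echo "batched invocations:  $batched"
```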

A frequently recommended way to reduce invocations is to use xargs which will fire off grep as few times as possible but as many times as necessary:

find <startDir> -iname \*.php | xargs grep 'class[[:space:]]*MyClass'

xargs is smart enough to pass grep as many arguments as the system's command-line limit allows, but the variation above doesn't handle certain characters, such as spaces, very well. For example, if a file were named 'the file.php', grep would receive 'the' and 'file.php' as two separate arguments, so neither would be found. Instead we use:

find <startDir> -iname \*.php -print0 | xargs -0 grep 'class[[:space:]]*MyClass'

-print0 and -0 work together: find ends each filename with a null character and xargs splits its input only on null characters, so every filename is passed through completely and unambiguously.
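Here is a sketch of both the failure and the fix, using a throwaway file whose hypothetical name contains a space. Plain xargs splits the name at the space, so grep is handed two paths that don't exist; -print0 with -0 keeps the name whole.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'class MyClass {}\n' > "$tmp/the file.php"
cd "$tmp" || exit 1

# Blank-separated names: xargs passes "./the" and "file.php" as separate arguments.
broken=$(find . -name '*.php' | xargs grep -l 'class MyClass' 2>/dev/null || true)

# Null-separated names survive intact.
fixed=$(find . -name '*.php' -print0 | xargs -0 grep -l 'class MyClass')

echo "plain xargs:  [$broken]"
echo "-print0 / -0: [$fixed]"
```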

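If you ever have more than one space after class, or a tab, a pattern such as class[[:space:]]*MyClass allows for that. Avoid writing it as class[ \t]*MyClass: inside a bracket expression, POSIX grep treats \t as a backslash and a t, not as a tab.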
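A quick check of that point, feeding grep a single line in which class and MyClass are separated by a tab and counting matches for each pattern:

```shell
#!/bin/sh
# A declaration with a tab between "class" and the class name.
line=$(printf 'class\tMyClass {}')

# [ \t] means space, backslash or "t" to POSIX grep, so the tab is not matched.
old=$(printf '%s\n' "$line" | grep -c 'class[ \t]*MyClass' || true)

# [[:space:]] is the portable way to match spaces and tabs.
new=$(printf '%s\n' "$line" | grep -c 'class[[:space:]]*MyClass')

echo "old pattern matches: $old"
echo "new pattern matches: $new"
```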

This post on Stack Overflow has other examples and demonstrates how to exclude certain directories, like .svn directories.
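One common way to skip .svn directories (not necessarily the one used in that post) is find's -prune. A minimal sketch, assuming a throwaway tree with a hypothetical stale copy of the class file inside .svn:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src/.svn"
printf 'class MyClass {}\n' > "$tmp/src/MyClass.php"
printf 'class MyClass {}\n' > "$tmp/src/.svn/MyClass.php"   # hypothetical stale copy
cd "$tmp" || exit 1

# "-name .svn -prune" stops find from descending into any .svn directory;
# everything else falls through to "-o" and is printed if it is a .php file.
hits=$(find . -name .svn -prune -o -type f -name '*.php' -print0 |
    xargs -0 grep -l 'class MyClass')

echo "$hits"
```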
bert [Entry]

In case the matched files might have arbitrary names, you should consider using -print0:

find . -type f -name '*.php' -print0 | xargs -0 grep 'class MyClass'
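The -type f test in that command matters when a directory name happens to end in .php. A small sketch with a hypothetical directory called legacy.php, counting the paths each find expression produces:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/legacy.php"                                   # a directory, not a file
printf 'class MyClass {}\n' > "$tmp/legacy.php/MyClass.php"
cd "$tmp" || exit 1

# Without -type f the directory itself is also selected (and would be handed to grep).
all=$(find . -name '*.php' | wc -l | tr -d ' ')
files=$(find . -type f -name '*.php' | wc -l | tr -d ' ')

# The pipeline above only ever gives grep regular files.
hits=$(find . -type f -name '*.php' -print0 | xargs -0 grep -l 'class MyClass')

echo "paths without -type f: $all, with -type f: $files"
echo "$hits"
```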
bert [Entry]

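grep -rl --include='*.php' 'class MyClass' .

-l shows only the names of matching files

-r searches recursively

. starts the search in the current folder

--include limits the search to files whose names match the pattern (here, the .php extension)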
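A sketch of the effect of --include on a throwaway tree with hypothetical names. --include is a GNU grep option (BSD grep has it too, as far as I know); it is not part of POSIX.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/lib"
printf 'class MyClass {}\n' > "$tmp/lib/MyClass.php"
printf 'class MyClass is described here\n' > "$tmp/lib/README.txt"
cd "$tmp" || exit 1

# Only files whose names match *.php are searched.
only_php=$(grep -rl --include='*.php' 'class MyClass' .)

# Without --include every file under "." is searched.
everything=$(grep -rl 'class MyClass' . | wc -l | tr -d ' ')

echo "$only_php"
echo "matching files without --include: $everything"
```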
bert [Entry]

You probably want case insensitivity and whitespace tolerance. Also, grep -r with a file pattern such as *.php fails if no file in the current directory matches it: the shell expands the pattern only in the current directory, and with no match grep receives the literal *.php, which doesn't exist, so grep has no starting path at all.
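A sketch of that failure, assuming a POSIX sh (or bash with default glob settings, where an unmatched pattern is passed through unchanged) and a throwaway tree whose only .php file is in a subfolder:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/lib"
printf 'class MyClass {}\n' > "$tmp/lib/MyClass.php"
cd "$tmp" || exit 1

# No .php file at the top level: the shell leaves "*.php" unexpanded,
# grep cannot open a file literally named *.php and exits with status 2.
status=0
grep -r 'class MyClass' *.php 2>/dev/null || status=$?

# Giving grep an explicit starting directory avoids the problem.
found=$(grep -rl 'class MyClass' .)

echo "exit status with *.php: $status"
echo "$found"
```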

If there is at least one file of the desired extension, then you can use egrep -ir. find and xargs show their power with single flat (but very big) directories, where expanding * can exceed the system's argument-length limit ("Argument list too long"), and with extra qualifiers (for instance, if you want to search only .inc, .php and .php3 files). But in my experience they cost quite a bit of convenience and speed. For human-written class declarations, a big problem is going to be whitespace, so use egrep instead of grep (egrep is equivalent to grep -E; recent GNU grep releases warn that egrep is obsolescent, so use grep -E there). Also set LC_ALL=C for extra speed. For convenience, I often use:

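LC_ALL=C egrep -irn 'class[[:space:]]+myclass([[:space:]]|$)' * | egrep '^[^:]*\.(inc|php[35]?):'

-i -- case insensitive
-r -- recurse into any directories matched by the file pattern (here *)
-n -- include the line number
LC_ALL=C -- treat the input as bytes in the C locale instead of decoding UTF-8; this can be much faster.

[[:space:]]+ -- match one or more spaces or tabs before the class name
([[:space:]]|$) -- match whitespace or the end of the line after the class name (grep works line by line, so a newline is never part of the text it matches)
^[^:]*\.(inc|php[35]?): -- keep only output lines whose file name (the part before the first colon) ends in .inc, .php, .php3 or .php5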
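A small check of that pipeline on two hypothetical files. It uses grep -E (the same as egrep) and shows why the extension filter is anchored to the file-name part of grep's file:line:text output: an unanchored \.(inc|php[35]?) also keeps a hit from a .txt file whose text happens to mention a .php file.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '<?php\nclass MyClass\n{\n}\n' > a.php
printf 'class myclass is declared in a.php\n' > notes.txt

# Case-insensitive, whitespace-tolerant search over every file in the directory.
search() {
    LC_ALL=C grep -E -irn 'class[[:space:]]+myclass([[:space:]]|$)' *
}

# Filter on the whole output line: notes.txt survives because its text contains ".php".
loose=$(search | grep -E '\.(inc|php[35]?)' | wc -l | tr -d ' ')

# Filter on the file name only (the part before the first colon).
strict=$(search | grep -E '^[^:]*\.(inc|php[35]?):')

echo "unanchored filter keeps $loose line(s)"
echo "$strict"
```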

This can still match comments, such as // class myclass exists in file, but in my experience those are relatively rare, and they can be filtered out with ... | fgrep -v '//' (at the cost of also dropping real declarations that have a // comment on the same line).
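A sketch of the comment filter on three hypothetical input lines, using grep -F (the same as fgrep). The third line shows the trade-off: a genuine declaration with a trailing // comment is dropped as well.

```shell
#!/bin/sh
input='class MyClass {}
// class myclass exists in file
class MyClass { // main entry point'

# First keep declarations, then drop every line that contains "//".
kept=$(printf '%s\n' "$input" |
    grep -E -i 'class[[:space:]]+myclass' |
    grep -F -v '//')

echo "$kept"
```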

You can also put more complex file masks (for instance, to catch most .inc and .php files) in the egrep file pattern, like so:

egrep 'pattern' *.[ip]*

That (with the options from the first command) will be quite fast and mostly limited to PHP files, although *.[ip]* also matches files ending in, say, .ini or .png.
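To see what *.[ip]* actually matches, here is a sketch that creates a few empty files with hypothetical names and lets the shell expand the pattern:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
touch a.php b.inc c.ini d.png e.txt

# The glob needs a "." followed by an "i" or a "p", so .ini and .png slip through too.
matched=$(echo *.[ip]*)

echo "$matched"
```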

HTH