grep for pdfs
Have you ever missed being able to run a full-text search across multiple PDF files from the command line in Linux?
With the Linux command grep you can search for a given text in multiple files. If you don't know it already, you can find some information about grep here. Sadly, it cannot search inside PDF files, which is certainly an important task. Imagine you have a few thousand PDF files archived on your hard drive and you are looking for some information contained in them. It is far too much work to open each of them in your PDF viewer and search for the needed information. In this situation a tool like grep would be quite handy.
A few days ago I found the interesting tool pdfgrep. It works similarly to grep, but can search in PDF files. You can download it from SourceForge and then build it from source.
For Gentoo users there is, as usual, an easier way. I wrote a simple ebuild for pdfgrep. You can download the ebuild here: [download#41]
To use the ebuild, just copy it to /usr/local/portage/app-text/pdfgrep/. You will probably have to create the directory first. Then run
ebuild /usr/local/portage/app-text/pdfgrep/pdfgrep-1.1.ebuild digest
Be sure to include the following line in your /etc/make.conf.
PORTDIR_OVERLAY="/usr/local/portage"
Afterwards just emerge pdfgrep.
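If emerge cannot find the package, the overlay line is the first thing I would check. The following is only a small sketch of such a check. The function name has_overlay is made up for this example, and it assumes the overlay is set with a plain PORTDIR_OVERLAY=... line as shown above.

```shell
#!/bin/sh
# has_overlay is a hypothetical helper, not part of Portage.
# It succeeds if CONF_FILE contains an uncommented PORTDIR_OVERLAY
# assignment that mentions OVERLAY_DIR.
has_overlay() {
    conf_file=$1
    overlay_dir=$2
    # First keep only real assignment lines (commented lines start with '#'),
    # then look for the overlay path as a fixed string.
    grep '^[[:space:]]*PORTDIR_OVERLAY=' "$conf_file" | grep -qF -- "$overlay_dir"
}
```

You could call it as has_overlay /etc/make.conf /usr/local/portage and add the line by hand if it reports failure.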
Sadly, pdfgrep is not capable of recursively searching complete directory structures, as one can do with egrep -r, and that is exactly what you need to search a whole PDF collection. Not a big problem. Just use the following line of code:
find . -name "*.pdf" -exec pdfgrep -C50 -Hni "$1" '{}' ';'
For convenient use place it into a script file:
echo "find . -name \"*.pdf\" -exec pdfgrep -C50 -Hni \"\$1\" '{}' ';'" > /usr/local/bin/pdfrgrep
And make it executable:
chmod +x /usr/local/bin/pdfrgrep
Now you can just cd to the directory of your PDF collection and search it by entering:
pdfrgrep [searchterm]
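If you would rather not cd into the collection every time, the same idea can be written as a small shell function that takes the directory as an optional second argument. This is only a sketch: the argument handling and the usage message are my own additions, and the pdfgrep options are simply the ones used above.

```shell
#!/bin/sh
# Sketch of a recursive PDF search as a shell function.
# Usage: pdfrgrep SEARCHTERM [DIRECTORY]
# DIRECTORY defaults to the current directory.
pdfrgrep() {
    if [ "$#" -lt 1 ]; then
        echo "usage: pdfrgrep SEARCHTERM [DIRECTORY]" >&2
        return 2
    fi
    search_term=$1
    search_dir=${2:-.}
    # List every regular file ending in .pdf below search_dir and hand
    # each one to pdfgrep. Quoting keeps file names with spaces intact.
    # File names that contain a newline are not handled by this loop.
    find "$search_dir" -type f -name '*.pdf' | while IFS= read -r pdf_file; do
        pdfgrep -C50 -Hni "$search_term" "$pdf_file"
    done
}
```

Put it into your shell startup file and call it like pdfrgrep [searchterm] ~/documents. Unlike the one-liner, it complains when the search term is missing instead of silently running pdfgrep with an empty pattern.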
Regards
Jürgen