How can I convert scanned images as PDF to a searchable PDF file? [closed]

I have a PDF of a scanned book.

Asked by: Guest | Views: 268
Total answers/comments: 4
Guest [Entry]

"The following products were found listed on the Internet, but I haven't used them.

Online OCR

OCR Terminal

OCR Terminal is an online OCR service
that performs Optical Character
Recognition (OCR) on your scanned
images and pdf files and renders them
into editable and text searchable
documents.

Free OCR

Free-OCR.com is a free online OCR
(Optical Character Recognition) tool.
You can use this to perform OCR on any
image you supply.
This service is free, no registration
necessary. We also do not need your
email address.
Just upload your image files. Free-OCR
takes either a JPG, GIF, TIFF, BMP or
PDF (only first page).
The only restriction is that the
images must not be larger than 2MB, no
wider or higher than 5000 pixels and
there is a limit of 10 image uploads
per hour.

Maestro Recognition Server is commercial, but has an online try-it demo.

Free software

FreeOCR - for images only.

FreeOCR is a scan and OCR program
that includes the free Tesseract OCR
engine; it is also known as a
Tesseract GUI. It includes a Windows
installer, is very simple to use, and
supports multi-page TIFFs, fax
documents, and most image types,
including compressed TIFFs, which the
Tesseract engine on its own cannot
read. It now has TWAIN scanning.

pdfsandwich - PDF -> PDF converter.

pdfsandwich is a command-line tool to OCR scanned books or journals.
It is able to recognize the page layout even for multicolumn text.

Essentially, pdfsandwich is a wrapper script which calls the following binaries:
convert, cuneiform, gs, and hocr2pdf. It is known to run on Unix systems and has
been tested on Linux and MacOS X. It supports parallel processing on multiprocessor systems."
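
A minimal sketch of how pdfsandwich might be invoked on a scanned book. The file names are placeholders, and the -nthreads and -o options are written from memory of the tool's manual rather than taken from the description above, so check them against the help output of your installed version first.

```shell
# Assumed file names for illustration; not from the original answer.
input="scanned-book.pdf"
output="scanned-book-searchable.pdf"

if ! command -v pdfsandwich >/dev/null 2>&1 || [ ! -f "$input" ]; then
    status="skipped"    # tool not installed or input file missing
elif pdfsandwich -nthreads 2 -o "$output" "$input"; then
    # -nthreads: number of parallel threads (assumed option name)
    # -o: output file name (assumed option name)
    status="done"
else
    status="failed"
fi
echo "pdfsandwich: $status"
```

The guard keeps the snippet harmless on a machine where the tool or the input file is absent.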
Guest [Entry]

"Install ImageMagick, then open a cmd window or terminal and run:

convert myfile.pdf myfile-%02d.jpg

The output is one JPG file per page of the PDF: myfile-00.jpg, myfile-01.jpg, and so on.

Pass each image through an OCR program. I don't have much experience with this, but there seem to be a lot of choices.

Convert each page of text back into PDF. You could do this again with ImageMagick, but there are other ways as well:

convert page-%02d.txt -density 300x300 -compress jpeg final.pdf"
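
The three steps above can also be sketched as one script. This is only an illustration under assumptions that are not in the answer: it uses Tesseract as the OCR program (its pdf output mode writes a page image with a hidden text layer), Poppler's pdfunite to merge the pages, and hypothetical helper names run and make_searchable. It also renders at 300 DPI and uses three-digit page numbers, so it handles at most 1000 pages.

```shell
#!/bin/sh
# Sketch: rasterize a scanned PDF, OCR each page, merge the results.
# Assumes ImageMagick (convert), Tesseract (tesseract) and Poppler
# (pdfunite) are installed; these choices are not from the answer.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

make_searchable() {
    src="$1"              # scanned input PDF
    out="$2"              # searchable output PDF
    base="${src%.pdf}"

    # 1. One image per page; -density before the input file sets
    #    the resolution used to render each page.
    run convert -density 300 "$src" "$base-%03d.jpg"

    # 2. OCR every page image into a one-page searchable PDF.
    for img in "$base"-[0-9][0-9][0-9].jpg; do
        [ -e "$img" ] || continue
        run tesseract "$img" "${img%.jpg}" pdf
    done

    # 3. Merge the per-page PDFs in page order.
    run pdfunite "$base"-[0-9][0-9][0-9].pdf "$out"
}
```

Usage would be something like `make_searchable myfile.pdf myfile-searchable.pdf`. Setting DRY_RUN=1 first prints the commands instead of running them, which is a cheap way to check the file names before committing to a long OCR run.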
Guest [Entry]

"Your request seems to be a complicated solution to the problem, although I may not understand the problem correctly. At any rate:

Why not get a PDF writer that will allow you to enter the data directly onto the PDF page?"
Guest [Entry]

Try PDFCubed.com. There is nothing to install; it is all done online. You can send your documents to be processed via the web, email, or Dropbox. Scanned PDFs and TIFFs are converted into searchable-text PDFs and can then be retrieved via the web, email, or Dropbox.