Extracting text from a .PDF scanned book [closed]

I have a scanned book in PDF format, but the quality is rather poor:

Asked by: Guest | Views: 168
Total answers/comments: 4
Guest [Entry]

I bought the book!
Guest [Entry]

ABBYY FineReader is very strong OCR software. It handles very complex layouts and supports many formats (including PDF).
Romanian is supported with a dictionary, i.e. the software uses a dictionary to prioritize hypotheses during recognition.

In any case, OCR-ing scientific literature that has poor scan quality is a difficult task. Be ready to spend a lot of time helping the software by checking results and fixing the layout. In your scan I see a lot of very poor-quality text :(. I don't think any OCR software could work well with it.
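
To illustrate what "using a dictionary to prioritize hypotheses" can mean, here is a minimal Python sketch. It is not ABBYY's actual algorithm; the function name `pick_hypothesis`, the `bonus` value, the confidence scores, and the tiny word list are all assumptions made up for the example. The idea is only that, when the recognizer is unsure between several readings of a word, a reading found in the dictionary gets a boost.

```python
def pick_hypothesis(candidates, dictionary, bonus=0.2):
    """Return the candidate word with the best dictionary-adjusted score.

    candidates: list of (word, confidence) pairs, confidence in [0, 1].
    dictionary: set of known lowercase words.
    bonus: hypothetical score boost for words found in the dictionary.
    """
    def adjusted(item):
        word, confidence = item
        # Dictionary words get a fixed boost; unknown words keep their raw score.
        return confidence + (bonus if word.lower() in dictionary else 0.0)

    return max(candidates, key=adjusted)[0]


# Toy Romanian word list, for illustration only.
ROMANIAN_WORDS = {"carte", "text", "pagina"}

# The (made-up) recognizer slightly prefers "carle", but only "carte" is a real word.
candidates = [("carle", 0.55), ("carte", 0.50), ("cartc", 0.30)]
best = pick_hypothesis(candidates, ROMANIAN_WORDS)
```

With a small bonus, a clearly more confident non-dictionary reading still wins, which is roughly why a dictionary helps with borderline characters but cannot rescue a very poor scan.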
Guest [Entry]

Well, for text recognition one usually searches for OCR (optical character recognition) programs. There is a variety of them around, so a simple Google search will do more good than I can here.

I didn't understand the last part, "recognize Romanian": do you mean it has to recognize the Romanian language, or be localized (translated) into Romanian? In the first case, I believe there will be no problem; in the second, I'm not so sure.

Also, if the book is not by one of your countrymen, there is a chance it has already been translated into English. So if you have the PDF in Romanian, try searching for an English version. The only problem is that this is, you know, illegal (though sometimes one doesn't have a choice).
Guest [Entry]

Try PDFCubed.com. It's an online OCR service that makes it easy to create a searchable text PDF. Scanned documents can be submitted via the web, email, or Dropbox.