Book scanning

I have a number of books that I’d like to scan, OCR, and then throw away for good. In general, I despise paper books, as they’re hard to work with.

Originally, I wanted to scan Tesla Eltos catalogues, but I managed to find some on Scribd and persuade their creator to reupload them to uloz.to.

TODO
Approach 1: Photographing
Approach 2: Scanner
Future: Normalizing the Scans

TODO

Plustek OpticBook 4800 seems interesting
Try leaving books open at an angle!

Approach 1: Photographing

I’ve made a stand out of aluminium pipes and some 3D-printed connecting parts. Ideally, one would keep the text in roughly the same position in the frame and quickly run through the whole book, two pages at a time.
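Since every photo captures a two-page spread, it has to be cut into single pages at some point. Below is a minimal sketch of that step. It assumes the photo is already loaded as a grayscale image (a list of rows of 0 to 255 values) and that the gutter shows up as the darkest column near the middle. `find_gutter` and `split_spread` are hypothetical helpers, not part of any tool mentioned on this page.

```python
# Sketch: split a two-page photo at the gutter.
# Assumptions: the image is grayscale, stored as a list of rows of ints
# (0 = black, 255 = white), at least 2 pixels wide, and the gutter between
# the pages appears as the darkest column near the middle of the photo.


def find_gutter(image, search_fraction=0.2):
    """Return the split column: the darkest column in a band around the centre."""
    width = len(image[0])
    centre = width // 2
    half_band = int(width * search_fraction / 2)
    lo = max(1, centre - half_band)          # keep the left page non-empty
    hi = min(width - 1, centre + half_band)  # keep the right page non-empty

    def column_mean(x):
        return sum(row[x] for row in image) / len(image)

    # Darkest column wins; on ties, prefer the one closest to the centre.
    return min(range(lo, hi + 1), key=lambda x: (column_mean(x), abs(x - centre)))


def split_spread(image, search_fraction=0.2):
    """Cut a two-page spread into (left_page, right_page) at the gutter."""
    x = find_gutter(image, search_fraction)
    left = [row[:x] for row in image]
    right = [row[x:] for row in image]
    return left, right
```

The 20 % search band is a guess. If the book isn't centred consistently in the stand, the band would need to be wider.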

Unfortunately, it’s almost impossible to get rid of page curl without something holding the pages flat: either hands, which need to be carefully positioned, or a transparent glass sheet, which apparently requires an anti-reflective coating for good results.

Approach 2: Scanner

This method is quite slow, though I suspect some of the slowness is due to unintended data transfer bottlenecks.
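To get a feel for how much of the slowness the transfer alone could explain, here is a back-of-the-envelope estimate. It is not a measurement of my scanner. The page size, colour depth and link speed are all assumptions: uncompressed 24-bit colour, and an effective USB 2.0 throughput of about 30 MB/s.

```python
# Back-of-the-envelope estimate, not a measurement of any particular scanner.
# Assumed: uncompressed pixels, and an effective USB 2.0 throughput of
# about 30 MB/s (the nominal 480 Mbit/s is never reached in practice).

MM_PER_INCH = 25.4


def raw_scan_bytes(width_mm, height_mm, dpi, bytes_per_pixel):
    """Size in bytes of an uncompressed scan of a width_mm x height_mm area."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


def transfer_seconds(n_bytes, bytes_per_second):
    """Time to move n_bytes over a link with the given sustained throughput."""
    return n_bytes / bytes_per_second


if __name__ == "__main__":
    # One A4 page (210 x 297 mm): 600 dpi colour vs. 300 dpi grayscale.
    for dpi, bpp in [(600, 3), (300, 1)]:
        size = raw_scan_bytes(210, 297, dpi, bpp)
        secs = transfer_seconds(size, 30e6)
        print(f"{dpi} dpi, {bpp} B/px: {size / 1e6:.0f} MB, ~{secs:.1f} s at 30 MB/s")
```

Under these assumptions, an A4 colour page at 600 dpi is roughly 100 MB raw, so a few seconds per page could go to the transfer alone. Scanning in grayscale at 300 dpi cuts that by more than a factor of ten.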

Larger books also generally don’t take well to being opened like this. However, since I intend to get rid of the books afterwards, I can simply unbind them completely.

Future: Normalizing the Scans

See the example files in "Pictures/Book scanning", or make some new ones under better lighting conditions, and hence with higher contrast.
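Contrast matters because a common early cleanup step is binarization, which decides which pixels are ink and which are paper. Below is a minimal sketch of that idea. It stretches the contrast and then applies Otsu's global threshold. This is a stand-in technique, not necessarily what unpaper or ScanTailor do internally. It assumes the scan is already loaded as grayscale, as a list of rows of 0 to 255 values. The function names are hypothetical.

```python
# Sketch: contrast stretching followed by Otsu's global threshold.
# Assumption: the scan is already loaded as grayscale, a list of rows of ints
# in 0..255. Otsu's method is a stand-in here, not necessarily what unpaper
# or ScanTailor do internally.


def stretch_contrast(image):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in image]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in image]


def otsu_threshold(image):
    """Return t maximizing the between-class variance of {p <= t} vs {p > t}."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the "ink" class (<= t)
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no split here
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2  # up to a constant factor
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, t):
    """Pixels <= t become black ink (0), the rest white paper (255)."""
    return [[0 if p <= t else 255 for p in row] for row in image]
```

Otsu picks a single threshold per page, so uneven lighting (a shadow near the gutter, say) can still defeat it. That is one more reason to reshoot under better light or to reach for an adaptive method.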

https://github.com/unpaper/unpaper might be a good start
http://www.tobias-elze.de/pdfsandwich/ maybe
https://github.com/scantailor/scantailor might be the best thing
https://blog.michael.franzl.name/2016/09/11/digitize-books-produce-searchable-pdfs-scanned-book/ for inspiration
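Until one of these tools is set up properly, unpaper and Tesseract can also be chained by hand. Below is a minimal sketch. It assumes both programs are on PATH, that the pages are already saved as grayscale PGM files, and that the text is English. `page_commands` and `process` are hypothetical helpers. Each page ends up as its own searchable PDF.

```python
# Sketch: clean each scanned page with unpaper, then OCR it with Tesseract
# into a one-page searchable PDF. Assumptions: unpaper and tesseract are on
# PATH, inputs are grayscale .pgm files, and the text is English ("eng").
import shutil
import subprocess
from pathlib import Path


def page_commands(scan: Path, lang: str = "eng") -> list[list[str]]:
    """Build the two command lines for one page (nothing is run here)."""
    cleaned = scan.with_name(scan.stem + "-clean.pgm")
    out_base = scan.with_name(scan.stem + "-ocr")  # tesseract appends .pdf
    return [
        ["unpaper", str(scan), str(cleaned)],
        ["tesseract", str(cleaned), str(out_base), "-l", lang, "pdf"],
    ]


def process(scans: list[str], lang: str = "eng") -> None:
    """Run the commands for every page, stopping at the first failure."""
    missing = [tool for tool in ("unpaper", "tesseract") if shutil.which(tool) is None]
    if missing:
        raise RuntimeError("missing tools: " + ", ".join(missing))
    for scan in scans:
        for cmd in page_commands(Path(scan), lang):
            subprocess.run(cmd, check=True)
```

Merging the per-page PDFs into one file is left out. Poppler's `pdfunite` is one option for that.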
