I have a number of books that I’d like to scan, OCR and throw away for good. In general I despise paper books, as they’re hard to work with.
Originally, I wanted to scan Tesla Eltos catalogues, but I managed to find some on Scribd and persuaded their creator to reupload them to uloz.to.
The Plustek OpticBook 4800 seems interesting.
I’ve made a stand out of aluminium pipes and some 3D printed connecting parts. Ideally, one would keep the text in roughly the same position, and quickly run through the whole book, two pages at a time.
Unfortunately, it’s almost impossible to get rid of page curl without something holding the pages flat: either hands, which need to be carefully positioned, or a transparent glass sheet, which apparently requires an anti-reflective coating for good results.
This method is quite slow, even though I suspect some of the slowness is due to accidental data transfer bottlenecks.
Larger books are also generally unfriendly to being opened like this. However, if it’s my intent to get rid of the books afterwards, I can just fully unbind them.
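Since each photo from the stand captures two pages at a time, every frame has to be cut in half at some point. A rough sketch of how that could work, assuming the gutter shows up as the darkest vertical strip near the middle of the frame. The image is modelled as a plain list of rows of grayscale values, and the names `find_gutter`, `split_spread` and `search_fraction` are made up for illustration:

```python
def find_gutter(image, search_fraction=0.2):
    """Return the index of the darkest column near the horizontal centre.

    `image` is a list of rows, each a list of grayscale values
    (0 = black, 255 = white). Only the central `search_fraction`
    of the width is searched (an assumption about where the gutter is).
    """
    width = len(image[0])
    centre = width // 2
    margin = max(1, int(width * search_fraction / 2))
    start = max(0, centre - margin)
    stop = min(width, centre + margin + 1)
    # Total brightness of each candidate column; the gutter is the minimum.
    column_sums = [sum(row[x] for row in image) for x in range(start, stop)]
    return start + column_sums.index(min(column_sums))


def split_spread(image, search_fraction=0.2):
    """Split a two-page spread into (left, right), dropping the gutter column."""
    gutter = find_gutter(image, search_fraction)
    left = [row[:gutter] for row in image]
    right = [row[gutter + 1:] for row in image]
    return left, right
```

This only works if the book stays in roughly the same position between shots. With heavy curling or uneven lighting the darkest column may well not be the gutter.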
See example files in "Pictures/Book scanning", or make some new ones with better light conditions and hence higher contrast.
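Until better-lit photos exist, low contrast can be partly compensated in software. A minimal sketch of a percentile-based contrast stretch, again on a plain list of rows of grayscale values. The function name and the 2%/98% defaults are my own assumptions, not something taken from any of the tools mentioned here:

```python
def stretch_contrast(image, low_pct=2.0, high_pct=98.0):
    """Linearly rescale grayscale values so that the low percentile maps to 0
    and the high percentile maps to 255, clipping everything outside."""
    values = sorted(v for row in image for v in row)
    n = len(values)
    low = values[int((n - 1) * low_pct / 100)]
    high = values[int((n - 1) * high_pct / 100)]
    if high <= low:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [list(row) for row in image]
    scale = 255.0 / (high - low)
    return [
        [min(255, max(0, round((v - low) * scale))) for v in row]
        for row in image
    ]
```

Clipping a few percent at both ends keeps a handful of very dark or very bright pixels from dictating the whole range. It does nothing about uneven lighting across the page.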
https://github.com/unpaper/unpaper might be a good start
https://github.com/scantailor/scantailor might be the best thing
https://blog.michael.franzl.name/2016/09/11/digitize-books-produce-searchable-pdfs-scanned-book/ for inspiration
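For the OCR step itself, one option would be the Tesseract command-line tool, which can write a searchable PDF per page image. This is my own assumption about the pipeline, nothing above settles on an OCR engine. A sketch that builds the command line, with hypothetical helper names and paths, and only runs it if `tesseract` is actually installed:

```python
import subprocess
from pathlib import Path


def tesseract_pdf_command(page_image, output_dir, language="eng"):
    """Build the argv for turning one page image into a searchable PDF.

    Tesseract appends ".pdf" to the output base name by itself.
    """
    page_image = Path(page_image)
    output_base = Path(output_dir) / page_image.stem
    return ["tesseract", str(page_image), str(output_base), "-l", language, "pdf"]


def ocr_pages(page_images, output_dir, language="eng"):
    """Run Tesseract over cleaned-up page images (requires tesseract on PATH)."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for page in sorted(page_images):
        subprocess.run(tesseract_pdf_command(page, output_dir, language), check=True)
```

The per-page PDFs would still have to be merged into one file afterwards, which this sketch leaves out.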