
> To anyone who is going to suggest Calibre, please don't, lest you make yourself look like you know nothing about the issues I'm talking about.

Can you please elaborate on this? I've never tried, but I thought Calibre could strip the DRM from Kindle ebooks. Or do you mean that Calibre can't handle the general case of arbitrary DRM (via OCR methods)?



First, Calibre doesn't do OCR (at least not the version that I last looked at, maybe a year or so ago). But even if it did, a bunch of OCR'ed pages doesn't make an ebook. Any conversion software would have to strip page header and footer material, recognize chapter titles and paragraphs, etc.; and for technical books it's an order of magnitude worse still: it would have to recognize sidebars, images, multi-level chapter headings, ...
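To make just the header/footer part concrete, here is a minimal Python sketch. It assumes a hypothetical OCR step has already produced each page as a list of text lines, and that running headers and footers are always the first or last line of a page; real scans are much messier than that. The function names and the 0.6 threshold are made up for illustration.

```python
import re
from collections import Counter

def normalize(line):
    # Collapse digit runs so "Page 12" and "Page 13" compare equal.
    return re.sub(r"\d+", "#", line.strip().lower())

def strip_running_lines(pages, threshold=0.6):
    """Drop first/last lines that repeat on most pages.

    pages: list of pages, each a list of text lines (assumed OCR output).
    threshold: fraction of pages a line must appear on to count as a
    running header or footer (an arbitrary choice for this sketch).
    """
    counts = Counter()
    for page in pages:
        if not page:
            continue
        # Only the first and last line of each page are candidates.
        for key in {normalize(page[0]), normalize(page[-1])}:
            counts[key] += 1
    repeated = {key for key, n in counts.items() if n >= threshold * len(pages)}

    cleaned = []
    for page in pages:
        body = list(page)
        if body and normalize(body[0]) in repeated:
            body = body[1:]
        if body and normalize(body[-1]) in repeated:
            body = body[:-1]
        cleaned.append(body)
    return cleaned
```

Even this easy case leans on guesses (header position, page numbers being the only varying part), and it does nothing for chapter titles, sidebars, or images.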

Converting a PDF into an ebook (by ebook I mean EPUB or similar formats; I recognize that that is a fairly narrow definition of ebook, and some people call scanned JPGs an 'ebook'), even without accounting for DRM or OCR, is Very Hard. PDF is only page layout; it doesn't have any semantics. Ebooks are built around semantic markup (otherwise you can't make an automatic ToC, or reflow pages when the user increases the font size, for example).
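As a rough illustration of why the semantic markup matters: EPUB content documents are XHTML, so when headings are actually tagged as headings, an automatic ToC is a few lines of standard-library Python. This is only a sketch; `HeadingCollector` and `build_toc` are hypothetical names, and it looks at `h1` to `h3` only.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect (level, title) pairs from h1-h3 elements."""

    def __init__(self):
        super().__init__()
        self.toc = []        # list of (level, title) tuples
        self._level = None   # heading level currently open, if any
        self._text = []      # text fragments of the open heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.toc.append((self._level, "".join(self._text).strip()))
            self._level = None

def build_toc(xhtml):
    parser = HeadingCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.toc
```

With a layout-only source there is nothing like an `h1` to key on; you would have to guess headings from font size and position, which is exactly the hard part.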

My point is: there is no software (to the best of my knowledge, and I've looked) that can convert bitmaps into their equivalent 'ebooks'. So it's not quite as easy as the GP would make it seem; actually it's so much harder that it approaches impossible, short of manual conversion.


Thanks for the clarification. I agree that converting bitmaps to a properly formatted ebook is next to impossible without at least some manual intervention (or using an expert system based on manually constructed book templates).

I do wonder how far Google has got with their internal software - and if they care at all about formatting, or just the text.



