Abstract
A Swiss IT company specializing in both document analysis/understanding and human-machine interfaces has, in collaboration with another Swiss IT company, developed an interactive processing platform for transforming PDF documents into reflowable (i.e. automatically adapting their layout to each category of device: desktop, mobile, etc.) and augmented eContent. The company is looking for publishers and editors interested in integrating the technology into the production of dynamic eBooks.
Details
The aim of this offer is to make life easier for end users and publishers by proposing ergonomic, interactive technologies for restructuring, tagging and augmenting digital information. These technological solutions are the result of an unusual marriage of two distinct domains: document analysis and human-machine interaction!
The Swiss IT company has developed, and offers, a platform for transforming PDF files into reflowable EPUB (electronic publication) documents. Reflowing a document means automatically adapting its layout to each category of device (desktop, mobile, and so on) in order to offer users the most suitable ergonomics for an interactive reading experience. The platform also allows editors and publishers to enrich EPUB documents with multimedia content such as video and audio.
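In a reflowable EPUB, line breaks are not fixed in the file: the reading system recomputes them for the available screen width. As a purely illustrative sketch, the idea can be shown with greedy word wrapping (an assumption for illustration, not necessarily how the platform or any reading system lays out text; `ReflowDemo` and `wrap` are hypothetical names). The same paragraph is broken differently for a wide, desktop-like width and a narrow, mobile-like width:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: greedy word wrapping of one paragraph to a given width. */
class ReflowDemo {

    /** Breaks text into lines of at most maxChars characters; a longer word gets its own line. */
    static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<String>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields a single empty token
            }
            // Start a new line when the next word (plus a space) would not fit.
            if (line.length() > 0 && line.length() + 1 + word.length() > maxChars) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String paragraph = "Reflowing a document means automatically adapting its layout"
            + " to each category of device.";
        // Arbitrary widths standing in for a desktop and a mobile viewport.
        System.out.println(wrap(paragraph, 60));
        System.out.println(wrap(paragraph, 24));
    }
}
```

The text itself never changes; only the positions of the line breaks depend on the width, which is what lets one EPUB serve several device categories.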
Compared to existing solutions for eContent production, the platform offers:
- Analysis and restructuring of content, even for poorly structured documents
- Interpretation and tagging of content, which can also be used for indexing and other custom purposes
- Production of eBooks in an open format (EPUB 3) that can be read on any EPUB 3-capable device, instead of proprietary formats or expensive applications
- Full control over the transformation and publication process (the ability to act at different levels, and independence from external conversion companies)
- Integration into the existing standard publishing process, without forcing staff to change their skills or working habits
- A production process that is on average 5 to 15 times faster than existing solutions
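Because EPUB 3 is an open format, its container can be produced with standard tools. The following is a minimal sketch using only `java.util.zip`; `EpubContainer` and `build` are hypothetical names, not part of the platform. It writes the OCF container: an uncompressed `mimetype` entry containing `application/epub+zip` comes first, followed by `META-INF/container.xml`, which points to the package document. The package document (`OEBPS/content.opf`, an assumed path) and the XHTML/CSS files are supplied by the caller, not generated here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper that packs already-generated EPUB files into an OCF zip container. */
class EpubContainer {

    // Points reading systems to the package document; "OEBPS/content.opf" is an assumed path.
    static final String CONTAINER_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<container version=\"1.0\" xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">\n"
        + "  <rootfiles>\n"
        + "    <rootfile full-path=\"OEBPS/content.opf\" media-type=\"application/oebps-package+xml\"/>\n"
        + "  </rootfiles>\n"
        + "</container>\n";

    /** Returns the bytes of an .epub archive holding the given files (archive path -> bytes). */
    static byte[] build(Map<String, byte[]> contentFiles) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // 1. "mimetype" must be the first entry and stored without compression;
            //    STORED entries need their size and CRC-32 set before writing.
            byte[] mimetype = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(mimetype);
            ZipEntry mimeEntry = new ZipEntry("mimetype");
            mimeEntry.setMethod(ZipEntry.STORED);
            mimeEntry.setSize(mimetype.length);
            mimeEntry.setCompressedSize(mimetype.length);
            mimeEntry.setCrc(crc.getValue());
            zip.putNextEntry(mimeEntry);
            zip.write(mimetype);
            zip.closeEntry();

            // 2. Container file pointing to the package document (deflated by default).
            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write(CONTAINER_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();

            // 3. Package document, XHTML, CSS, media, as supplied by the caller.
            for (Map.Entry<String, byte[]> file : contentFiles.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            // ByteArrayOutputStream does no real I/O; this signals a malformed archive
            // (for example, a duplicate entry name).
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }
}
```

Passing a `LinkedHashMap` keeps the caller's files in insertion order inside the archive; the resulting bytes can be written directly to a `.epub` file.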
The platform consists of four technologies, each addressing one of the following phases:
1) Automatically cleaning the PDF document and extracting its content into an XML file.
2) Semi-automatic tagging of the XML document.
3) Transforming the tagged document into an EPUB eBook. The resulting EPUB file is created by combining the XML document with bespoke CSS (Cascading Style Sheets) and HTML templates.
4) Enhancing the EPUB document with multimedia or interactive content.
Phases 1 and 2 are specifically designed to integrate the platform into any existing publishing chain; editors whose documents are already in a tagged format (XML or other) do not need them.
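The four phases, with phases 1 and 2 optional for publishers who already have tagged XML, can be pictured as a chain of interchangeable stages. This is a hypothetical Java sketch: `ConversionPipeline`, `Stage` and the phase labels are illustrative names, not the platform's API, and documents are passed around as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the four-phase chain; not the platform's actual API. */
class ConversionPipeline {

    /** One transformation stage; documents are plain strings here for brevity. */
    interface Stage {
        String apply(String input);
    }

    private final Stage extract;  // phase 1: clean the PDF and extract it to XML
    private final Stage tag;      // phase 2: semi-automatic tagging of the XML
    private final Stage toEpub;   // phase 3: XML + CSS + HTML templates -> EPUB
    private final Stage enrich;   // phase 4: add multimedia or interactive content

    /** Names of the phases actually run, in order (useful for logging or testing). */
    final List<String> executedPhases = new ArrayList<String>();

    ConversionPipeline(Stage extract, Stage tag, Stage toEpub, Stage enrich) {
        this.extract = extract;
        this.tag = tag;
        this.toEpub = toEpub;
        this.enrich = enrich;
    }

    /** Full chain, starting from a PDF. */
    String fromPdf(String pdf) {
        String xml = run("1-extract", extract, pdf);
        String taggedXml = run("2-tag", tag, xml);
        return fromTaggedXml(taggedXml);
    }

    /** Shortcut for publishers who already have tagged XML: phases 1 and 2 are skipped. */
    String fromTaggedXml(String taggedXml) {
        String epub = run("3-epub", toEpub, taggedXml);
        return run("4-enrich", enrich, epub);
    }

    private String run(String phase, Stage stage, String input) {
        executedPhases.add(phase);
        return stage.apply(input);
    }
}
```

Injecting each stage through the constructor also mirrors the idea of configuring every transformation stage independently: replacing, say, the tagging stage leaves the others untouched.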
Main features:
- PDF-to-XML extraction, with recovery of words, lines, paragraphs and reading order
- Semi-automatic tagging
- EPUB layout calculated automatically from CSS and HTML templates
- Extraction of tables, vector graphics, and images
- Table of contents creation
- Audio embedding
- Video embedding
- Independent configuration of each transformation stage
- Java 7 compatible
- Windows, Linux and Mac compatible
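EPUB 3 content documents are XHTML5, so audio and video can be embedded with the HTML `<audio>` and `<video>` elements, and each media file must also be declared as an `item` in the package manifest with its media type. A hypothetical helper generating such markup could look like the sketch below (`MediaMarkup` and its methods are illustrative names, not the platform's API; in XHTML the boolean `controls` attribute must carry a value, hence `controls="controls"`):

```java
/** Hypothetical helper emitting EPUB 3 (XHTML5) markup for embedded media. */
class MediaMarkup {

    /** Minimal escaping for values placed inside double-quoted XML attributes. */
    static String escapeAttr(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    private static String source(String href, String mediaType) {
        return "<source src=\"" + escapeAttr(href) + "\" type=\"" + escapeAttr(mediaType) + "\"/>";
    }

    /** Audio player, e.g. mediaType "audio/mpeg" for an MP3 file. */
    static String audio(String href, String mediaType) {
        return "<audio controls=\"controls\">" + source(href, mediaType) + "</audio>";
    }

    /** Video player with an optional poster image (pass null for none). */
    static String video(String href, String mediaType, String posterHref) {
        String poster = posterHref == null ? "" : " poster=\"" + escapeAttr(posterHref) + "\"";
        return "<video controls=\"controls\"" + poster + ">" + source(href, mediaType) + "</video>";
    }

    /** Matching item element for the manifest of the package document (content.opf). */
    static String manifestItem(String id, String href, String mediaType) {
        return "<item id=\"" + escapeAttr(id) + "\" href=\"" + escapeAttr(href)
            + "\" media-type=\"" + escapeAttr(mediaType) + "\"/>";
    }
}
```

For instance, `MediaMarkup.audio("audio/ch1.mp3", "audio/mpeg")` yields `<audio controls="controls"><source src="audio/ch1.mp3" type="audio/mpeg"/></audio>`, and the matching manifest entry comes from `manifestItem("ch1-audio", "audio/ch1.mp3", "audio/mpeg")`.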