May 30

25 years after PDF 1.0, PDF 2.0 is right around the corner. The final specifications are expected to be approved by the ISO committee any time now. This gave me a perfect opportunity to attend PDF Days Europe and make sure we are not missing out on anything important for the future of PDF.

A couple of important clarifications to start with:

– I had initially thought of PDF 2.0 and the future of PDF as being the same thing. This is absolutely not the case. What is referred to as the future of PDF is a set of new features that experts in the field are working on, and which will be based on the PDF 2.0 specifications. PDF 2.0 is the successor of PDF 1.7 and is being adopted right now as the basis of future development.

– A second thing is that there were initially some discussions about PDF 2.0 being based on XML, which would have been a drastic change from the PDF 1.x format. This is not the case: there is no XML in the PDF 2.0 format other than the metadata streams, which have always been in XML.

The Promises of PDF 2.0

PDF 1.0 saw a number of incremental changes up to PDF 1.7, which became an ISO standard and stabilized the format for a few years. PDF 1.7, however, carried over a number of elements from earlier versions, including some imprecise wording in the specifications and items that were only there to satisfy earlier versions of PDF readers. The people who worked on PDF 2.0, who came not only from Adobe but from a wide range of industry experts, did a great job of clarifying the PDF format, removing (deprecating) a number of items that were never adopted by the industry, and adding new features that will be the basis for the future of PDF. Without going into the technical details of all these changes, here are the items that caught my attention as being most significant:

– PDF 2.0 is a lot stricter about PDF writers conforming to the specifications. For example, PDF 2.0 makes it mandatory that PDF files start with %PDF and end with %%EOF. This might seem trivial, but many PDF producers create documents that are not properly formatted, and because Acrobat Reader accepts these documents, they are considered valid (see my previous article: http://blog.amyuni.com/?p=1627). PDF processors have had to write a number of heuristics to fix badly formatted documents; this should no longer be necessary with PDF 2.0.

– PDF 2.0 improves the security features of PDF by deprecating older algorithms for both document encryption and digital signatures. Users of PDF documents should nevertheless be warned that a document with an owner password and a blank user password is as good as unprotected. The misconception that a document protected by just the owner password, and not digitally signed, is secure remains in PDF 2.0.

– Provisions for packaging multiple documents in a single PDF have improved with the introduction of the concept of Document Parts, without relying on an XML or ZIP package as many other document formats do.

– Tagging of PDF documents or creating documents with structured contents has seen a big overhaul and is better prepared for the future features of the PDF format.

– Many other improvements, such as geospatial and 3D features, are worth mentioning but are beyond the scope of this short introduction to the PDF 2.0 format.
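
To make the first point above concrete, here is a minimal sketch of the kind of check a strict consumer could run on the start and end of a file. The function name check_pdf_framing is hypothetical, and tolerating end-of-line characters after %%EOF is my own assumption rather than something taken from the specifications.

```python
def check_pdf_framing(data: bytes) -> list:
    """Return a list of problems found in the header and end-of-file marker."""
    problems = []
    # The %PDF marker must sit at byte offset 0, with no leading junk.
    if not data.startswith(b"%PDF"):
        problems.append("file does not start with %PDF")
    # Assumption: trailing CR/LF characters after %%EOF are tolerated.
    if not data.rstrip(b"\r\n").endswith(b"%%EOF"):
        problems.append("file does not end with %%EOF")
    return problems
```

A file with a few stray bytes before the header, which many readers silently accept, would be reported here instead of being repaired with heuristics.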

Deprecated Items in PDF 2.0

PDF 2.0 deprecates some PDF 1.x features that were rarely adopted by PDF writers and processors. Deprecation means that PDF 2.0 producers should not use these features and PDF 2.0 consumers should ignore them. Many of the deprecated features are minor, such as the movie and sound annotations, which are replaced by a new multimedia annotation, or the Info dictionary, which is removed because it is redundant with the XMP metadata stream. Other redundant items, or elements that are not needed for the rendering or processing of a PDF, have also been removed.
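
As an illustration of why the Info dictionary is redundant, the sketch below reads the document title straight from the XMP metadata. It assumes the metadata stream is stored uncompressed and unencrypted, so that the XMP packet can be found by scanning the raw bytes; the function name xmp_title is hypothetical, and a real processor would locate the stream through the document catalog instead.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces used by XMP for Dublin Core properties and RDF containers.
DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def xmp_title(data: bytes):
    """Return the first dc:title in an uncompressed XMP packet, or None."""
    # An XMP packet is wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>
    match = re.search(
        rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=", data, re.DOTALL
    )
    if match is None:
        return None
    # Raises xml.etree.ElementTree.ParseError if the packet is not well-formed.
    root = ET.fromstring(match.group(1).strip())
    # dc:title is stored as a language alternative: dc:title/rdf:Alt/rdf:li
    item = root.find(f".//{DC}title/{RDF}Alt/{RDF}li")
    return item.text if item is not None else None
```

The same approach would work for other properties, such as dc:creator, by changing the path that is searched.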

XFA Format Removed with no Alternative

One of the major features of PDF 1.x that was deprecated in PDF 2.0 is the XFA specification. XFA is a rather complicated scheme for creating PDF forms. Its major drawbacks were its complexity and the fact that it mixes data entry, data presentation and data storage in the same file. Many businesses have invested a lot of resources in XFA and will be disappointed to see it disappear. PDF 2.0 does not suggest any common alternative. Various PDF businesses are presenting their own solutions, some of them using HTML5 mixed with PDF, but there doesn’t seem to be a consensus on the direction to take.

A Few Regrettable Gaps

– A common Digital Rights Management (DRM) platform is still missing from PDF, with various suppliers having their own implementations. I heard from some speakers that there is a push for PDF to replace ePub (whose standards body, the IDPF, has now merged with the W3C). DRM is an essential part of ePub, and without a common platform for DRM in PDF, it is difficult to imagine how PDF can replace ePub unless this is something in the works for the future of PDF.

– There is still no common system for validating whether a PDF conforms to the specifications. One of the strengths of XML is that one can immediately determine whether a file follows the schema (or the specifications), but there is no such thing in PDF. There is an initiative called veraPDF, an open-source platform for validating whether a PDF conforms to PDF/A, but it is not clear if the same open-source solution will apply to validating any PDF.

– Font and text handling have not seen any significant improvement or clarification. PDF producers can still be very creative in their interpretation of the specifications and their files will still be considered valid. For example, when the specifications require that certain types of fonts be embedded, PDF readers will still accept the document even if the fonts are not embedded, which can create differences in the rendering of the document across platforms.

The Amyuni Tech. Roadmap for PDF 2.0

PDF documents produced by our tools are pretty much 2.0 ready. A few minor adjustments will be made in v6 of our libraries to remove deprecated items when 2.0 compatibility is selected. Support for the new encryption and digital signature methods will require a good deal of effort, but work has already started and it will be added to v6. Improvements to tagged or structured PDFs, mainly for the generation of PDF/UA documents, are already in the works and will be revisited to make sure they are all 2.0 compatible.