Editing PDFs: What You See is Not Always What You Get

Jan 19, 2010

Amyuni Technologies Blog
Many have surely asked: “Why can’t editing text in a PDF document be as simple as it is in a Word file?” Actually, the PDF’s inherent simplicity is the reason editing text can be challenging. Despite the PDF’s evolution, its core design and purpose have not changed: it was, and will continue to be, a final-presentation document exchange format.

Text: To a Viewer, It’s Just Code

PDF text is hard to edit because the words and sentences in a PDF document aren’t really text. They are collections of objects commonly referred to as text elements. A text element and its corresponding code tell a PDF-viewing application which characters to draw, how to draw them, and where on the page to place them. For example, Figure 1 below outlines how the two words “Hello World!” appear to a PDF viewer before they are displayed on screen:

Figure 1: Partial Code View of “Hello World!” (bolded for emphasis)
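As a rough, hypothetical illustration of what such code looks like, the Python sketch below writes a PDF content stream in which “Hello World!” is drawn as four text elements. The four-way split, the coordinates (which assume 12-pt Helvetica), and the font resource name `F1` are assumptions made for this example; the actual operators behind Figure 1 may differ.

```python
# Hypothetical split of "Hello World!" into four text elements.
# Coordinates are in points and assume 12-pt Helvetica; illustrative only.
ELEMENTS = [
    ("Hel", 72, 700),
    ("lo ", 90, 700),
    ("Wor", 102.672, 700),
    ("ld!", 124.668, 700),
]


def escape_pdf_string(s):
    """Escape the characters that are special inside a PDF literal string."""
    # Backslashes first, so the ones added for parentheses are not doubled.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_content_stream(elements, font="F1", size=12):
    """Emit one BT ... ET text object per element.

    Tf selects the font and size, Td moves to the element's position
    (the text matrix is reset at every BT), and Tj paints the string.
    "F1" is an assumed name for a font resource defined on the page.
    """
    ops = []
    for text, x, y in elements:
        ops += [
            "BT",
            f"/{font} {size} Tf",
            f"{x} {y} Td",
            f"({escape_pdf_string(text)}) Tj",
            "ET",
        ]
    return "\n".join(ops)


print(build_content_stream(ELEMENTS))
```

Each `BT ... ET` pair is an independent instruction to paint a few characters at a fixed spot; nothing in the stream says that the four pieces form one sentence.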

Once a PDF viewer such as the Amyuni PDF Creator displays them on screen, they appear as two words forming a short sentence (Figure 2):

Figure 2: Text Elements Displayed as Two Words

However, if we enable the PDF Creator’s border-viewing option, we can see that the two words are in fact four separate text elements, each enclosed within its own outlined border (Figure 3).

Figure 3: Text Elements Contained Within Borders

The borders are visual aids that PDF Creator can display to help users identify the positions of text elements on a page. We can also move the text out of its normal left-to-right sentence flow (Figure 4) to show that it behaves as a set of separate elements rather than as one continuous line of text.
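To make the idea of a border concrete, here is a minimal Python sketch. It assumes the standard Helvetica glyph widths and a simplified box that runs from the baseline up one font size (a real viewer would use the font’s ascent and descent). It computes a border for each of four hypothetical text elements, then moves one of them, as in Figure 4. `TextElement` and `lay_out` are illustrative names, not part of any Amyuni API.

```python
from dataclasses import dataclass

# Advance widths of standard Helvetica, in 1/1000 em,
# for just the glyphs that appear in "Hello World!".
HELVETICA_WIDTHS = {"H": 722, "e": 556, "l": 222, "o": 556, " ": 278,
                    "W": 944, "r": 333, "d": 556, "!": 278}


@dataclass
class TextElement:
    text: str
    x: float            # start of the baseline, in points
    y: float            # baseline height, in points
    size: float = 12.0  # font size, in points

    def width(self) -> float:
        """Total advance width of the element's characters, in points."""
        return sum(HELVETICA_WIDTHS[c] for c in self.text) * self.size / 1000

    def border(self) -> tuple:
        """Simplified border (left, bottom, right, top)."""
        return (self.x, self.y, self.x + self.width(), self.y + self.size)


def lay_out(pieces, x0=72.0, y=700.0, size=12.0):
    """Place pieces left to right, each starting where the previous one ends."""
    elements, x = [], x0
    for piece in pieces:
        element = TextElement(piece, x, y, size)
        elements.append(element)
        x += element.width()
    return elements


elements = lay_out(["Hel", "lo ", "Wor", "ld!"])
for element in elements:
    print(repr(element.text), element.border())

# Moving one element only changes that element's border;
# nothing ties it to its neighbors.
elements[2].y += 20
print(repr(elements[2].text), elements[2].border())
```

Because each border is computed only from its own element’s position and text, moving “Wor” leaves the other three borders exactly where they were.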

Figure 4: Separation of Text Elements

Trying to Keep the Eggs in the Same Basket

Borders identify each text element and help users see how many characters each one contains. Both a text element’s borders and its contents can be modified, but doing so can introduce problems such as shifted characters, font changes, and altered sentence structure. Changing a text element, for example by inserting or removing characters, really changes its structure, because neighboring elements keep their own positions on the page and do not reflow around the edit. It should therefore be done selectively.
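The sketch below shows why inserting characters is risky. It is a simplification that assumes 12-pt Courier, where every glyph advances 7.2 points, and the same kind of hypothetical four-element split; the helper names are illustrative only. Each element keeps the absolute start position it was given when the PDF was written, so lengthening one element makes it run into its neighbor instead of pushing the neighbor along.

```python
from dataclasses import dataclass

COURIER_ADVANCE = 600  # every standard Courier glyph is 600/1000 em wide


@dataclass
class TextElement:
    text: str
    x: float            # absolute start position, fixed when the PDF was written
    size: float = 12.0

    @property
    def right(self) -> float:
        """x coordinate where this element's last glyph ends."""
        return self.x + len(self.text) * COURIER_ADVANCE * self.size / 1000


def lay_out(pieces, x0=72.0, size=12.0):
    """Place pieces left to right once, as a PDF producer would."""
    elements, x = [], x0
    for piece in pieces:
        element = TextElement(piece, x, size)
        elements.append(element)
        x = element.right
    return elements


def insert_chars(element, index, chars):
    """Edit one element in place; its neighbors keep their absolute positions."""
    element.text = element.text[:index] + chars + element.text[index:]


def overlapping_pairs(elements):
    """Index pairs where an element now extends past the start of the next one."""
    return [(i, i + 1) for i in range(len(elements) - 1)
            if elements[i].right > elements[i + 1].x]


elements = lay_out(["Hel", "lo ", "Wor", "ld!"])
print(overlapping_pairs(elements))  # []

insert_chars(elements[1], 2, "oo")  # "Hello World!" -> "Hellooo World!"
print(overlapping_pairs(elements))  # [(1, 2)]
```

An editor would have to detect this and then shift, merge, or rewrite the neighboring elements, which is one source of the character shifts and altered sentence structure mentioned above.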

Does this mean that the PDF’s design is to blame for our inability to easily edit its content? No. That design is the PDF’s raison d’être: to prevent the document from being changed. However, because people will always need to edit PDFs, there must be a way for them to do so more easily. As we shall see in an upcoming article, there are several ways users can improve the way they edit PDFs. These include approaching the PDF editing process with a different mindset and using an Amyuni PDF tool such as the PDF Creator.

Franc Gagnon is the technical copywriter for Amyuni Technologies
www.amyuni.com
