We’ve all received PDF files with content that we wanted to reuse. This means that most of us have been disappointed by the difficulty of getting rich content out of a PDF. For example, if you try to copy and paste table rows from a PDF viewer into Word, you frequently end up with a collapsed single line of text, as in the figure below. Most existing PDF viewers, in essence, limit people who use PDFs to a “look but don’t touch” experience.
PDF Reflow, a new feature in the upcoming release of Word, changes the landscape by letting you convert PDFs into editable Word documents.
The goal of PDF Reflow is to convert PDF content into Word documents that contain the original layout intent and that flow correctly across pages when you edit or read. In other words, when you convert a PDF in Word, the elements in the document should act as if you created them in Word. A list from a PDF, for instance, will act just like any other list in Word: hit Enter at the end of a bulleted paragraph and a new bullet will be created.
The PDF Reflow feature is not intended as a replacement for a reader, such as Windows 8’s Reader, but rather is a converter that gives you a new level of access to your content. It works with any PDF, but because we lay the contents out again, the results are best with documents that are mostly textual, such as legal and business documents. If a PDF contains mostly images and diagrams, as in a presentation or a brochure, converting it has a much higher likelihood of issues like the one in the columns example below.
For example, take a look at the document below. Some of the text in the first column wraps to the next line differently, and a line from the top of the original second column has moved to the end of the first. All the original content is there, but because PDF Reflow values the ability to edit the content over picture-perfect alignment, some of the content shifts position.
That isn’t to say we won’t try our best to convert any PDF file you hand us! For instance, let’s take a look at a PDF of a PowerPoint slide. PDF Reflow converts the file and preserves all the content; however, the text ends up in text boxes and won’t reflow nicely across pages if you start typing in it.
Keep in mind that PDF Reflow creates a copy of your content during the conversion. If the results aren’t what you expect, your original PDF file still remains safely intact.
How it works
PDF is a fixed file format, which means the file stores where text, images, and graphics are placed on a page, but not necessarily the relationships among them. Most PDFs don’t have a notion of content structure elements, such as paragraphs, tables, or columns. In our table rows example, there’s not enough information in the PDF file for us to know that these words should be in separate table cells. Instead, all we can see is that the pieces of text come right after one another.
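To make the problem concrete, here is a small Python sketch of what a fixed-format page looks like to a program. The TextRun class, the coordinates, and the naive_copy function are all invented for illustration; they are not the PDF file format itself or anything Word uses internally. The point is only that positions alone say nothing about cells, so a simple copy in reading order collapses two table rows into one line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRun:
    """A piece of text at an absolute page position (hypothetical model)."""
    text: str
    x: float  # left edge, in points
    y: float  # vertical position, measured from the top of the page


# Two table rows as a fixed-format file might store them: positions only,
# with nothing that says "these are cells in a table".
page = [
    TextRun("Region", 72, 100), TextRun("Q1", 200, 100), TextRun("Q2", 300, 100),
    TextRun("North", 72, 120), TextRun("10", 200, 120), TextRun("12", 300, 120),
]


def naive_copy(runs):
    """Join the runs in reading order (top to bottom, then left to right)."""
    ordered = sorted(runs, key=lambda r: (r.y, r.x))
    return " ".join(r.text for r in ordered)


collapsed = naive_copy(page)
print(collapsed)  # the row and cell boundaries are gone
```

Everything that made the content a table lived in the visual layout, so it is lost as soon as the text is treated as a flat sequence.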
You can see the table structure with its text on the surface of the document, but underneath, the PDF usually stores the table as an absolutely positioned set of lines. (PDF uses the same type of lines to represent underline, strikethrough, or even graphs.) Sort of like this:
There is typically no indication in the PDF file that links text content with these lines or that these lines and text logically represent cells in a table.
When you open a PDF file in Word 2013, PDF Reflow constructs a Word 2013 document from it, opening the door to easy editing and content reuse. It accomplishes this by using a system of complex rules to figure out what Word objects (such as headings, lists, and tables) would best represent the original PDF. The figure below shows what our table example looks like when PDF Reflow uses its heuristics to reconstruct the table structure and content from the lines and text.
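As a rough illustration of the kind of rule involved, the Python sketch below rebuilds a table grid from drawn lines and positioned text. It is a deliberately simple stand-in and not PDF Reflow's actual heuristics: the TextRun and Line classes, the rebuild_table function, and the tolerance value are all assumptions made for this example. Vertical lines are treated as column boundaries, horizontal lines as row boundaries, and each piece of text is dropped into the cell whose boundaries enclose it.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRun:
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class Line:
    x0: float
    y0: float
    x1: float
    y1: float


def rebuild_table(lines, runs, tol=0.5):
    """Guess a table grid from ruling lines, then place each text run in a cell."""
    # Vertical lines (x0 ~ x1) mark column edges; horizontal lines mark row edges.
    col_edges = sorted({l.x0 for l in lines if abs(l.x0 - l.x1) <= tol})
    row_edges = sorted({l.y0 for l in lines if abs(l.y0 - l.y1) <= tol})
    n_rows, n_cols = len(row_edges) - 1, len(col_edges) - 1
    if n_rows < 1 or n_cols < 1:
        return None  # not enough lines to form even a single cell
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        c = bisect_right(col_edges, run.x) - 1
        r = bisect_right(row_edges, run.y) - 1
        if 0 <= r < n_rows and 0 <= c < n_cols:
            grid[r][c] = (grid[r][c] + " " + run.text).strip()
    return grid


# A 2 x 3 table: three horizontal and four vertical lines, plus six text runs.
lines = [Line(60, y, 390, y) for y in (85, 105, 125)]
lines += [Line(x, 85, x, 125) for x in (60, 190, 290, 390)]
runs = [
    TextRun("Region", 72, 100), TextRun("Q1", 200, 100), TextRun("Q2", 300, 100),
    TextRun("North", 72, 120), TextRun("10", 200, 120), TextRun("12", 300, 120),
]

table = rebuild_table(lines, runs)
for row in table:
    print(row)
```

Real documents are much messier than this: the same kind of line may be an underline or part of a chart, borders may be missing, and cells may be merged, which is why a single rule like this one is not enough in practice.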
PDF Reflow is built directly into Word 2013, so you can access your PDF like any other document. In the ribbon, click FILE and go to the Open tab in the Backstage. Navigate to the PDF location and select the file you would like to convert! Your content, formerly locked up in a PDF, is now yours to work with again.