Uncompressing Documents

Introduction

This is an extra to the main article on document internals. Most readers will be able to skip large parts of this, but full details are provided for all who want them.

Office Open XML Files consist of several parts, which must, somehow, be packaged as a complete entity. Part 2 of the OOXML standard is dedicated to how this packaging should be done, and it specifies two distinct aspects of the packaging. Firstly there is an abstract package model that specifies what the parts of the package are, and how they must be linked together through their contents, and, secondly, there is a physical mapping that specifies how the abstract parts are to be mapped to a physical ZIP archive. It is not a requirement of the standard that ZIP archives be used but they are the only physical structure defined and the only one that Word supports, so, in practice, all Word Documents must be held as ZIP archives.

ZIP archives are structured according to a long-established standard [link to the Zip standard at http://www.pkware.com/documents/casestudies/APPNOTE.TXT], and there are several applications available that facilitate working with them. One popular, free, package is 7-Zip [link to the 7-Zip Home Page at http://www.7-zip.org/]; others, such as WinRAR [link to the WinRAR Home Page at http://www.rarlab.com/] and WinZip [link to the WinZip Home Page at http://www.winzip.com/], are available for a price. For a long time, now, the Windows UI, itself, has had a built-in ability to compress and uncompress ZIP archives. Although I would recommend almost any other utility over the somewhat limited built-in one, it is the built-in one that is available to everybody without having to install third party software, and it is what I shall use to demonstrate.
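If you are comfortable with a little scripting, there is another route that needs no extra software beyond a Python installation. What follows is only a minimal sketch, assuming Python 3 and its standard zipfile module; the function name list_parts is my own invention for illustration and is not part of Word or of the OOXML standard.

```python
import zipfile


def list_parts(document_path):
    """Return the names of the parts stored inside an OOXML package."""
    # zipfile reads the file's contents and takes no notice of its
    # extension, so a .docx file can be opened without renaming it.
    with zipfile.ZipFile(document_path) as package:
        return package.namelist()
```

Calling list_parts with the path of one of your own documents should give you the names of the parts in the package, in the order in which they are stored in the archive.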

Windows Explorer

Windows Explorer is the front line of the Microsoft User Interface; it is what displays your desktop, and presents your view of, for example, My Computer; without it, Windows barely functions and, should it fail, Windows will automatically restart it.

Windows Explorer decides what to do with a file that is presented to it based on the extension of the filename: the characters (traditionally three, although now sometimes four) after the final dot in that name. If a file has an extension of .docx, Windows Explorer will, usually, pass it to Word: this is the mechanism by which double clicking the icon of a Word document causes the document to be opened. If a file has an extension of .zip, Windows Explorer will work its own magic to present a view of the contents of the archive as though it were an uncompressed folder. The process by which this happens is buried deep in Windows Explorer and is not open to scrutiny, or to customization.
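The extension is only a label; it is the contents that make a file a ZIP archive. As a rough illustration of the difference, here is a minimal sketch, again assuming Python 3 and its standard zipfile module. The helper names looks_like_zip and first_bytes are hypothetical, chosen just for this example.

```python
import zipfile


def looks_like_zip(path):
    """Report whether a file is a ZIP archive, whatever its extension."""
    # is_zipfile inspects the file's contents, not its name.
    return zipfile.is_zipfile(path)


def first_bytes(path, count=4):
    """Read the first few bytes of a file, for inspection."""
    with open(path, "rb") as handle:
        return handle.read(count)
```

For an ordinary Word document, I would expect looks_like_zip to return True, and first_bytes to return the local file header signature described in the Zip standard, the letters PK followed by the bytes 3 and 4, regardless of whether the file is called .docx or .zip.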

So how do you persuade Windows Explorer to open a Word Document, with an extension of .docx, as though it is a ZIP archive with a .zip extension? The obvious, and, frankly, clumsy, answer is that you give it a .zip extension. Although this is a simple operation, full details of how to do this are presented below.

Seeing Filename Extensions

To change the extension of a file, you must first have extensions showing and, by default, they are set not to show. Open any Windows Explorer Window, My Computer, for example, and somewhere you should be able to find a way to open the Folder Options dialog. I don’t say it that way to be difficult, it’s just that there are so many possibilities, and I am not going to try to cover all of them. On Windows XP, you have a toolbar, and you select Tools > Folder Options from it. On Windows Vista, and Windows 7, you have a bar, the name of which I do not know, with a variety of things on it, one of which should be Organize, from which you can choose Folder and Search Options, or, alternatively, you can get the toolbar by pressing Alt and then doing as per Windows XP. However you do it, you should get this dialog:

The Folder Options Dialog

Select the View tab, and make sure the option to hide extensions for known file types is unchecked, then press OK.

If you are running Windows 8, this is the first thing that I have found that is easier to do than it is in Windows 7. Windows Explorer in Windows 8 has a Ribbon, just as big and ugly as the one in Office, and the Ribbon has a View tab, and, on it, is a checkbox, File name extensions, which you need to check.

The Ribbon in Windows Explorer on Windows 8

Changing Filename Extensions

When you can see filename extensions, you can change them! To do this, select the document and right-click on the mouse, or press the menu key on the keyboard, to get the context menu:

The Context Menu for a Word Document

Your context menu will not be the same as mine, but it should still have the Rename option, the one you want. As a shortcut, to bypass displaying the menu, you can just press F2. However you get there, this, or something like this (depending on how you have files displayed) is what you will see next:

About to rename your file

The part of the file name before the extension will be highlighted, as this is the part of the name you would normally expect to change. In this instance, however, you want to change the extension, and the best way to do this is to add a new extension at the end. If you choose to change the extension, rather than adding a new one, you may have a problem when you come to rename it back. It is easy to forget whether the file is a document (a .docx file), a macro-enabled document (a .docm file), or any of the other file types, an Excel workbook, perhaps, that you may have renamed in order to work with the contents.

Your File almost renamed

Just move the cursor to the very end of the filename, add “.zip”, and press Enter. Windows will immediately warn you that what you’re doing could cause a problem, but, as you know better, you can just press Enter again to dismiss the prompt.

Windows asking if you’re sure
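If you would rather not rename your only copy of a document, a scripted alternative is to make a copy with the extra extension and leave the original alone. This is a different technique from the rename described above, offered only as a minimal sketch, assuming Python 3; copy_with_zip_extension is a hypothetical name of my own.

```python
import os
import shutil


def copy_with_zip_extension(document_path):
    """Copy a document to a sibling file with ".zip" added to its name."""
    # Adding the extension at the end, rather than replacing the old one,
    # keeps the original file type visible in the new name.
    archive_path = os.fspath(document_path) + ".zip"
    shutil.copyfile(document_path, archive_path)
    return archive_path
```

The function returns the path of the copy, which Windows Explorer should then treat in the same way as a renamed file.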

Once you have renamed the file, Windows Explorer will treat it as a ZIP archive instead of as a Word Document. You can double click on the icon to view the files inside, or you can uncompress the file to create a folder structure. If you right-click, as you did before you renamed the file, you will see a different menu:

The Context Menu for a Zip Archive

Now you see how easily fooled Windows Explorer is. The file is exactly the same file as it was but, because the filename has a different extension, Windows Explorer thinks it is a different type of file and offers up a different set of options. The options that are available for what it now thinks of as a compressed file are different from those it offered when it thought the file was a Word document; in particular, there is now an option to Extract All. If you choose this, you will be prompted thus:

All Systems Go!

The default folder into which your ZIP archive will be uncompressed is named after the archive name without the “.zip” extension, in other words, the name of your original document. You can let it default, or choose your own folder; either way, you will have the contents at your disposal.
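The same extraction can be scripted. The sketch below, assuming Python 3 and its standard zipfile module, imitates the default just described by dropping a trailing “.zip” from the archive name to get the folder name. The function name extract_all and the “_extracted” fallback for files without a .zip extension are my own assumptions, not anything Windows does.

```python
import os
import zipfile


def extract_all(archive_path, target_folder=None):
    """Uncompress an archive into a folder and return the folder's path."""
    archive_path = os.fspath(archive_path)
    if target_folder is None:
        root, extension = os.path.splitext(archive_path)
        if extension.lower() == ".zip":
            # Imitate the default: the archive name without ".zip".
            target_folder = root
        else:
            # Assumed fallback for files that were never renamed.
            target_folder = archive_path + "_extracted"
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_folder)
    return target_folder
```

Be aware that, if a file with the same name as the target folder already exists alongside the archive, the extraction will fail; in that case pass your own target_folder.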

That’s all I really want to say on the mechanics of uncompressing. It’s time to get back to the interesting stuff [link to the main article at OOXML.php#Uncompress].