Published Tuesday, February 27, 2007 7:22 AM by martin

Microsoft Office 2007 File Formats - Part 2

As I mentioned in the previous post, any new Office file (docx, xlsx, pptx, or their macro-enabled equivalents) contains a crucial item called [Content_Types].xml.  In this post I want to delve into the contents of this particular XML file.  Remember, for the full low-down on any of this you can read the detailed specs at http://openxmldeveloper.org.

Here's a simple example from a document of mine...

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="jpeg"
           ContentType="image/jpeg" />
  <Default Extension="rels"
           ContentType="application/vnd.openxmlformats-package.relationships+xml" />
  <Default Extension="xml"
           ContentType="application/xml" />
  <Override PartName="/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" />
</Types>

As you can see, inside the <Types> element, we have some combination of <Default> and <Override> elements.  The Defaults are associated with file extensions, and they tell the consuming application that any item in the container with a particular file extension can be expected to contain a particular kind of content.  Content types are specified as standard MIME media types, so expect to see image/jpeg and application/xml quite a bit.  My example sets up default associations for jpeg images, relationship-definition files (important in the new file formats), and "plain" XML files.
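If you want to poke at this yourself, here is a rough Python sketch that parses the example above and collects the <Default> and <Override> entries into two dictionaries. The names SAMPLE and read_content_types are my own inventions for illustration, and the XML is embedded as a string rather than read out of a real .docx container.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the <Types> element in the example above.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

# The example from this post, embedded as bytes (hypothetical stand-in
# for the [Content_Types].xml item you would read from a real container).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="jpeg" ContentType="image/jpeg" />
  <Default Extension="rels"
           ContentType="application/vnd.openxmlformats-package.relationships+xml" />
  <Default Extension="xml" ContentType="application/xml" />
  <Override PartName="/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" />
</Types>"""


def read_content_types(xml_bytes):
    """Return (defaults, overrides) dictionaries from a content-types document."""
    root = ET.fromstring(xml_bytes)
    # Map file extension -> content type.
    defaults = {
        el.get("Extension").lower(): el.get("ContentType")
        for el in root.findall(f"{{{CT_NS}}}Default")
    }
    # Map part name -> content type.
    overrides = {
        el.get("PartName"): el.get("ContentType")
        for el in root.findall(f"{{{CT_NS}}}Override")
    }
    return defaults, overrides


defaults, overrides = read_content_types(SAMPLE)
for ext, ctype in defaults.items():
    print(f"Default  .{ext} -> {ctype}")
for part, ctype in overrides.items():
    print(f"Override {part} -> {ctype}")
```

Running it against the sample should list the three defaults followed by the single override.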

That application/xml content type is interesting though, because my .docx container also contains a file called document.xml.  Although this file has the .xml extension, its content is of a very specific kind: WordprocessingML, and we need to indicate that to the consuming application (because Microsoft Word, for example, knows what to do with that).  We want to override the default content type for .xml files, and say that for this specific document.xml file, the content type is different.  That's what the <Override> element is doing for us.
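The lookup this implies can be sketched in a few lines of Python: check for an <Override> on the exact part name first, and only fall back to the <Default> for the file extension if there isn't one. This is my own minimal sketch, not code from the spec; the function name content_type_for and the two dictionaries are assumptions that simply mirror the example above, and I am assuming part names should be compared case-insensitively.

```python
# Hypothetical tables mirroring the example [Content_Types].xml above.
DEFAULTS = {
    "jpeg": "image/jpeg",
    "rels": "application/vnd.openxmlformats-package.relationships+xml",
    "xml": "application/xml",
}
OVERRIDES = {
    "/document.xml": "application/vnd.openxmlformats-officedocument."
                     "wordprocessingml.document.main+xml",
}


def content_type_for(part_name, defaults=DEFAULTS, overrides=OVERRIDES):
    """Return the content type for a part name, or None if nothing matches."""
    # 1. An Override on the exact part name wins.
    wanted = part_name.lower()
    for name, ctype in overrides.items():
        if name.lower() == wanted:
            return ctype
    # 2. Otherwise fall back to the Default for the file extension.
    file_name = part_name.rsplit("/", 1)[-1]
    _, dot, ext = file_name.rpartition(".")
    if not dot:
        return None  # no extension, so no Default can apply
    return defaults.get(ext.lower())


for part in ["/document.xml", "/media/image1.jpeg", "/_rels/.rels", "/other.xml"]:
    print(part, "->", content_type_for(part))
```

The part names /media/image1.jpeg, /_rels/.rels and /other.xml are made up for the demonstration. Note that /other.xml falls through to plain application/xml, while /document.xml picks up the WordprocessingML type from the override.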

You can find a list of the "built-in" content types, such as the one used in this example for WordprocessingML, in the MSDN article here.

 
