I stumbled across a library that I didn’t realize was out there, for working with Microsoft Office documents. It only works with files in the newer Open XML formats (.docx, .xlsx, .pptx), i.e. those produced by Office 2007 and 2010 (or by Office 2003 with the compatibility pack installed).
Download and install the Open XML SDK 2.0 for Microsoft Office: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5124.
Of the two files offered on the download page, I got the second, smaller one.
Add references to DocumentFormat.OpenXml and WindowsBase to your project.
You can now access Office documents programmatically, without needing Microsoft Office installed or any Interop DLLs. This works with Word, Excel and PowerPoint files (through the Open XML WordprocessingML, SpreadsheetML and PresentationML formats respectively).
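To show that the same pattern carries over from Word to Excel, here is a minimal sketch that lists the sheet names of a workbook. The class and method names (WorkbookInspector, GetSheetNames) are made up for illustration, and the sketch assumes the project also references System.Core for LINQ.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class WorkbookInspector
{
    // Hypothetical helper: returns the names of all sheets in an .xlsx file.
    public static List<string> GetSheetNames(string filename)
    {
        // Open read-only (second argument false), as in the Word case.
        using (var document = SpreadsheetDocument.Open(filename, false))
        {
            // ToList() materializes the names before the package is disposed.
            return document.WorkbookPart.Workbook
                .Descendants<Sheet>()
                .Select(sheet => sheet.Name.Value)
                .ToList();
        }
    }
}
```

PowerPoint files can be opened in the same way with PresentationDocument.Open.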
For example, to get the custom properties of a Word document:
    public static Dictionary<string, string> GetCustomPropertiesOfWordDocument(string filename)
    {
        using (var package = WordprocessingDocument.Open(filename, false))
        {
            var properties = new Dictionary<string, string>();
            foreach (var property in package.CustomFilePropertiesPart
                .Properties.Elements<CustomDocumentProperty>())
            {
                properties.Add(property.Name, property.VTLPWSTR.Text);
            }
            return properties;
        }
    }
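In this example, you would need to add some error handling around the property.VTLPWSTR.Text expression, as VTLPWSTR is null when the property is not a string (a number, date or boolean, for instance). Likewise, package.CustomFilePropertiesPart is null when the document has no custom properties at all.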
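One way to add that error handling is sketched below. It is only one possible approach, not the only way to do it: it checks for a missing custom properties part and reads each value through InnerText, which returns the text of whichever typed child element is present. The class and method names (CustomPropertyReader, GetCustomPropertiesSafely) are hypothetical.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.CustomProperties;
using DocumentFormat.OpenXml.Packaging;

public static class CustomPropertyReader
{
    // Hypothetical helper: returns every custom property as a string,
    // whatever its underlying type.
    public static Dictionary<string, string> GetCustomPropertiesSafely(string filename)
    {
        var properties = new Dictionary<string, string>();

        using (var package = WordprocessingDocument.Open(filename, false))
        {
            // The part is null when the document has no custom properties.
            var part = package.CustomFilePropertiesPart;
            if (part == null || part.Properties == null)
            {
                return properties;
            }

            foreach (var property in part.Properties.Elements<CustomDocumentProperty>())
            {
                // Skip entries without a usable name (malformed file).
                if (property.Name == null || property.Name.Value == null)
                {
                    continue;
                }

                // InnerText works for string, numeric, boolean and date values,
                // so a non-string property no longer dereferences a null VTLPWSTR.
                // The indexer also avoids an exception on a duplicate name.
                properties[property.Name.Value] = property.InnerText;
            }
        }

        return properties;
    }
}
```

If you need the values in their native types rather than as strings, you could instead test the typed members (VTLPWSTR, VTInt32, VTBool, VTFileTime and so on) for null one by one.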
When deploying your solution, you need to include DocumentFormat.OpenXml.dll with your application. You can set Copy Local to True in the properties for the reference, and the DLL will be copied to your output folder when you compile the project.
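For reference, the Copy Local setting is stored in the project file as the Private element of the reference. The fragment below is a trimmed sketch of what the entry might look like; a real project file will normally also carry the full assembly name and a HintPath, which are omitted here.

```xml
<ItemGroup>
  <Reference Include="DocumentFormat.OpenXml">
    <!-- Private=True is how "Copy Local = True" is recorded in the .csproj -->
    <Private>True</Private>
  </Reference>
</ItemGroup>
```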
Download a sample project here.