
Reading Microsoft Word XML Files With SAS


 

This is a SUGI 31 (2006) paper.

Abstract:

In 2005, Microsoft announced that the new default format for documents created in Microsoft Office would be XML-based. The ability of SAS to read XML offers a convenient method for extracting structured information from Microsoft Word documents. This paper examines three scenarios in which information from a Word document is read into SAS data sets: extracting text along with its associated properties (styles and attributes), extracting all data from tables, and extracting the coordinates of objects in drawings.
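To make the first two scenarios concrete, here is a minimal sketch of the idea. It is not the paper's SAS code. It is written in Python with the standard library only, and it assumes the Word 2003 WordprocessingML vocabulary (the `w:` namespace URI, `w:p`, `w:pStyle`, `w:tbl`, `w:tr`, `w:tc`, `w:t`). The sample document and the function names `text_of`, `paragraphs`, and `table_rows` are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI used by Word 2003 WordprocessingML files.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Hypothetical, hand-written sample; a real Word file is far more verbose.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>
      <w:r><w:t>Results</w:t></w:r>
    </w:p>
    <w:p>
      <w:r><w:t>Plain </w:t></w:r>
      <w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>
    </w:p>
    <w:tbl>
      <w:tr>
        <w:tc><w:p><w:r><w:t>id</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>score</w:t></w:r></w:p></w:tc>
      </w:tr>
      <w:tr>
        <w:tc><w:p><w:r><w:t>1</w:t></w:r></w:p></w:tc>
        <w:tc><w:p><w:r><w:t>87</w:t></w:r></w:p></w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:wordDocument>"""


def text_of(elem):
    """Concatenate the text of every w:t element nested inside elem."""
    return "".join(t.text or "" for t in elem.iter(f"{{{W}}}t"))


def paragraphs(root):
    """Return (style name or None, text) for each paragraph directly under w:body.

    Simplification: real files may wrap content in extra container elements,
    which this sketch does not descend into.
    """
    rows = []
    for p in root.findall("w:body/w:p", NS):
        style = p.find("w:pPr/w:pStyle", NS)
        name = style.get(f"{{{W}}}val") if style is not None else None
        rows.append((name, text_of(p)))
    return rows


def table_rows(root):
    """Return each table row directly under w:body as a list of cell strings."""
    rows = []
    for tbl in root.findall("w:body/w:tbl", NS):
        for tr in tbl.findall("w:tr", NS):
            rows.append([text_of(tc) for tc in tr.findall("w:tc", NS)])
    return rows


root = ET.fromstring(SAMPLE)
print(paragraphs(root))
print(table_rows(root))
```

Each tuple or list returned here corresponds to one observation in the kind of data set the abstract describes: one row per paragraph with its style, and one row per table row with one value per cell.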

 

 

The following files are available

 

 

 

Feel free to use the code. I don't guarantee the absence of bugs.

 

Larry Hoyle




Institute for Policy & Social Research
1541 Lilac Lane
607 Blake Hall
Lawrence, KS 66045-3129
Email: ipsr@ku.edu
Phone: (785) 864-3701
Copyright © 2009 by the University of Kansas