Friday, May 11, 2007

PPT Metadata

I received an email recently asking whether I had any tools to extract metadata from PowerPoint presentations. Chapter 5 of my book includes the oledmp.pl Perl script, which grabs OLE information from Office files; this includes Word documents, Excel spreadsheets, and PowerPoint presentations. I've run some tests using this script and pulled out things like the revision number, created and last saved dates, and author name.
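Before pointing a tool like this at a file, it can help to confirm that the file really is an OLE compound document. The following is a minimal Python sketch of that check, not part of oledmp.pl; it relies only on the well-known 8-byte signature at the start of OLE compound files, and the function names (looks_like_ole, file_looks_like_ole) are made up for illustration.

```python
# Sketch only: these helper names are hypothetical and are not part of oledmp.pl.

# The 8-byte signature found at offset 0 of an OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole(data: bytes) -> bool:
    """Return True if the buffer begins with the OLE compound file signature."""
    return data[:8] == OLE_MAGIC


def file_looks_like_ole(path: str) -> bool:
    """Read the first 8 bytes of a file and check them against the signature."""
    with open(path, "rb") as fh:
        return looks_like_ole(fh.read(8))
```

A call such as file_looks_like_ole("file.ppt") should return True for a binary PowerPoint file like the one examined here; a file that fails the check is not an OLE compound document, whatever its extension says.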

Pretty interesting stuff. There may be more to find; depending on interest and time, maybe someone can look into this.

Here's an example of the oledmp.pl output from a PPT file (some of the info is masked to protect privacy):

C:\perl>oledmp.pl file.ppt
ListStreams
Stream : ♣DocumentSummaryInformation
Stream : Current User
Stream : ♣SummaryInformation
Stream : Pictures
Stream : PowerPoint Document

Trash Bin Size
BigBlocks 0
SystemSpace 876
SmallBlocks 0
FileEndSpace 1558

Summary Information
subject
lastauth Mary
lastprinted
appname Microsoft PowerPoint
created 09.06.2002, 19:51:48
lastsaved 14.09.2004, 19:08:39
revnum 32
Title Title
authress John Doe

Pictures
Current User
♣SummaryInformation
PowerPoint Document
♣DocumentSummaryInformation

So what does all this mean? Well, we see the various streams embedded in the document, along with an example of what is extracted from the SummaryInformation stream. (The ♣ in front of two of the stream names is simply how the console displays the 0x05 byte that those names begin with.) Some of this information can also be seen by right-clicking the file in Windows Explorer, choosing Properties, selecting the Summary tab, and then clicking the Advanced button.
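If you want to work with these values rather than just read them, the Summary Information section of the output is easy to pick apart. Here is a rough Python sketch that turns that section into a dictionary. It is not part of oledmp.pl; the parse_summary function and the SAMPLE text are illustrative only, and the date format (day.month.year, then the time) is an assumption based on the sample output above.

```python
from datetime import datetime

# Copied from the sample output above; a real run would capture the tool's output.
SAMPLE = """\
Summary Information
subject
lastauth Mary
lastprinted
appname Microsoft PowerPoint
created 09.06.2002, 19:51:48
lastsaved 14.09.2004, 19:08:39
revnum 32
Title Title
authress John Doe

Pictures
"""

# Keys whose values look like timestamps in the sample output.
DATE_KEYS = {"created", "lastsaved", "lastprinted"}


def parse_summary(text):
    """Collect the 'key value' lines that follow the 'Summary Information' header."""
    fields = {}
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line == "Summary Information":
            in_section = True
            continue
        if not in_section:
            continue
        if not line:
            break  # a blank line ends the section
        key, _, value = line.partition(" ")
        value = value.strip()
        if key in DATE_KEYS and value:
            # Assumed format: DD.MM.YYYY, HH:MM:SS
            value = datetime.strptime(value, "%d.%m.%Y, %H:%M:%S")
        fields[key] = value
    return fields


info = parse_summary(SAMPLE)
print(info["lastauth"], info["revnum"], info["created"].year)
```

With the dates parsed into datetime objects, it is a short step to sort a set of documents by last saved time or to compare the created and last saved values for a single file.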

Simple modifications to the oledmp.pl script will let you extract the stream tables as well, exposing even more of the available information.
