Monday, May 31, 2004

Word Confidential

This is a pretty useful article. It's about what a Microsoft Word document is really keeping tabs on: basically, a record of its last 10 authors and where the file was stored when they worked on it.

Useful to know, especially if you are tempted to take a document template and modify it slightly for a new client. Your new client can get a lot more than you bargained for.

Woody's Office Watch contains some frightening data about all the metadata that travels with a typical Word document. The newsletter is free. It is advertising-sponsored, but seems to take a view strongly independent of the vendors. It is not published as often as it used to be, but it is still worth signing up for.

Here's an extract...


THE PERSONAL INFO HARVESTING SHTICK

Man, if Microsoft can't get it right, how can you? The folks in Redmond continue to post documents with all sorts of internal details on their Web site. While I haven't found any earth-shattering anti-trust-busting bits of "metadata", the stuff I have found leaves me wondering if anybody can get it right.

We're going to show you just how easy it is to publish Word documents with information you might not want others to see. We'll do that by taking examples from Microsoft itself. Having shown how even the supposed Word experts can get trapped, in future issues of Woody's Watch (WOW and WOW-MM) we'll show you and Microsoft how to publish just the document and no more.

In WOW-MM 4.15, I talked about two documents with embarrassing embedded data. One contributed to the downfall of one of England's most influential politicians. The other exposed a Microsoft dirty trick.

A WOW-MM reader pointed me to an entire collection of documents posted by one state's Supreme Court. I didn't see anything particularly damning in the documents, but they're strewn with names and email addresses of clerks, law firms, and individuals; file locations, server names, and so on - a few hours' worth of harvesting could lead to a credible blueprint of sections of this Supreme Court's word processing system.

Worth noting: few (if any) US federal agencies - from all branches of government - post Word documents on the Web any more. Everything from the White House to the CIA to the US Supreme Court appears to be in PDF. Bravo.

AT&T researcher Simon Byers has a report on the hidden data problems facing the Word-using world today - all 400,000,000 of us. You can download it here. One part of his conclusion really hits home:

"...typical behavior patterns of Word users and the default settings of the Word program leads to an uncomfortable state of affairs for Word users concerned about information security."

This isn't strictly a voyeuristic exercise. When you leave dribs and drabs of information floating around on the Web, there's no telling how it can be used. I would guess that a dedicated cretin with a fast Internet connection could come up with a working roadmap to parts of Microsoft's development and marketing networks, just by looking at the flotsam and jetsam buried in readily available documents - documents posted on Microsoft's own Web site.

To recap, if you use Word 97 or 2000, Word maintains a detailed log of who has edited the document, and where it was located when it was opened - and there's nothing you can do about it.
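To get a feel for how little effort that kind of harvesting takes, here is a rough Python sketch. It is not a parser for the Word file format and it is not the method used in the newsletter; it simply pulls runs of readable text (single-byte or UTF-16LE) out of a file's raw bytes and looks for things shaped like email addresses and Windows paths. The function names and the two patterns are made up for illustration, and the approach will both miss things and report false positives.

```python
import re

# Heuristic patterns: illustrative assumptions, not part of any Word specification.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
PATH = re.compile(r'(?:[A-Za-z]:\\|\\\\)[^\s"<>|*?]+')


def readable_strings(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of printable ASCII stored as single-byte or UTF-16LE text."""
    ascii_runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    wide_runs = re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data)
    found = [run.decode("ascii") for run in ascii_runs]
    found += [run.decode("utf-16-le") for run in wide_runs]
    return found


def harvest(data: bytes) -> dict[str, set[str]]:
    """Collect anything in the readable strings that looks like an email or a path."""
    emails: set[str] = set()
    paths: set[str] = set()
    for text in readable_strings(data):
        emails.update(EMAIL.findall(text))
        paths.update(PATH.findall(text))
    return {"emails": emails, "paths": paths}
```

To try it on one of your own files, read the file in binary mode and pass the bytes to harvest(). An empty result does not mean the document is clean; it only means this crude scan found nothing.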

If you use Outlook 2002 (the version in Office XP), and you send a document by attaching it to an email message, Outlook brands the document with the email address, name, and a number that can be traced to the PC that was used to send the file (although you need access to the PC to nail it for sure). It also brands the document with the subject of the email message that carried the file.

If you explicitly tell Word 2002 to remove personally identifiable information (Tools | Options | Security, check the box marked Remove Personal Information From File Properties on Save, and uncheck the box marked Store Random Number to Improve Merge Accuracy), and you send the document with Outlook 2002, Outlook still sticks the number that can be traced to your PC inside the file. Woody talked about that number - the _AdHocReviewCycleID - here.
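A quick way to check whether a file you received by email carries that marker is to search its raw bytes for the property name. The Python sketch below does only that. How the name is encoded inside the file is an assumption here (it tries both single-byte and UTF-16LE), the function name is hypothetical, and a name split across internal storage blocks would be missed, so treat a negative result as inconclusive.

```python
MARKER = "_AdHocReviewCycleID"  # property name mentioned in the text above


def has_review_cycle_id(data: bytes) -> bool:
    """Report whether the property name appears anywhere in the raw bytes.

    Assumption: the name is stored as plain ASCII or as UTF-16LE text.
    """
    needles = (MARKER.encode("ascii"), MARKER.encode("utf-16-le"))
    return any(needle in data for needle in needles)
```

Read the attachment in binary mode and pass the bytes in. This only detects the name of the property, not the number stored with it.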

I'm very happy to report that Outlook 2003 seems to be doing it right. Finally. Telling Word 2003 to remove personally identifiable information is sufficient, in a default installation of Outlook 2003, to keep any personal info from being "branded" onto a doc when it's sent attached to a message.

Microsoft's Knowledge Base talks about the kinds of data that can be squirreled away in Word documents, and gives some tips for removing that data (when it's possible). But the simple fact is that most people, most of the time, don't bother.

Word 97 discussion: http://woodyswatch.com/kb?223790

Word 2000 discussion: http://woodyswatch.com/kb?237361

Word 2002 (Office XP) discussion: http://woodyswatch.com/kb?290945
