In a recent project we needed to find all the paragraphs in a particular style
(e.g. Heading 1, Heading 2, etc.) to build a custom table of contents.
My initial thought was to iterate over the document's paragraphs, looking for the
particular style we were interested in.
The code looked like this:
    private void FindStyleUsingParagraphIteration(Word.Document doc, string style)
    {
        // Visit every paragraph and compare its style name with the one requested.
        foreach (Word.Paragraph p in doc.Paragraphs)
        {
            Word.Style s = (Word.Style)p.Range.get_Style();
            if (s.NameLocal == style)
            {
                string text = p.Range.Text;
                int page = (int)p.Range.get_Information(
                    Word.WdInformation.wdActiveEndPageNumber);
                Single vert = (Single)p.Range.get_Information(
                    Word.WdInformation.wdVerticalPositionRelativeToPage);
                Message += string.Format(
                    "\nStyle: {0}, text: {1}, page: {2}, vert: {3}",
                    s.NameLocal, text, page, vert);
            }
        }
    }
This worked perfectly; however, it was a little slow, so I looked for a better
approach.
My second attempt was to use Word's built-in Find functionality to find
paragraphs in a particular style.
The code looked like this:
    private void FindStyleUsingFind(Word.Document doc, string style)
    {
        object oMissing = Type.Missing;
        object oTrue = true;
        object oStyle = style;
        Word.Range r = doc.Content;
        bool found = true;
        while (found)
        {
            try
            {
                // Search on formatting only: no text, just the paragraph style.
                r.Find.ClearFormatting();
                r.Find.set_Style(ref oStyle);
                found = r.Find.Execute(
                    ref oMissing,   // FindText
                    ref oMissing,   // MatchCase
                    ref oMissing,   // MatchWholeWord
                    ref oMissing,   // MatchWildcards
                    ref oMissing,   // MatchSoundsLike
                    ref oMissing,   // MatchAllWordForms
                    ref oMissing,   // Forward
                    ref oMissing,   // Wrap
                    ref oTrue,      // Format
                    ref oMissing,   // ReplaceWith
                    ref oMissing,   // Replace
                    ref oMissing,   // MatchKashida
                    ref oMissing,   // MatchDiacritics
                    ref oMissing,   // MatchAlefHamza
                    ref oMissing);  // MatchControl
            }
            catch
            {
                found = false;
            }
            if (found)
            {
                // After a successful Execute, r covers the matched text.
                string text = r.Text;
                int page = (int)r.get_Information(
                    Word.WdInformation.wdActiveEndPageNumber);
                Single vert = (Single)r.get_Information(
                    Word.WdInformation.wdVerticalPositionRelativeToPage);
                Message += string.Format(
                    "\nStyle: {0}, text: {1}, page: {2}, vert: {3}",
                    oStyle, text, page, vert);
            }
        }
    }
The performance of this approach was much better; however, it made the Word
status bar flicker as it performed the Finds, so I started looking for a third
option.
My third attempt was to use Word's built-in XML support and write an XPath
query against the WordML schema.
The code looks like this:
    private void FindStyleUsingXPath(Word.Document doc, string style)
    {
        try
        {
            // Prefix mapping for the WordML namespace used in the XPath below.
            string schema =
                "xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\"";
            // Text runs of every paragraph whose pStyle matches the requested style.
            string xPath = string.Format(
                "//w:p[descendant::w:pStyle[@w:val='{0}']]/w:r/w:t", style);
            Word.XMLNodes nodes = doc.SelectNodes(xPath, schema, false);
            if (nodes != null)
            {
                foreach (Word.XMLNode node in nodes)
                {
                    int page = (int)node.Range.get_Information(
                        Word.WdInformation.wdActiveEndPageNumber);
                    Single vPos = (Single)node.Range.get_Information(
                        Word.WdInformation.wdVerticalPositionRelativeToPage);
                    string text = node.Text;
                    Message += string.Format(
                        "Page {0}, vert {1}, {2}\n", page, vPos, text);
                }
            }
        }
        catch (Exception ex)
        {
            Message += ex.Message;
        }
    }
Sadly this is where it all went wrong. The code ran fine (without any
exceptions being thrown); however, it returned no results!
The first thing I noted was that it took longer to return no results than my
Find attempt took to do the job properly, so this isn't a good approach in terms
of performance anyway. Still, it was strange that it didn't return any results.
After some investigation I found that the WordML namespace was not registered in
the doc.XMLSchemaReferences collection, so I thought that maybe I should add it
manually using the following code:
    object oNamespaceURI = "http://schemas.microsoft.com/office/word/2003/wordml";
    object oAlias = "w";
    object oFileLocation = @"C:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office 2003 XML Reference Schemas\WordprocessingML Schemas\w10.xsd";
    // Prefix mapping in the form xmlns:alias="namespace".
    string schema = string.Format("xmlns:{0}=\"{1}\"", oAlias, oNamespaceURI);
    doc.XMLSchemaReferences.Add(ref oNamespaceURI, ref oAlias, ref oFileLocation, false);
Unfortunately this code throws the following exception: "This schema cannot be
used because it attempts to declare a namespace reserved by Word." So that isn't
the answer.
Finally, in frustration, I emailed a buddy of mine on the Office team, who
confirmed definitively that SelectNodes(...) only works with custom schemas and
not WordML. So there you have it: don't waste your time like me trying to do
this until at least the next version. (I have submitted a feature request.)
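As a rough illustration of what the XPath expression is meant to select, the same query can be run outside Word with System.Xml. This is only a sketch under assumptions: the class name WordMlXPathSketch, the helper FindTextByStyle and the sample fragment are all made up for the example and are not taken from the downloadable source. The fragment is hand-written rather than real Word output, and it assumes that w:val holds a style ID such as "Heading1" rather than the local style name such as "Heading 1". It does not change the conclusion above about doc.SelectNodes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class WordMlXPathSketch
{
    // WordML 2003 namespace, as used in the post.
    const string WordMlNs = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Hypothetical helper: returns the text of every w:t run inside paragraphs
    // whose w:pStyle value equals styleId.
    public static List<string> FindTextByStyle(string wordMl, string styleId)
    {
        XmlDocument xml = new XmlDocument();
        xml.LoadXml(wordMl);

        // Map the "w" prefix used in the XPath to the WordML namespace.
        XmlNamespaceManager ns = new XmlNamespaceManager(xml.NameTable);
        ns.AddNamespace("w", WordMlNs);

        string xPath = string.Format(
            "//w:p[descendant::w:pStyle[@w:val='{0}']]/w:r/w:t", styleId);

        List<string> results = new List<string>();
        foreach (XmlNode node in xml.SelectNodes(xPath, ns))
        {
            results.Add(node.InnerText);
        }
        return results;
    }

    public static void Main()
    {
        // Hand-written sample fragment (an assumption, not real Word output).
        string sample =
            "<w:body xmlns:w=\"" + WordMlNs + "\">" +
            "<w:p><w:pPr><w:pStyle w:val=\"Heading1\"/></w:pPr>" +
            "<w:r><w:t>Introduction</w:t></w:r></w:p>" +
            "<w:p><w:r><w:t>Body text</w:t></w:r></w:p>" +
            "</w:body>";

        foreach (string text in FindTextByStyle(sample, "Heading1"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Run against a document saved as WordML, something along these lines could in principle stand in for the failed SelectNodes call, but it would only give the text; the page number and vertical position still need a Word Range.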
For those of you who are interested in the relative performance, here are the
stats for my test document:
Using Paragraph iteration: 10.1445872 seconds
Using Find: 1.9928656 seconds
Using XPath: 4.9771568 seconds (failed)
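Timings of this kind can be collected by wrapping each method call in a System.Diagnostics.Stopwatch. The sketch below is an assumption about how one might do that, not the measurement code behind the figures above; the class TimingSketch, the helper TimeSeconds and the placeholder workload are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimingSketch
{
    // Hypothetical helper: runs the action once and returns the elapsed
    // wall-clock time in seconds.
    public static double TimeSeconds(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // Placeholder workload; in the add-in this would be a call such as
        // FindStyleUsingFind(doc, "Heading 1").
        double seconds = TimeSeconds(delegate { Thread.Sleep(50); });
        Console.WriteLine("Placeholder workload: {0} seconds", seconds);
    }
}
```

A single run per method is a coarse measure; repeating each call a few times on the same document would give steadier numbers.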
My conclusion was to use the Find.Execute method and put up with the status
bar flickering!
The source for this experimentation can be downloaded from the following page
on the Sentient website:
http://www.sentient.co.uk/wordFindStyle.aspx
If you know of a better way to do this please add a comment to this blog.
Posted Dec 07 2004, 09:13 AM by jonathangreensted