Generating Microsoft Word Documents Natively using BizTalk 2006

In this post I’ll discuss how to generate Word 2007 documents natively from BizTalk 2006 using the System.IO.Packaging API, which shipped as part of .Net 3.0 and supports Office Open Xml packages.

Background

Unless you’ve lived under a rock during the last year, you’ll know that the Office Open XML (OOXML) format is the new Xml format for the Office 2007 suite, namely Word, Excel and PowerPoint. An OOXML file is a package conforming to the Open Packaging Conventions: it contains a number of individual files (parts) that form the basis of the document, and the package is zipped to reduce the overall size of the resulting file (a .docx, .xlsx or .pptx).

Generating Word Documents – Overview

Generating a Word document is relatively simple and only requires a custom send pipeline component that generates our OOXML package.

In this post I will be using a Sales Report scenario, generating a Word document from the output of a fictional ERP system; to that end, I’ll also be mapping from a fictional sales summary Xml message to the required OOXML format before generating the final .docx. The final document will look something like the following (note that the areas in red will be replaced with content from our ERP sales summary message):

[Image: Proposed Sales Summary Document]

Before we start, I need to present a quick crash-course in the structure of OOXML packages. A minimal OOXML WordprocessingML document contains three parts: a part that defines the main document body, usually called document.xml; a part detailing the Content Types (which tells the consumer what type of content to expect in the package); and a Relationships part (which ties the document parts and Content Types together). When using the System.IO.Packaging API we only need to concern ourselves with the main document body: the API takes care of creating the Content Types and Relationships parts. It’s this feature of the API that allows us to create Word documents in BizTalk; all we need to do is create the Xml for the main document and squirt it at a custom pipeline component, which does the packaging for us using the API.
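For orientation, the two supporting parts in a minimal package look roughly like the following. This is a sketch based on the Open Packaging Conventions rather than a dump of the sample solution's output; the exact Default and Override entries, and the form of the Target path, may differ in a package written by System.IO.Packaging. The content type, relationship type and the rId1 identifier are the values used by the pipeline component in this post.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- [Content_Types].xml: maps each part in the package to a content type -->
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels"
           ContentType="application/vnd.openxmlformats-package.relationships+xml" />
  <Override PartName="/word/document.xml"
            ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml" />
</Types>
```

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- _rels/.rels: tells the consumer which part is the main document -->
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
                Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
                Target="/word/document.xml" />
</Relationships>
```

Renaming an existing .docx to .zip and opening it is a quick way to compare these against what Word itself produces.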

Note that the structure of an OOXML document is outside of the scope of this post (but a good understanding is fundamental when working with these documents) and I would recommend that you read the excellent Open Xml Markup Explained by Wouter van Vugt.

Generating Word Documents – The ‘Main’ Document

The main document body (i.e. document.xml) is the only part that is generated in the BizTalk solution. We don’t actually create a file called document.xml – the packaging API does this for us – instead we simply create a message that conforms to the OOXML schema and pass this into the custom Send pipeline.

In our scenario, we are generating a Sales Report document for distribution to the finance department – we will receive an Xml sales summary document from our fictional ERP system that resembles the following:

<?xml version="1.0" encoding="utf-8"?>
<ns0:SalesReport xmlns:ns0="http://schemas.modhul.com/erp/salesreport-1.0">
    <Author>Nick Heppleston</Author>
    <Email>nick@modhul.com</Email>
    <SalesStart>10th January 2008</SalesStart>
    <SalesEnd>17th January 2008</SalesEnd>
    <SalesSummary>100,48.00</SalesSummary>
</ns0:SalesReport>

which needs to be mapped into our OOXML main document body message (I think the layout of the OOXML message is pretty self explanatory, however I would point you at Open Xml Markup Explained if you’re after a more detailed explanation):

<?xml version="1.0" encoding="utf-8" ?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
        <w:p>
            <w:r>
                <w:rPr>
                    <w:b />
                    <w:sz w:val="52" />
                    <w:rFonts w:ascii="Cambria" />
                </w:rPr>
                <w:t xml:space="preserve">Sales Summary for: </w:t>
                <w:t>Nick Heppleston</w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:r>
                <w:rPr>
                    <w:i />
                    <w:sz w:val="52" />
                    <w:rFonts w:ascii="Cambria" />
                    <w:spacing w:val="15" />
                    <w:color w:val="48FDB2" />
                </w:rPr>
                <w:t xml:space="preserve">Sales from: </w:t>
                <w:t>10th January 2008</w:t>
                <w:t xml:space="preserve"> to </w:t>
                <w:t>17th January 2008</w:t>
                <w:t xml:space="preserve"> - </w:t>
                <w:t>£100,48.00</w:t>
            </w:r>
        </w:p>
        <w:p>
            <w:r>
                <w:t xml:space="preserve">Contact: </w:t>
                <w:t>Nick Heppleston</w:t>
                <w:t xml:space="preserve"> | </w:t>
                <w:t>nick@modhul.com</w:t>
            </w:r>
        </w:p>
    </w:body>
</w:document>

This transformation can be performed anywhere; in the sample solution I’ve put the map on the Receive Port. I am using custom XSLT to drive the map, because I can’t think of any way to generate this type of message using a standard BizTalk Map: how do I graphically say ‘map from this source node to this destination node’ when all of the destination nodes simply repeat themselves?
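A minimal XSLT 1.0 sketch of such a map, assuming the source and destination messages shown above, might look like this. The prefix s and the single-template layout are chosen for illustration and are not taken from the sample solution; the run properties simply mirror the destination message in this post.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="http://schemas.modhul.com/erp/salesreport-1.0"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="s">

  <xsl:output method="xml" encoding="utf-8" />

  <!-- The root element is namespace-qualified; its children are not. -->
  <xsl:template match="/s:SalesReport">
    <w:document>
      <w:body>
        <!-- Title: bold, 26pt (w:sz is in half-points), Cambria -->
        <w:p>
          <w:r>
            <w:rPr>
              <w:b />
              <w:sz w:val="52" />
              <w:rFonts w:ascii="Cambria" />
            </w:rPr>
            <w:t xml:space="preserve">Sales Summary for: </w:t>
            <w:t><xsl:value-of select="Author" /></w:t>
          </w:r>
        </w:p>
        <!-- Subtitle: sales period and total -->
        <w:p>
          <w:r>
            <w:rPr>
              <w:i />
              <w:sz w:val="52" />
              <w:rFonts w:ascii="Cambria" />
              <w:spacing w:val="15" />
              <w:color w:val="48FDB2" />
            </w:rPr>
            <w:t xml:space="preserve">Sales from: </w:t>
            <w:t><xsl:value-of select="SalesStart" /></w:t>
            <w:t xml:space="preserve"> to </w:t>
            <w:t><xsl:value-of select="SalesEnd" /></w:t>
            <w:t xml:space="preserve"> - </w:t>
            <!-- &#163; is the pound sign -->
            <w:t>&#163;<xsl:value-of select="SalesSummary" /></w:t>
          </w:r>
        </w:p>
        <!-- Contact line -->
        <w:p>
          <w:r>
            <w:t xml:space="preserve">Contact: </w:t>
            <w:t><xsl:value-of select="Author" /></w:t>
            <w:t xml:space="preserve"> | </w:t>
            <w:t><xsl:value-of select="Email" /></w:t>
          </w:r>
        </w:p>
      </w:body>
    </w:document>
  </xsl:template>
</xsl:stylesheet>
```

Because the WordprocessingML elements are written as literal result elements, the xml:space attribute keeps its built-in xml prefix in the output, and the fixed text and the source values sit in separate w:t elements exactly as in the destination message.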

Note: I’ve yet to find a satisfactory XSD for the WordprocessingML markup, so the solution contains an OOXML schema that was automagically generated from the above destination format. I’m working on sourcing the schema; I have a number of ‘feelers’ out with the Office Team and I hope to be able to provide a reference in the next couple of days.

With our Sales Summary message now mapped and in the necessary OOXML format, we can send it to the custom pipeline / pipeline component for it to do its work and generate our .docx package.

Generating Word Documents – The Custom Pipeline Component

The custom pipeline component is relatively simple. It uses the System.IO.Packaging API introduced in .Net 3.0, which can be found in WindowsBase.dll (C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0\WindowsBase.dll); full documentation for this namespace can be found online at MSDN. The API is invoked in the pipeline component’s Execute() method as follows:

   1:  public IBaseMessage Execute(IPipelineContext pc, IBaseMessage inmsg)
   2:  {
   3:      XmlDocument InputXmlDocument = new XmlDocument();
   4:      InputXmlDocument.XmlResolver = null;
   5:  
   6:      // Define bodypart instances
   7:      IBaseMessagePart bodyPart = inmsg.BodyPart;
   8:  
   9:      // Define stream instances
  10:      Stream originalStream = null;
  11:      MemoryStream odfStream = new MemoryStream();
  12:  
  13:      string docContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
  14:      string docRelationshipType = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";
  15:  
  16:      if (null != bodyPart)
  17:      {
  18:          // Get a *copy* of the original stream
  19:          originalStream = bodyPart.Data;
  20:  
  21:          // Check that the original stream is not null
  22:          if (null != originalStream)
  23:          {
  24:              // Load the original message stream into our input xml document 
  25:              // to be used as the basis of the OOXML document.
  26:              InputXmlDocument.Load(originalStream);
  27:  
  28:              try
  29:              {
  30:                  // Create a new OOXML package
  31:                  Package pkg = Package.Open(odfStream, FileMode.Create, FileAccess.ReadWrite);
  32:  
  33:                  // Create a Uri for the document part
  34:                  Uri docPartUri = new Uri("/word/document.xml", UriKind.Relative);
  35:  
  36:                  // Create the document part
  37:                  PackagePart mainPart = pkg.CreatePart(docPartUri, docContentType);
  38:  
  39:                  // Add the data from the Xml Document to the document part
  40:                  Stream partStream = mainPart.GetStream(FileMode.Create, FileAccess.Write);
  41:                  InputXmlDocument.Save(partStream);
  42:                  partStream.Close();
  43:                  pkg.Flush();
  44:  
  45:                  // Create the relationship between the part and the package.
  46:                  PackageRelationship pkgRelationship = pkg.CreateRelationship(docPartUri, TargetMode.Internal, docRelationshipType, "rId1");
  47:  
  48:                  // Flush the changes then close the package
  49:                  pkg.Flush();
  50:                  pkg.Close();
  51:              }
  52:              catch (Exception Ex)
  53:              {
  54:                  EventLog.WriteEntry("BizTalk 2006 - Build ODF Package", "Error encountered building the package: " + Ex.Message, EventLogEntryType.Error);
  55:              }
  56:  
  57:              try
  58:              {
  59:                  // Rewind the new OOXML stream
  60:                  odfStream.Seek(0, System.IO.SeekOrigin.Begin);
  61:              }
  62:              catch (Exception Ex)
  63:              {
  64:                  EventLog.WriteEntry("BizTalk 2006 - Build ODF Package", "Error encountered rewinding the stream: " + Ex.Message, EventLogEntryType.Error);
  65:              }
  66:              finally
  67:              {
  68:                  // Add the new OOXML stream into the return message.
  69:                  bodyPart.Data = odfStream;
  70:                  pc.ResourceTracker.AddResource(odfStream);
  71:              }
  72:          }
  73:      }
  74:  
  75:      return inmsg;
  76:  }

A quick overview of the code is as follows:

  • Line 26: We load a copy of the original message data part stream into an XmlDocument to use as the main document body (the document.xml) when building the package.
  • Line 31: Create a new OOXML package in a new MemoryStream.
  • Line 34: Create a URI to the main document body (calling it document.xml).
  • Line 37: Create the main document body part (using docPartUri and docContentType).
  • Lines 40 – 43: Save the contents of our BizTalk message to the main document body part (the message we created in the BizTalk map).
  • Line 46: Create a package relationship for the main document body part.
  • Lines 60 and 69 – 70: Rewind the MemoryStream and overwrite the original message body with our new OOXML package.
  • Line 75: We return the message containing the OOXML package.

The final message is sent via the FILE adapter and written to the file system. The end result looks like this:

[Image: Finished Sales Summary Document]

The complete solution, containing the pipeline component and a BizTalk proof of concept project, is available to download and can be found archived in the downloads area of this blog. Grab a copy and try it out for yourself; comments and suggestions are welcome.

Conclusion

In this post I hope I’ve shown you the tools necessary to generate Word 2007 documents natively using BizTalk 2006. The example I presented is extremely simple and does not include styles, themes, images, headers and footers, font tables etc. that would exist in a real-life document, but I hope it has presented a starting-point for your own custom development.

These same techniques can also be applied to create Excel spreadsheets or PowerPoint presentations – in fact, while writing this post I have had a number of ideas for enhancements to the pipeline component and will endeavour to create a CodePlex project if I can find the time.

Disclaimer

This work is licensed under a Creative Commons Attribution 2.5 License: you can use it commercially and modify it as necessary, but you must give the original author credit. Furthermore, sample projects and code are provided “AS IS” with no warranty.


7 thoughts on “Generating Microsoft Word Documents Natively using BizTalk 2006”

  1. Interesting post.
    I am thinking of the reverse scenario: I have documents in a shared folder or sent by email, and I need to extract the data from them and then pass it to an orchestration.

    Any advice?

  2. Hi Essam,
    This should be relatively easy in a disassembling receive pipeline component: receive your .docx or .xlsx via either the POP3 or FILE adapter and decompose the package using the System.IO.Packaging API. Once you have extracted the relevant Xml part of your document, simply drop it out of the component and map on either the receive port or in an orchestration. Should be simple (-ish!)

    I do plan on developing a proof-of-concept to demonstrate decomposing a .docx file, however I’m sans-laptop at the moment and can’t do any development outside of work – gaaaah!!

    Nick.

  3. Thanks Nick, this is a lifesaver!

    Any thoughts on how more advanced formatting like page numbers, headers etc. can be achieved?

    Richard

    • Hi Richard,
      Thanks for the comment; off the top of my head I’m not sure how to achieve page numbers and headers, however this will be OOXML detail that will need to be built using the XSLT transformation.

      One easy way to determine the OOXML that you need is to create a sample Word 2007 document, extract the body part from the zip file and inspect the Xml that is generated. You can easily then use that as the basis of your transformation.

      Cheers, Nick.

  4. Hi,

    I have a requirement to generate a Word document from an input Xml. So I unzipped the Open Xml format and created a schema for the document.xml; I then have a map that generates this Xml. When I tried to test this map, I got this error:

    error btm1046: Output validation error: Prefix ‘ns5’ cannot be mapped to namespace name reserved for “xml” or “xmlns”. Line 1, position 517.

    and this was the xml root(pasting only the root node part) that generated though the error came up

    and this is the original document.xml root

    =================

    we can see that ns5 has been assigned for http://www.w3.org/XML/1998/namespace

    and in the original document xml namespace is used for sample data

    and the generated xml has nodes like

    sample data

    ===============

    Actually, the two differences are that the root node should not contain xmlns:ns5="http://www.w3.org/XML/1998/namespace", and the “ns5:space” attribute should actually be “xml:space”.

    How do I generate this in a BizTalk map? What configuration change needs to be made in the schema or anywhere else?

    any ideas ? PLEASE HELP

    Thanks in advance

    surya
