The BizTalk Ops Team – Maintaining a Healthy, Responsive and Available BizTalk Environment

Originally posted by Nick Heppleston at: http://www.modhul.com/2008/12/22/the-biztalk-ops-team-maintaining-a-healthy-responsive-and-available-biztalk-environment/

One of the things that surprises me about BizTalk installations is, in my experience, the limited support they receive once a project has gone live. BizTalk is a large enterprise product, and a dedicated team of BizTalk operational specialists and SQL Server DBAs should be created to maintain the operational and test environments.

In this blog post, I’ll run over some of the responsibilities that I believe a BizTalk operational support team needs to focus on to maintain a healthy, responsive and available BizTalk environment.

BizTalk Application Maintenance

BizTalk application maintenance relates to all aspects of the environment above SQL Server. Areas of focus for the Operations Team include:

  • Responding to and actioning monitoring software (e.g. MOM/SCOM) alerts, including errors, warnings and performance issues, in a timely manner.
  • Managing suspended instances to ensure that they do not get out of hand and cause performance problems. Where suspended instances are caused by development bugs, triage them and liaise with the development team to roll out patches as necessary; where they are the result of misconfiguration, address the underlying problem.
  • Identifying and applying BizTalk Hotfixes to all environments as necessary. A good place to start is the Microsoft RSS feed for BizTalk 2006 KB articles. Note: this RSS feed appears to be time-based and may not always have entries (thanks to Nikolai for pointing this out).
  • Understanding BizTalk throttling and tweaking parameters as necessary based on historical performance statistics and knowledge of the product domain (e.g. does the application need to handle larger volumes during certain times of the year).
  • Ensuring that the TDDS Tracking Service is running and that tracked messages are being moved to the Tracking Database.
  • Maintaining BizTalk Hosts and Host Instances, provisioning and decommissioning as necessary.
  • Maintaining Adapters, installing and uninstalling them as necessary.
  • Understanding the options for scaling up and scaling out the application tier, and performing scaling as required, before performance becomes an issue.
  • Understanding some of the underlying developer-orientated concepts, including subscriptions, pipelines, maps etc.; a good understanding of the Orchestration debugger is also crucial.
  • Becoming one with the MsgBoxViewer tool to identify potential performance issues before they happen.
  • Running the BizTalk 2006 Best Practices Analyser at regular intervals to identify any non ‘best-practice’ issues.
  • Managing third-party adapter tools that interface directly to BizTalk, such as the Covast EDI Accelerator.
  • Maintaining operational documentation, including known issues, fixes and resolutions – a Wiki is an excellent resource to manage this knowledge.
  • Scripting as much as possible, particularly known, recurring situations: for example, WMI scripts to clear down any ‘harmless’ known suspended instances, such as zombies. The more that is scripted, the less chance of manual error. Scripts can be written in PowerShell, VBScript or C#.
  • Maintaining all scripts, bindings and configuration settings in source control to ensure proper versioning, and ensuring all environments are updated with the same version of the tools.
  • Performing deployments (and having sufficient knowledge of BizTalk, SQL Server and the product domain to make decisions on deployment issues without having to go back to the development team).
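To illustrate the scripting point above, here is a minimal sketch of the selection logic behind a clear-down script for known ‘harmless’ suspended instances. It is written in Python rather than PowerShell, VBScript or C#. The sketch assumes the instances have already been fetched into simple records; in practice, we believe, this is done by querying the `MSBTS_ServiceInstance` WMI class. The `ServiceInstance` type, the suspended status codes and the allow-list of error text are all assumptions, so verify them against your own environment. The actual termination call is deliberately left out.

```python
from dataclasses import dataclass

# ServiceStatus codes treated as 'suspended'. Assumption: 4 = suspended
# (resumable) and 32 = suspended (not resumable), as we understand the
# MSBTS_ServiceInstance WMI class; verify against your BizTalk version.
SUSPENDED_STATUSES = {4, 32}

# Hypothetical allow-list of lower-case error fragments the team has agreed
# are safe to clear down (e.g. zombies). Copy the real text from your own
# suspended instances rather than relying on this example.
KNOWN_HARMLESS = (
    "completed without consuming all of its messages",
)


@dataclass
class ServiceInstance:
    """Hypothetical record for one service instance fetched via WMI."""
    instance_id: str
    service_status: int
    error_description: str


def select_for_termination(instances):
    """Return IDs of suspended instances whose error matches the allow-list."""
    selected = []
    for inst in instances:
        if inst.service_status not in SUSPENDED_STATUSES:
            continue  # never touch running, dehydrated or completed instances
        error = inst.error_description.lower()
        if any(fragment in error for fragment in KNOWN_HARMLESS):
            selected.append(inst.instance_id)
    return selected


if __name__ == "__main__":
    sample = [
        ServiceInstance("A", 4, "The instance completed without consuming all of its messages."),
        ServiceInstance("B", 32, "Unexpected adapter error"),
        ServiceInstance("C", 2, "The instance completed without consuming all of its messages."),
    ]
    print(select_for_termination(sample))  # ['A']
```

Keeping selection separate from termination lets the script run in a ‘report only’ mode first. Someone can then review the list before anything is actually terminated.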

Database Maintenance

This goes without saying, but unless your team maintains the health of the underlying SQL Server databases, the BizTalk environment will not perform as expected. To maintain optimum health, the team needs to:

  • Ensure that the BizTalk SQL Agent jobs are running successfully and are not running for an excessive length of time.
  • Ensure that tracking data is cleared down using the Purge and Archive jobs and that historical archive data is made available in an offline mode (i.e. on a different SQL Server) for analysis and reporting.
  • Ensure that backups are taken, using the BizTalk Backup job, and that the resulting backup data and log files are verified.
  • Monitor the performance of the SQL Server environment through a monitoring tool to ensure that the server(s) are not exceeding CPU, memory or IO limits; scale up or out as necessary.
  • Monitor replication performance and/or automagically restore backups to a DR environment, to ensure continuity of service in the event of downtime; respond to any incidents that arise during the restore.
  • Understand what can and more importantly what can’t be done on a SQL Server that is hosting BizTalk.
  • Understand options for scaling out the database tier and in particular, the Message Box; perform scaling as required, before performance becomes an issue.
  • Identify and apply SQL Server Hotfixes to all environments as necessary.
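Here is a minimal Python sketch of a check over SQL Server Agent job history, to make the first point above concrete. It assumes the job-outcome rows (`step_id = 0`) have already been read from `msdb.dbo.sysjobhistory`. In that table, `run_status` is 1 for success and `run_duration` is stored as an HHMMSS-style integer. The job names are the ones we believe a standard BizTalk 2006 installation creates. The time limits are purely hypothetical values that you would derive from your own history.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One job-outcome row (step_id = 0) read from msdb.dbo.sysjobhistory."""
    job_name: str
    run_status: int    # 1 = succeeded; anything else needs a look
    run_duration: int  # HHMMSS packed into an int, e.g. 13502 = 1h 35m 02s


def duration_seconds(run_duration):
    """Convert sysjobhistory's HHMMSS integer into a number of seconds."""
    hours, rest = divmod(run_duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


# Hypothetical per-job limits in seconds; tune these from your own history.
MAX_SECONDS = {
    "Backup BizTalk Server (BizTalkMgmtDb)": 30 * 60,
    "DTA Purge and Archive (BizTalkDTADb)": 10 * 60,
}


def check_runs(runs):
    """Return human-readable problems: failed runs and over-long runs."""
    problems = []
    for run in runs:
        if run.run_status != 1:
            problems.append(f"{run.job_name}: did not succeed (status {run.run_status})")
        limit = MAX_SECONDS.get(run.job_name)
        elapsed = duration_seconds(run.run_duration)
        if limit is not None and elapsed > limit:
            problems.append(f"{run.job_name}: ran for {elapsed}s (limit {limit}s)")
    return problems


if __name__ == "__main__":
    history = [
        JobRun("Backup BizTalk Server (BizTalkMgmtDb)", 1, 1200),  # 12 minutes
        JobRun("DTA Purge and Archive (BizTalkDTADb)", 1, 1500),   # 15 minutes
    ]
    for problem in check_runs(history):
        print(problem)
    # prints: DTA Purge and Archive (BizTalkDTADb): ran for 900s (limit 600s)
```

The HHMMSS encoding is an easy trap. Treating `run_duration` as plain seconds would make a 15-minute run look like 25 minutes.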

I would recommend that DBAs also read the excellent Microsoft KB article ‘How to maintain and troubleshoot BizTalk Server databases’.

Disaster Recovery

Disaster recovery is, unfortunately, often overlooked until it is too late. The Operations team should perform regular reviews and tests of their DR plan to ensure it is up to date and effective. Areas of focus for the team include:

  • Switching the live environment over to disaster recovery at regular intervals (every quarter / every six months) to prove the disaster recovery plan and to give confidence to the business. The switch to DR should be for a short period – 1 to 2 days – during a period of known ‘slack’. Switching to DR should be straightforward and (almost) entirely automated to ensure manual error is minimised.
  • Where there are problems with the plan, refine as necessary. Keep the master recovery document on a Wiki for example, but ensure an up-to-date hardcopy is kept off-site.
  • Ensuring that all members of the team have confidence in the plan and are prepared to invoke it as necessary.

Infrastructure and General Maintenance

There are a number of day-to-day infrastructure and general maintenance tasks that the team will need to complete during the lifetime of an environment, including:

  • Applying Windows Updates as necessary during scheduled downtime.
  • Running the BizTalk 2006 Best Practices Analyser after creating new environments, to check for any non ‘best-practice’ issues.
  • Liaising with the infrastructure team to ensure environments are correctly built before operation commences, including correct SAN RAID configuration, clustering etc. Working with DBAs to ensure that the layout of data and log files is correct based on the role of each database (BizTalkMsgBoxDb vs. BizTalkMgmtDb, for example). Ensuring that elements of the environment (e.g. a BizTalk Server or a SQL Server node) are cleanly removed before downtime commences to repair or replace failed hardware.
  • Liaising with the networking team to ensure the necessary ports are open on firewalls etc., both for the underlying SQL Server infrastructure and for external access.
  • Liaising with the security team to ensure the correct Active Directory domain users and groups are created and maintained, so that the system runs well.

For those of you who are members of a BizTalk operational support team (or who work with one as consultants), are there other recommendations you’d like to share?


Hyper-V Dramatically Lowers the Price Point for BizTalk 2009

Before Hyper-V, if you wanted to run BizTalk in a virtualised, enterprise-grade environment, there was really only one option – VMware’s ESX Server*. Unfortunately, there was one great big caveat: Microsoft wouldn’t provide you with support (before they would offer support, you had to re-create the problem on physical hardware, which was simply too much bother). Now we have Microsoft’s Hyper-V, a true first-class bare-metal virtualisation platform, and everything has changed.

The BizTalk licensing model is simple – buy a licence (Standard, Enterprise etc.) for each physical processor slot in the system. With the introduction of Hyper-V and the ability to run several instances of BizTalk Server 2009 on virtualised instances of Windows Server 2008, we can run, for example, four BizTalk Servers on two physical machines (two processor slots each). That uses just four processor slots, rather than the traditional four physical machines and eight processor slots: an immediate 50% saving on licensing.
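The arithmetic behind that saving fits in a few lines of Python. The sketch makes the same assumptions as the paragraph above. BizTalk is licensed per physical processor slot, however many virtual servers run on a host, and each physical machine has two slots. `processor_licences` is a hypothetical helper for illustration, not a Microsoft pricing tool, so check the current licensing terms before relying on it.

```python
import math


def processor_licences(biztalk_servers, servers_per_host, slots_per_host):
    """Processor licences needed when licensing per physical processor slot.

    Assumes every host is identical and that virtual servers on a host need
    no licences beyond the host's physical slots.
    """
    hosts = math.ceil(biztalk_servers / servers_per_host)
    return hosts * slots_per_host


physical = processor_licences(4, servers_per_host=1, slots_per_host=2)  # 4 hosts
virtual = processor_licences(4, servers_per_host=2, slots_per_host=2)   # 2 hosts
saving = 1 - virtual / physical

print(physical, virtual, f"{saving:.0%}")  # 8 4 50%
```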

Want to provision more BizTalk Servers for a different BizTalk Group on the same hardware? Sure, you’ve already paid for the licences on the two machines, so simply launch the images and configure them.

There are, of course, new licensing details to take into consideration, which are fully described on the Microsoft Volume Licensing pages. In a nutshell, if you buy Windows Server 2008 Enterprise, you are entitled to run up to four virtual instances at a time under a single server licence. There are no specific details for BizTalk Server 2009 yet, but based on comments by Burley Kawasaki, the licensing model looks set not to change.

BizTalk has always been competitively priced, but its marriage with Hyper-V now makes it a much more compelling solution, especially in highly available, scale-out scenarios.

If you want to read more about running BizTalk Server on Hyper-V, I recommend Chris Romp’s recent blog entry, BizTalk Server 2006 R2 Hyper-V Guide.

* OK, so you could run it on Virtual Server 2005, but you wouldn’t do that for an enterprise system, would you…?


Duplicate SOAP Subscription on Dynamic Request-Response Port

Joy of joys, more subscription problems to debug this week, with some unexpected results.

The first signs of problems were persistence errors:

Microsoft.XLANGs.Core.PersistenceException: Exception occurred when persisting state to the database. ---> Microsoft.BizTalk.XLANGs.BTXEngine.PersistenceItemException: A batch item failed persistence Item-ID dbbd66a3-d748-4051-a238-fea9509efcf4 OperationType MAIO_CommitBatch Status -1061151949 ErrorInfo The message found multiple request response subscriptions. A message can only be routed to a single request response subscription. . ---> Microsoft.BizTalk.XLANGs.BTXEngine.PublishMessageException: Failed to publish (send) a message in the batch. This is usually because there is no one expecting to receive this message.  The error was The message found multiple request response subscriptions. A message can only be routed to a single request response subscription.  with status -1061151949.

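The actual error is ‘The message found multiple request response subscriptions. A message can only be routed to a single request response subscription.’ Multiple request-response subscriptions were found, which isn’t allowed (I’m not running 2006 R2 and don’t have KB923632 installed).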

My solution employs an orchestration which consumes a web-service over a dynamic solicit-response port; the port itself has been created using early-binding (creating one of those ugly port names).

Subscription Problems

Digging into the dynamic port subscriptions, I was interested to see that a subscription is created for each adapter registered in your BizTalk Group. This makes sense, as the transport type isn’t known up front.

Digging further into those subscriptions, I discovered two identical subscriptions for the SOAP adapter – no wonder we have multiple request-response subscriptions!
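An exact duplicate is easy to miss by eye in a long list of per-adapter subscriptions. The following minimal Python sketch groups subscriptions by their predicates and reports any group with more than one member. It assumes the subscription IDs and their equality predicates have already been exported into a dictionary, for example copied out of the Subscription Viewer. This is not a BizTalk API. The property names and values in the example are illustrative. Real subscriptions can also use other operators and OR groups, which this sketch ignores.

```python
from collections import defaultdict


def find_duplicate_subscriptions(subscriptions):
    """Return sorted lists of subscription IDs that share identical predicates.

    `subscriptions` maps a subscription ID to a dict of property -> value
    equality predicates (an assumed export format, not a BizTalk API).
    """
    groups = defaultdict(list)
    for sub_id, predicates in subscriptions.items():
        # A frozenset of (property, value) pairs is hashable and ignores order.
        groups[frozenset(predicates.items())].append(sub_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Illustrative data: one subscription per adapter, with SOAP duplicated.
    port = {"BTS.SPTransportID": "port-guid"}
    subs = {
        "sub-1": {**port, "BTS.OutboundTransportType": "SOAP"},
        "sub-2": {**port, "BTS.OutboundTransportType": "SOAP"},
        "sub-3": {**port, "BTS.OutboundTransportType": "FILE"},
    }
    print(find_duplicate_subscriptions(subs))  # [['sub-1', 'sub-2']]
```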

Apart from hacking the database to remove the unwanted subscription entry, I could find no way of updating the subscriptions on this dynamic port. Interestingly, however, creating a new dynamic port from the Admin Console and binding it to the orchestration resulted in the correct number of subscriptions and a working solution.

I’m wondering, then, where the bug is. I don’t think it can be in the creation of the early-bound ports during deployment. The subscriptions themselves aren’t created until the port is enlisted, which (AFAIK) happens in exactly the same way for manually created send ports. Yet the problem is only evident on early-bound ports, not late-bound ones!

I’m now toying with the idea of installing KB923632 so we don’t have to worry about this issue again. However, I’m keen to understand why this is a problem in the first place, so I’d also be pleased to hear from anyone who knows of a Hotfix that addresses this duplicate subscription issue. I also think it’s a good one to keep in the back of your mind in case you encounter any ‘interesting’ subscription issues.

The Case for a New Tool?

One final thing on this: diagnosing subscription problems really highlights the need for a ‘Subscription Finder’ tool. We currently have the Subscription Viewer in the Admin Console, which is great, but we have to know which subscriptions we are looking for in the first place. What would have been really helpful here is a tool into which I can enter the promoted properties from the Failed Routing Report and which then locates all subscriptions matching those properties. I am looking at putting together such a tool, so watch this space.
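Here is a minimal Python sketch of what the matching logic of such a tool might look like. It models each subscription as a list of OR-ed groups of AND-ed equality predicates. Given a set of promoted properties, as read from a Failed Routing Report, it returns every subscription those properties would satisfy. The data format, function names and example values are assumptions for illustration. The real message box also supports operators other than equality, and this sketch does not handle them.

```python
def subscription_matches(or_groups, properties):
    """True if every predicate in at least one AND-group equals the message's value."""
    return any(
        all(name in properties and properties[name] == value
            for name, value in group.items())
        for group in or_groups
    )


def find_matching_subscriptions(subscriptions, properties):
    """Return the sorted IDs of all subscriptions the promoted properties satisfy."""
    return sorted(
        sub_id
        for sub_id, or_groups in subscriptions.items()
        if subscription_matches(or_groups, properties)
    )


if __name__ == "__main__":
    # Promoted properties copied from a (hypothetical) Failed Routing Report.
    promoted = {
        "BTS.MessageType": "http://example.org/orders#Order",
        "BTS.ReceivePortName": "OrdersIn",
    }
    # Hypothetical subscriptions: ID -> list of OR-ed AND-groups.
    subs = {
        "orchestration-1": [{"BTS.MessageType": "http://example.org/orders#Order"}],
        "send-port-1": [{"BTS.ReceivePortName": "InvoicesIn"},
                        {"BTS.ReceivePortName": "OrdersIn"}],
        "send-port-2": [{"BTS.ReceivePortName": "InvoicesIn"}],
    }
    print(find_matching_subscriptions(subs, promoted))
    # ['orchestration-1', 'send-port-1']
```

For the failure described in this post, a lookup like this should return both SOAP subscriptions at once. That would make the duplicate obvious without knowing in advance what to look for.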
