The BizTalk Ops Team – Maintaining a Healthy, Responsive and Available BizTalk Environment

Originally posted by Nick Heppleston at: http://www.modhul.com/2008/12/22/the-biztalk-ops-team-maintaining-a-healthy-responsive-and-available-biztalk-environment/

One of the things that surprises me about BizTalk installations is, in my experience, the limited support they receive once a project has gone live. BizTalk is a large enterprise product, and a dedicated team of BizTalk operational specialists and SQL Server DBAs should be created for the task of maintaining operational and test environments.

In this blog post, I’ll run through some of the responsibilities that I believe a BizTalk operational support team needs to focus on to maintain a healthy, responsive and available BizTalk environment.

BizTalk Application Maintenance

BizTalk application maintenance relates to all aspects of the environment above SQL Server. Areas of focus for the Operations Team include:

  • Responding to and actioning monitoring software (e.g. MOM/SCOM) alerts, including errors, warnings and performance issues, in a timely manner.
  • Managing suspended instances to ensure that these do not grow out of hand and cause performance problems. Where suspended instances are caused by development bugs, triage and liaise with development to roll out patches as necessary; where they are the result of misconfiguration, correct the configuration.
  • Identifying and applying BizTalk Hotfixes to all environments as necessary. A good place to start is the Microsoft RSS feed for BizTalk 2006 KB articles. Note: this RSS feed appears to be time-based and may not always have entries (thanks to Nikolai for pointing this out).
  • Understanding BizTalk throttling and tweaking parameters as necessary based on historical performance statistics and knowledge of the product domain (e.g. does the application need to handle larger volumes during certain times of the year).
  • Ensuring that the TDDS Tracking Service is running and that tracked messages are being moved to the Tracking Database.
  • Maintaining BizTalk Hosts and Host Instances, provisioning and decommissioning as necessary.
  • Maintaining Adapters, installing and uninstalling as necessary.
  • Understanding options for scaling-up and scaling-out of the application tier; perform scaling as required, before performance becomes an issue.
  • Understanding some of the underlying developer-orientated concepts, including subscriptions, pipelines, maps etc.; a good understanding of the Orchestration debugger is also crucial.
  • Becoming one with the MsgBoxViewer tool to identify potential performance issues before they happen.
  • Running the BizTalk 2006 Best Practices Analyser at regular intervals to identify any non ‘best-practice’ issues.
  • Managing third-party adapter tools that interface directly to BizTalk, such as the Covast EDI Accelerator.
  • Maintaining operational documentation, including known issues, fixes and resolutions – a Wiki is an excellent resource to manage this knowledge.
  • Scripting as much as possible, particularly known, recurring situations, e.g. WMI scripts to clear down any ‘harmless’ known suspended instances, such as zombies. The more that is scripted, the less chance of manual error. Scripting can be performed in PowerShell, VBScript or C#.
  • Maintaining all scripts, bindings and configuration settings in source control to ensure proper versioning. Ensure all environments are updated with the same version of the tools.
  • Performing deployments (and having sufficient knowledge of BizTalk, SQL Server and the product domain to make decisions on deployment issues without having to go back to the development team).
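
As a rough illustration of the suspended-instance and scripting points above, here is a minimal Python sketch of the triage step: splitting suspended instances into those matching a known-harmless list and those needing a human look. The record fields, error codes and service names are all hypothetical placeholders; in a real script the records would come from a query against the BizTalk environment rather than a hard-coded list.

```python
from collections import Counter

# Hypothetical allow-list of error codes the team has already judged harmless.
# These are placeholder labels, not real BizTalk error codes.
HARMLESS_ERROR_CODES = {"KNOWN-ZOMBIE"}

# Hypothetical sample data standing in for the results of a real query.
SAMPLE = [
    {"instance_id": "a1", "service": "OrderProcess", "error_code": "KNOWN-ZOMBIE"},
    {"instance_id": "a2", "service": "OrderProcess", "error_code": "MAP-FAILURE"},
    {"instance_id": "a3", "service": "InvoiceSend", "error_code": "KNOWN-ZOMBIE"},
    {"instance_id": "a4", "service": "InvoiceSend", "error_code": "ROUTING-FAILURE"},
]


def triage(instances, harmless_codes):
    """Split suspended instances into (safe_to_clear, needs_review)."""
    safe, review = [], []
    for inst in instances:
        if inst["error_code"] in harmless_codes:
            safe.append(inst)
        else:
            review.append(inst)
    return safe, review


def summarise(instances):
    """Count instances per service so the worst offenders stand out."""
    return Counter(inst["service"] for inst in instances)


if __name__ == "__main__":
    safe, review = triage(SAMPLE, HARMLESS_ERROR_CODES)
    print("Safe to clear:", [inst["instance_id"] for inst in safe])
    for service, count in summarise(review).items():
        print(f"Needs review: {service} ({count})")
```

Keeping the allow-list in source control alongside the script means that a decision to treat an error as harmless is reviewed and versioned like any other change.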

Database Maintenance

This goes without saying, but unless your team maintains the health of the underlying SQL Server databases, the BizTalk environment will not perform as expected. To maintain optimum health, the team needs to:

  • Ensure that the BizTalk SQL Agent jobs are running successfully and are not running for an excessive length of time.
  • Ensure that tracking data is cleared down using the Purge and Archive jobs and that historical archive data is made available in an offline mode (i.e. on a different SQL Server) for analysis and reporting.
  • Ensure that backups are taken, using the BizTalk Backup job, and that the resulting backup data and log files are verified.
  • Monitor performance of the SQL Server environment through a monitoring tool to ensure that the servers are not exceeding CPU, memory or IO limits; scale up or out as necessary.
  • Monitor replication performance and/or automagically restore backups to a DR environment, to ensure continuity of service in the event of downtime; respond to any incidents that arise in the restore.
  • Understand what can and, more importantly, what can’t be done on a SQL Server that is hosting BizTalk.
  • Understand options for scaling out the database tier and in particular, the Message Box; perform scaling as required, before performance becomes an issue.
  • Identify and apply SQL Server Hotfixes to all environments as necessary.

I would recommend that DBAs also read the excellent Microsoft KB Article ‘How to maintain and troubleshoot BizTalk Server databases’.
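
The first check in the list above (jobs succeeding, and not running for too long) lends itself to a small script. The following Python sketch assumes the last run of each job has already been collected into simple records; the record shape, the time limits and the sample values are assumptions made for illustration, and a real version would read the job history that SQL Server Agent records.

```python
from datetime import timedelta

# Hypothetical per-job duration limits; tune these from your own history.
MAX_DURATION = {
    "Backup BizTalk Server (BizTalkMgmtDb)": timedelta(minutes=10),
    "DTA Purge and Archive (BizTalkDTADb)": timedelta(minutes=5),
}
# Fallback limit for any job without an explicit entry (also an assumption).
DEFAULT_LIMIT = timedelta(minutes=15)

# Hypothetical sample data: one record per job's most recent run.
SAMPLE_RUNS = [
    {"job": "Backup BizTalk Server (BizTalkMgmtDb)", "succeeded": True,
     "duration": timedelta(minutes=12)},
    {"job": "DTA Purge and Archive (BizTalkDTADb)", "succeeded": False,
     "duration": timedelta(minutes=1)},
    {"job": "Hypothetical custom job", "succeeded": True,
     "duration": timedelta(minutes=2)},
]


def check_jobs(runs):
    """Return (job name, reason) pairs for runs that failed or overran."""
    problems = []
    for run in runs:
        name = run["job"]
        if not run["succeeded"]:
            problems.append((name, "last run failed"))
            continue
        limit = MAX_DURATION.get(name, DEFAULT_LIMIT)
        if run["duration"] > limit:
            problems.append((name, f"ran for {run['duration']} (limit {limit})"))
    return problems


if __name__ == "__main__":
    for name, reason in check_jobs(SAMPLE_RUNS):
        print(f"{name}: {reason}")
```

A failed run is reported without checking its duration, since a job that fails quickly would otherwise look healthy on timing alone.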

Disaster Recovery

Disaster recovery is unfortunately often overlooked until it is too late. The Operations team should perform regular reviews and tests of their DR plan to ensure it is up to date and effective. Areas of focus for the team include:

  • Switching the live environment over to disaster recovery at regular intervals (every quarter / every six months) to prove the disaster recovery plan and to give confidence to the business. The switch to DR should be for a short period – 1 to 2 days – during a period of known ‘slack’. Switching to DR should be straightforward and (almost) entirely automated to ensure manual error is minimised.
  • Where there are problems with the plan, refine as necessary. Keep the master recovery document somewhere easy to update, such as a Wiki, but ensure an up-to-date hardcopy is kept off-site.
  • Ensuring that all members of the team have confidence in the plan and are prepared to invoke it as necessary.
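
Regular DR switch-overs are easy to let slip, so it can help to have a scheduled check flag when the next one is due. This is a minimal Python sketch under stated assumptions: the 90-day interval approximates the quarterly cadence mentioned above, and the function names and dates are purely illustrative.

```python
from datetime import date, timedelta


def next_dr_test_due(last_test, interval_days):
    """Date by which the next DR switch-over should have taken place."""
    return last_test + timedelta(days=interval_days)


def is_overdue(last_test, today, interval_days=90):
    """True if more than interval_days have passed since the last DR test."""
    return today > next_dr_test_due(last_test, interval_days)


if __name__ == "__main__":
    last_test = date(2008, 6, 1)   # illustrative date only
    today = date(2008, 12, 22)     # illustrative date only
    if is_overdue(last_test, today):
        print("DR test overdue since", next_dr_test_due(last_test, 90))
```

Passing `interval_days=180` would model the six-monthly option instead.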

Infrastructure and General Maintenance

There are a number of day-to-day infrastructure and general maintenance tasks that the team will need to complete during the lifetime of an environment, including:

  • Application of Windows Updates as necessary during scheduled down-time.
  • After creating new environments, run the BizTalk 2006 Best Practices Analyser to check for any non ‘best-practice’ issues.
  • Liaising with the infrastructure team to ensure environments are correctly built before operation commences, including correct SAN RAID configuration, clustering etc. Work with DBAs to ensure that the layout of data and log files is correct based on the role of the databases (BizTalkMsgBoxDb vs. BizTalkMgmtDb for example). Ensure elements of the environment (e.g. a BizTalk Server, a SQL Server node etc.) are cleanly removed before downtime commences to action failed hardware.
  • Liaising with the networking team to ensure necessary ports are open on firewalls etc. for traversal of traffic for both the underlying SQL Server infrastructure and external access.
  • Liaising with the security team to ensure the correct Active Directory domain users and groups are created and maintained, so that the system runs smoothly.

For those of you who are members of a BizTalk operational support team (or work with one as a consultant), are there other recommendations you’d like to share?
