Wednesday, April 07, 2010

Exchange 2007 problem licked (I hope!)

I've had a number of occasions when my test lab Exchange 2007 servers (a Mailbox and a HubCas box) start up, throw errors on a number of services, and then fail to start them. This normally happens because I've stopped all my VMs, copied some datastores and then started everything up again. It's usually a weekend activity, but due to power fluctuations and some hassle, it happened this afternoon.

Usually some brute-force stop/start activity (it is a test lab!) fixes it after a while, and today was no exception. However, trouble struck again tonight, and for some reason I decided enough was enough. The events (amongst others) are 2114 on DS Access, and a whole bunch on Topology checks.

I hit the usual options (Google, the Microsoft Knowledgebase) and came up with a bunch of links that I looked at. A couple interested me (for different reasons!):
Microsoft Knowledgebase (note how this one is, nicely, for Exchange 2003/2000 and not 2007!)

The Event ID page came up with a whole host of options, but in this case the words
“A possible root cause is an additional DNS A record for a DC in the Exchange Servers Site, record that happens to be for an interface for which the Exchange Server has no connectivity.
In our case, all of our servers have a secondary NIC that is used for tape backup traffic. This interface has no routing to the real network that AD and Exchange live on. So here's what, even though DNS registration is disabled on this secondary NIC, it still registers itself. If a system has DNS installed, each time the DNS Server starts or a zone is reloaded, it registers all interfaces that are configured to answer DNS queries. To determine if the Exchange server (or any member server or client) has resolved an IP for a DC, use nltest /dsgetdc:"domain". See M275554 for additional information”
piqued my interest.

This was because the main DC in my network (originally the workhorse that ran my entire business 6 years ago) does have a bunch of IPs on it:
• My main network’s IP
• An IP from a second subnet for connecting to other kit
• 2 NICs from VMware Server (which is hosting a single “off my ESX kit” VirtualCenter VM for management purposes*)

So I wondered: were the Exchange boxes selecting a bad IP to use for AD Topology Discovery and getting confused?
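To make that hunch concrete, here is a minimal Python sketch (not something I ran on the night) that takes the A records DNS holds for a DC and sorts them into those on a subnet the Exchange box can reach and those that are stray. The function name, the addresses and the subnet are all made up for illustration; on a real box you would take them from your DNS zone and from the nltest /dsgetdc output mentioned in the quote above.

```python
import ipaddress


def split_by_reachability(a_records, reachable_subnets):
    """Split A-record addresses into (reachable, stray) lists.

    An address counts as reachable if it falls inside at least one of
    the subnets the Exchange server has connectivity to.
    """
    networks = [ipaddress.ip_network(subnet) for subnet in reachable_subnets]
    reachable, stray = [], []
    for text in a_records:
        address = ipaddress.ip_address(text)
        if any(address in network for network in networks):
            reachable.append(text)
        else:
            stray.append(text)
    return reachable, stray


# Hypothetical values: one "real" IP, one second-subnet IP and two
# VMware Server virtual NICs, all registered against the same DC name.
dc_a_records = ["192.168.1.10", "10.0.0.10", "192.168.80.1", "192.168.150.1"]
exchange_subnets = ["192.168.1.0/24"]

reachable, stray = split_by_reachability(dc_a_records, exchange_subnets)
print("reachable:", reachable)
print("stray (candidates for removal from DNS):", stray)
```

Anything that lands in the stray list is an address Exchange might be handed by DNS but cannot actually talk to, which is exactly the situation the quoted text describes.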

So, I:
• Disabled the VMware NICs (not strictly necessary in my setup)
• Removed the second subnet from the ‘proper’ NIC
• Removed the (now invalid) A records from my internal DNS servers
• Ran IPCONFIG /FLUSHDNS on the Exchange boxes (just in case)
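For anyone wanting to double-check a cleanup like this, the rough Python sketch below resolves a DC's name and tries a TCP connection to the LDAP port (389) on every address DNS hands back. It is an assumption-laden illustration rather than what I actually did: the function names are my own, the hostname in the usage note is hypothetical, and a successful TCP connect only shows the address is reachable, not that Exchange is happy with it.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses DNS gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def can_connect(address, port, timeout):
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_dc_addresses(hostname, port=389, timeout=2.0,
                       resolve=resolve_ipv4, connect=can_connect):
    """Map each address DNS returns for hostname to whether it accepts a connection.

    The resolve and connect parameters exist so the check can be exercised
    without touching a real network.
    """
    return {address: connect(address, port, timeout)
            for address in resolve(hostname)}
```

Run from one of the Exchange boxes, something like check_dc_addresses("dc1.example.local") (a made-up name) should come back with every address marked True once the stray A records are gone; any False entry is an address worth chasing in DNS.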

On restart of the Exchange services everything came up trumps!

Time for bed methinks.

*The reason for this is that if I hose the ESX setup, I can still manage it. VMware (and I) recommend self-hosting the VirtualCenter box in a proper production setup.

Updated: a few minutes later...
As a full test of "does this work", I've restarted all the Exchange boxes, and it works a treat. Services came up really fast this time. Normally I just leave them running and come back after a cuppa, but this time the services were running almost as soon as I had logged onto the servers. Nice!
