Exchange 2010 SP3 installation fails on SBS 2011


I had an interesting issue with Exchange 2010 SP3 installation on a SBS 2011 server last night. Installation fails on the Hub Transport Server Role with following errors.

sbs 2011 upgrade sp3 error

 

This made me scratching my head. Why is it trying to remove existing certificate that is used by Exchange? It’s also the default SMTP certificate, that’s why setup was not able to remove it.

After investing further, I see this line in the PowerShell script,

Write-ExchangeSetupLog -Info “Removing default Exchange Certificate”;
Get-ExchangeCertificate | where {$_.FriendlyName.ToString() -eq “Microsoft Exchange”} | Remove-ExchangeCertificate

So it’s trying to remove default Exchange certificate that was created during the initial installation, that has friendly name “Microsoft Exchange”.

I’m thinking, there is no way the Godaddy certificate has Friendly Name “Microsoft Exchange”. After looking at the certificate properties, it is indeed the problem. The Friendly Name is showing “Microsoft Exchange”, instead of mail.domain.com.

In order for us to install SP3, we have to use SBS console to import a temporary certificate, so it updates “LeafCertThumbPrint” property in this registry key,

“HKEY_LOCAL_MACHINE\Software\Microsoft\SmallBusinessServer\Networking”

 Note: you can also update the registry manually with one of thumbprint from existing certificate that is already imported.

Exchange 2010 SP3 installs fine after the cert change.  Since we didn’t export the existing GoDaddy certificate before running SP3 setup, it was removed by the setup. In order for Exchange OA and Activesync clients  to continue function,  we have issue a new certificate request with proper Friendly Name, then import the new certificate. You can also reuse the existing certificate on GoDaddy’s website by using “Re-Key” option, but you might end up with a certificate without private key. To repair the missing private key, you can run following command
   certutil –repairstore my <serial number>

 

 

All Exchange 2013 Servers become unusable with permissions errors


Overview

The title might sound a bit scary but this one was actually a pretty easy fix. It’s a lesson in not digging yourself into a deeper hole than you’re already in during troubleshooting. I wish I would’ve had this lesson 10yrs ago 🙂

Scenario

The customer was unable to login to OWA, EAC, or Exchange Management Shell on any Exchange 2013 SP1 server in their environment. The errors varied quite a bit, when logging into OWA they would get:

“Something went wrong…

A mailbox could not be found for NT AUTHORITY\SYSTEM.”

When trying to open EMS you would receive a wall of red text which would essentially be complaining about receiving a 500 internal server error from IIS.

In the Application logs I would see an MsExchange BackEndRehydration Event ID 3002 error stating that “NT AUTHORITY\SYSTEM does not have token serialization permission”.

Something definitely seemed to be wrong with Active Directory as this was occurring on all 3 of the customers Exchange 2013 servers; one of which was a DC (more on that later).

Resolution

So one of the 1st questions I like to ask of customers is “when was the last time this was working?” After a bit of investigation I was able to find out that the customer had recently been trying unsuccessfully to create a DAG from his 3 Exchange 2013 SP1 servers. They could get two of the nodes to join but the 3rd would not (the one that was also a DC). The customer thought it was a permissions issue so they had been “making some changes in AD” to try to resolve them. I asked if those changes were documented; the silence was my answer….. 🙂

However, this current issue was affecting all Exchange 2013 servers & not just the one that’s also a DC so I was a bit perplexed as to what could’ve caused this.

So a bit of time on Bing searching for Token Serialization errors brought me to MS KB2898571. The KB stated that if the Exchange Server computer account was a member of a restricted group then Token Serialization Permissions would be set to Deny for it. These Restricted Groups are:

  • Domain Admins
  • Schema Admins
  • Enterprise Admins
  • Organization Management

The KB mentioned running gpresult /scope computer /r on the Exchange servers to see if they were showing as members of any of the restricted groups (see article for further detail & screenshots of the commands). I ran this command on all 3 Exchange 2013 servers & it showed their Computer accounts were all members of the Domain Admins group. In Active Directory Users & Computers I looked at each Exchange Server Computer account (on the Member Of tab) & unfortunately there were no direct ACL assignments so I had to search the membership chain of each common group that the servers were members of. The common groups that all Exchange Server Computer accounts were members of were:

  • Domain Computers
  • Exchange Install Domain Servers
  • Exchange Servers
  • Exchange Trusted Subsystem
  • Managed Availability Servers

Eventually I found that the Exchange Install Domain Servers group had been added as a member of the Domain Admins group during the customers troubleshooting efforts to get all their servers added as DAG members. I removed the Exchange Install Domain Servers group as a member of the Domain Admins group & then rebooted all of the Exchange servers. After the reboots the issues went away & the customer was able to access OWA/EMS.

Now this is where I had to explain to the customer that it was not supported to have an Exchange Server that was also a Domain Controller as a member of a Failover Cluster/DAG. This was why they were having such a hard time adding their Exchange server/DC as a member of their DAG.

Conclusion

I have a saying that I came up with called “troubleblasting”. i.e. “John doesn’t troubleshoot, he troubleblasts!” It started out as just a cheesy joke amongst colleagues back in college but I’ve started to realize just how dangerous it can be. It’s that state you can sometimes get into when you’re desperate, past the point of documenting anything you’re doing out of frustration, & just throwing anything you can up against the wall to see what sticks & resolves your issue. Sometimes it can work out for you but sometimes it can leave you in a state where you’re worse off than when you started. Let this be a lesson to take a breath, re-state what you’re trying to accomplish, & if what you’re doing is really the right thing given the situation. In this case, an environment was brought to its knees because a bit of pre-reading on supportability was not done beforehand & a permission change adversely affected all Exchange 2013 servers.

If you can make it to Exchange Connections in Las Vegas this September, I’ll be presenting a session on “Advanced troubleshooting procedures & tools for Exchange 2013”. Hopefully I can share some tips/tools from the field that have proven useful & can keep you from resorting to the “Troubleblasting Cannon of Desperation” 🙂

AD Certificate Services not starting due to database in Dirty Shutdown


Background

I had a customer running SBS (Small Business Server) 2011, which runs Exchange 2010, who needed to renew their SSL Certificate as it had recently expired. I have quite a bit of experience with SBS since we have a large Support customer base running it & while it can be a pain to troubleshoot because of so many moving pieces (AD/Exchange/SharePoint/WSUS/SQL/RD Gateway all on one box) there are a few cool features. One of these features is the “Setup your Internet Address” wizard. Because SBS is also its own Certificate Authority the wizard will generate a certificate for you & assign it to Exchange/IIS/RD Gateway. It will also configure all the Exchange virtual directories for you as well as create a certificate install package for you to deploy to non-domain joined systems so your Outlook Anywhere clients will trust your CA.

 

Issue

However, when going to re-run the wizard to renew the certificate I received an error regarding the Active Directory Certificate Service not running. The System event logs had a 7024 event from “Service Control Manager” stating “The Active Directory Certificate Services service terminated with the service-specific error %%939523546”.

So we were unable to request a new certificate & the customer was hoping to avoid purchasing a third-party certificate since they had been working fine (for an extremely small shop) like this for several years.

 

Solution

After researching the error, I found that the error code given pointed to the Certificate Authority database being corrupted. So I navigated to C:\Windows\System32\Certlog & I found an old friend; an ESE (Extensible Storage Engine) database file.  If you didn’t know already, ESE isn’t just used for Exchange.

AD Certificate Services, DHCP (C:\Windows\System32\dhcp\dhcp.mdb), & Active Directory itself (C:\Windows\NTDS\ntds.dit) all use ESE databases.

The caveat however is that instead of ESEUTIL, you should use ESENTUTL to work with them.

(Additional references 1 & 2)

So I ran esentutl /mh <CA Name>.edb to view the header of the database file & found that it was in a Dirty Shutdown. I then tried to run a Recovery against the database by running Esentutl /r edb but this failed.

If this were an Exchange database then this would be where I would try to restore from a backup. Unfortunately this customer did not have a backup of their CA database file (I think a lot of customers would fall into this category) so I had to move onto running a Repair which is the dreaded “/P”.

Microsoft Support offers strict guidance around running a “/p” on Exchange (like performing a Defrag or a Mailbox Move followed by an Integrity Check/Mailbox Repair immediately after having to run a /p; Also, it should be considered a LAST resort) but no such guidance exists for Certificate Services since it is a much MUCH simpler database structure. But a ‘/P” is almost always a destructive action, with associated data loss, so if you have a backup you should always pursue that option first

I ran esentutl /p <CA Name>.edb & after it completed I was then able to start the Active Directory Certificate Services Service. All the proper data (including Issued Certificates & Templates) were still there & after re-running the SBS “Setup your Internet Address” wizard the customer now had a renewed certificate.