All Exchange 2013 Servers become unusable with permissions errors


Overview

The title might sound a bit scary, but this one was actually a pretty easy fix. It’s a lesson in not digging yourself into a deeper hole than you’re already in during troubleshooting. I wish I’d had this lesson 10 years ago 🙂

Scenario

The customer was unable to log in to OWA, EAC, or Exchange Management Shell on any Exchange 2013 SP1 server in their environment. The errors varied quite a bit. When logging into OWA they would get:

“Something went wrong…

A mailbox could not be found for NT AUTHORITY\SYSTEM.”

When trying to open EMS you would receive a wall of red text that essentially complained about a 500 Internal Server Error from IIS.

In the Application log I would see an MSExchange BackEndRehydration Event ID 3002 error stating that “NT AUTHORITY\SYSTEM does not have token serialization permission”.

Something definitely seemed to be wrong with Active Directory, as this was occurring on all 3 of the customer’s Exchange 2013 servers, one of which was a DC (more on that later).

Resolution

So one of the first questions I like to ask customers is “when was the last time this was working?” After a bit of investigation I found out that the customer had recently been trying, unsuccessfully, to create a DAG from their 3 Exchange 2013 SP1 servers. They could get two of the nodes to join but not the third (the one that was also a DC). The customer thought it was a permissions issue, so they had been “making some changes in AD” to try to resolve it. I asked if those changes were documented; the silence was my answer… 🙂

However, this current issue was affecting all Exchange 2013 servers & not just the one that’s also a DC so I was a bit perplexed as to what could’ve caused this.

So a bit of time on Bing searching for Token Serialization errors brought me to MS KB2898571. The KB stated that if the Exchange Server computer account was a member of a restricted group then Token Serialization Permissions would be set to Deny for it. These Restricted Groups are:

  • Domain Admins
  • Schema Admins
  • Enterprise Admins
  • Organization Management

The KB suggested running gpresult /scope computer /r on the Exchange servers to see whether they showed as members of any of the restricted groups (see the article for further detail & screenshots of the commands). I ran this command on all 3 Exchange 2013 servers & it showed that their computer accounts were all members of the Domain Admins group. In Active Directory Users & Computers I looked at the Member Of tab of each Exchange server computer account. Unfortunately none of them was a direct member of Domain Admins, so I had to search the membership chain of each group the servers had in common. The groups that all Exchange server computer accounts were members of were:

  • Domain Computers
  • Exchange Install Domain Servers
  • Exchange Servers
  • Exchange Trusted Subsystem
  • Managed Availability Servers

Eventually I found that the Exchange Install Domain Servers group had been added as a member of the Domain Admins group during the customer’s troubleshooting efforts to get all their servers added as DAG members. I removed the Exchange Install Domain Servers group from the Domain Admins group & then rebooted all of the Exchange servers. After the reboots the issues went away & the customer was able to access OWA/EMS.
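The manual search through the Member Of tabs is really just a walk up a nested-group chain, so here is a minimal sketch of it in Python. It assumes the memberOf data has already been exported into a plain dictionary; the server name EXCH01$ and the dictionary contents are invented for illustration and are not the customer's actual data. The restricted group names are the four listed from the KB above.

```python
from collections import deque

# Restricted groups as listed from the KB earlier in this post.
RESTRICTED_GROUPS = {
    "Domain Admins",
    "Schema Admins",
    "Enterprise Admins",
    "Organization Management",
}


def find_restricted_path(member_of, start):
    """Breadth-first search up the memberOf chain.

    member_of maps an account or group name to the groups it is a
    direct member of. Returns the shortest chain from `start` to a
    restricted group, or None if no restricted group is reachable.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for group in member_of.get(path[-1], []):
            if group in seen:
                continue  # guards against circular group nesting
            if group in RESTRICTED_GROUPS:
                return path + [group]
            seen.add(group)
            queue.append(path + [group])
    return None


# Hypothetical export: EXCH01$ is an invented computer account name.
member_of = {
    "EXCH01$": [
        "Domain Computers",
        "Exchange Install Domain Servers",
        "Exchange Servers",
        "Exchange Trusted Subsystem",
        "Managed Availability Servers",
    ],
    "Exchange Install Domain Servers": ["Domain Admins"],
}

print(find_restricted_path(member_of, "EXCH01$"))
# ['EXCH01$', 'Exchange Install Domain Servers', 'Domain Admins']
```

The returned chain names the exact nesting that needs to be undone, which is a lot quicker than clicking through each group by hand.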

Now this is where I had to explain to the customer that it was not supported to have an Exchange Server that was also a Domain Controller as a member of a Failover Cluster/DAG. This was why they were having such a hard time adding their Exchange server/DC as a member of their DAG.

Conclusion

I have a saying that I came up with called “troubleblasting”, i.e. “John doesn’t troubleshoot, he troubleblasts!” It started out as just a cheesy joke amongst colleagues back in college, but I’ve started to realize just how dangerous it can be. It’s that state you can sometimes get into when you’re desperate, too frustrated to document anything you’re doing, & just throwing anything you can up against the wall to see what sticks & resolves your issue. Sometimes it can work out for you, but sometimes it can leave you worse off than when you started. Let this be a lesson to take a breath, re-state what you’re trying to accomplish, & ask whether what you’re doing is really the right thing given the situation. In this case, an environment was brought to its knees because a bit of pre-reading on supportability was not done beforehand & a permission change adversely affected all Exchange 2013 servers.

If you can make it to Exchange Connections in Las Vegas this September, I’ll be presenting a session on “Advanced troubleshooting procedures & tools for Exchange 2013”. Hopefully I can share some tips/tools from the field that have proven useful & can keep you from resorting to the “Troubleblasting Cannon of Desperation” 🙂

Quick method to diagnose Exchange Active Directory Access & Service Startup Issues


Background:

My colleague Jedidiah Hammond wrote a great post a while back on troubleshooting Exchange service start-up issues. One of the main areas of focus of the post was issues with Active Directory Global Catalog servers. This can be considered an add-on to that post, as I’ll describe a useful method to troubleshoot Exchange permissions in Active Directory; more specifically, verifying that Exchange has the proper access to the Global Catalog servers in and out of its respective Active Directory site.

Scenario:

Suppose you find that the Microsoft Exchange Active Directory Topology Service isn’t starting, or the System Attendant, or the Information Store service. Or perhaps the Exchange Management Console or Exchange Management Shell will not connect and is complaining of Active Directory/Global Catalog issues.
Oftentimes this is the result of a port being blocked by antivirus or a firewall between the Exchange server and your Global Catalog. Or possibly a configuration issue on the network stack (IP/DNS/etc.); maybe someone even powered your GC off, much to your dismay. Assuming you have already worked through the above scenarios, one useful tool to verify Exchange/AD functionality is actually a very commonly used one: Event Viewer.
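For the blocked-port scenario, a quick TCP connect test from the Exchange server can rule the network in or out before you dig into permissions. The following is only a rough sketch: 3268 and 3269 are the standard Global Catalog LDAP and LDAP-over-SSL ports, but the host name in the usage comment is made up, and a successful connect only proves the port is reachable, not that the GC is healthy.

```python
import socket

GC_LDAP_PORT = 3268   # Global Catalog LDAP
GC_LDAPS_PORT = 3269  # Global Catalog LDAP over SSL


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage (gc01.example.com is a hypothetical GC name):
#   for port in (GC_LDAP_PORT, GC_LDAPS_PORT):
#       print(port, port_open("gc01.example.com", port))
```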

When you first deploy Exchange and run “setup /PrepareAD” (or you let the GUI setup do it for you), setup is what actually sets many of the permissions Exchange needs in AD. (For a list of all of these changes see this TechNet article).

Steps:

Below is an excerpt from MSExchange ADAccess Informational Event ID 2080. You’ll find it occurring roughly every 15 minutes on your Exchange servers.
Description:
Process STORE.EXE (PID=3376). Exchange Active Directory Provider has discovered the following servers with the following characteristics:
 (Server name | Roles | Enabled | Reachability | Synchronized | GC capable | PDC | SACL right | Critical Data | Netlogon | OS Version)
In-site:
Austin.ASH.ORG    CDG 1 7 7 1 0 1 1 7 1
 Out-of-site:
Houston.ASH.ORG    CDG 1 7 7 1 0 1 1 7 1

This is an example of what the output should look like. You might be asking what that series of numbers represents. Well, buried deep within the land of Exchange 2000 there lies a KB article explaining just that.

After reading the article you’ll find that these numbers are basically describing Exchange’s understanding of the Global Catalog servers made available to it; along with whether or not it has the proper ACLs set to be able to utilize them. If you find yourself pulling your hair out as to why Exchange is showing the symptoms I listed earlier, then look for this event on your Exchange server and you just might see something like the following:

Description:
Process STORE.EXE (PID=3376). Exchange Active Directory Provider has discovered the following servers with the following characteristics:
 (Server name | Roles | Enabled | Reachability | Synchronized | GC capable | PDC | SACL right | Critical Data | Netlogon | OS Version)
In-site:
Austin.ASH.ORG    CDG 1 7 7 1 0 0 1 7 1
 Out-of-site:
Houston.ASH.ORG    CDG 1 7 7 1 0 0 1 7 1

Notice it ends with “0171” instead of “1171”. Counting the columns against the header line, the digit that changed is the “SACL right” value. If we reference the above KB article, this tells us Exchange lacks the proper ACLs in AD.
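If you have a lot of servers or a lot of these events to go through, the column counting can be scripted. Here is a small Python sketch that splits one server line from Event ID 2080 into named columns and flags a missing SACL right. The column names come straight from the event's own header line; the function names are my own, and the sketch assumes the line is whitespace-separated exactly as in the excerpts above.

```python
# Column names copied from the header line of Event ID 2080.
FIELDS = [
    "Roles", "Enabled", "Reachability", "Synchronized", "GC capable",
    "PDC", "SACL right", "Critical Data", "Netlogon", "OS Version",
]


def parse_2080_line(line):
    """Split one server line from Event ID 2080 into named columns."""
    parts = line.split()
    if len(parts) != len(FIELDS) + 1:  # +1 for the server name
        raise ValueError(f"unexpected column count: {line!r}")
    record = {"Server name": parts[0], "Roles": parts[1]}
    # Every column after Roles is numeric.
    for name, value in zip(FIELDS[1:], parts[2:]):
        record[name] = int(value)
    return record


def lacks_sacl_right(record):
    """True when Exchange reports no SACL right on this server."""
    return record["SACL right"] == 0


good = parse_2080_line("Austin.ASH.ORG    CDG 1 7 7 1 0 1 1 7 1")
bad = parse_2080_line("Austin.ASH.ORG    CDG 1 7 7 1 0 0 1 7 1")
print(lacks_sacl_right(good), lacks_sacl_right(bad))  # False True
```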

I’ve seen this many times with customers who have modified the Default Domain Controllers Group Policy or somehow blocked its use. I’ve also seen similar issues arise from unchecking “Include Inheritable Permissions from this Object’s Parent” in AD for various objects. If this is the case then please see the post I referenced earlier on how to resolve that. In addition, I’ve found re-running “setup.com /PrepareAD” to be a very useful troubleshooting step in situations such as these where you feel AD permissions may be at fault. Some customers have been wary of running this, but honestly their fears usually stem from unfamiliarity (“it just sounds scary”); a quick read over the article I referenced earlier will tell you that running it again will only re-add the permissions Exchange has needed all along.
However, be aware that re-running PrepareAD may only resolve the issue temporarily, as any bad Group Policies may be re-applied in about 15 minutes, so fixing the actual source of the issue should be the ultimate goal.

An additional note here: if you’re utilizing AD split permissions with Exchange, there may be additional precautions to take before running PrepareAD again.

Doing a Disaster Recovery on an Exchange Server that is also a DC


Have you ever worked on a failed Exchange server that also happens to be a DC? (Not recommended, but it happens.)

Well, if you have and you find yourself trying to recover it, here is how you can.

  1. Note critical information
    1. What are the drive letters
    2. Where are the logs and databases located
    3. What is the service pack level
  2. Remove data from server
  3. Format and re-install the OS – using the same drive letters
  4. Seize Roles if they were on the failed server
  5. Run through a metadata cleanup to remove the failed server from AD
  6. Replicate changes to all DCs
  7. Join rebuilt server to the domain  – Using the Same name
  8. Add the server’s computer object to the correct Exchange groups
    1. Exchange 2007 – “Exchange Servers”, “Exchange Install Domain Servers”
    2. Exchange 2010 – “Exchange Servers”, “Exchange Install Domain Servers”, “Exchange Trusted Subsystem”
    3. Exchange 2003 – “Exchange Domain Servers”
  9. Windows Update the Server
  10. Do a disaster recovery install of Exchange
    1. Exchange 2003 = setup /disasterrecovery
    2. Exchange 2007/2010 = Setup.com /m:recoverserver
  11. Restore data using backup application or recovered databases from failure
  12. and away you go!
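Steps 8 and 10 are the two that change by Exchange version, so it can help to keep them in one lookup. Below is a minimal Python sketch of that idea. The group names and setup commands are copied from the list above; the function name and dictionary layout are just my own illustration.

```python
# Per-version details taken from steps 8 and 10 of the list above.
RECOVERY = {
    "2003": {
        "groups": ["Exchange Domain Servers"],
        "command": "setup /disasterrecovery",
    },
    "2007": {
        "groups": ["Exchange Servers", "Exchange Install Domain Servers"],
        "command": "Setup.com /m:recoverserver",
    },
    "2010": {
        "groups": [
            "Exchange Servers",
            "Exchange Install Domain Servers",
            "Exchange Trusted Subsystem",
        ],
        "command": "Setup.com /m:recoverserver",
    },
}


def recovery_plan(version):
    """Return the groups to re-join and the setup command for a version."""
    try:
        return RECOVERY[version]
    except KeyError:
        raise ValueError(f"no recovery details recorded for Exchange {version}")


plan = recovery_plan("2010")
print("Add the computer account to:", ", ".join(plan["groups"]))
print("Then run:", plan["command"])
```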