NIC DNS Registration and Exchange Servers


Symptom

I recently worked with a customer who had introduced an Exchange 2013 server into an existing Exchange 2007 environment. The issue was that the 2013 server was unable to send email anywhere, neither externally nor to other Exchange Servers. Executing the below command to view the status of the transport queues returned the following output:

Get-Queue <Queue Identity> | FL

[Screenshot: Get-Queue output showing the DNS error on the transport queues]

Specifically, the error message you would receive is “4.4.0 DNS query failed. The error was: DNS query failed with error ErrorRetry”

This is a fairly common error indicating there is an issue contacting the DNS server or servers that Exchange is configured to use.

Resolution

In this case, however, the cause was not obvious unless you had seen this issue before or knew a little bit about the health checks Exchange uses to ensure it’s healthy.

I remembered seeing a similar issue on a Reddit thread a while back, which led me to search for and find the Microsoft KB article titled “‘DNS query failed’ error when an email message is stuck in the Draft folder in an Exchange Server 2013 environment”.

This was the resolution in my scenario as well. To resolve the issue, I simply had to re-check the “Register this connection’s addresses in DNS” option on the IPv4 Properties > Advanced > DNS tab of the primary NIC used for Active Directory communications. While you can uncheck this box on secondary NICs (such as those used for iSCSI, replication, backup, etc.), it should always remain checked on the MAPI/primary NIC. I’ve also seen issues where having this unchecked on a 2013/2016 DAG node will result in Managed Availability-triggered database failovers.
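If you prefer to check or fix this from PowerShell rather than the NIC GUI, the DnsClient module exposes the same setting (on Windows Server 2012 and later; on 2008 R2 you’ll need the GUI or ipconfig /registerdns). The “Ethernet” alias below is a placeholder for your MAPI/primary adapter:

# Show which adapters currently register their addresses in DNS
Get-DnsClient | Format-Table InterfaceAlias,RegisterThisConnectionsAddress

# Re-enable registration on the MAPI/primary adapter, then force a re-registration
Set-DnsClient -InterfaceAlias "Ethernet" -RegisterThisConnectionsAddress $true
Register-DnsClient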


Failures when proxying HTTP requests from Exchange 2013 to a previous Exchange version


Overview

I’ve seen this issue a few times over the past months & most recently this past week with a customer. Luckily there’s a fairly simple fix published by Microsoft but, realizing not everyone remembers every Microsoft KB that gets released, I thought I’d shine a spotlight on this one.

Scenario

As part of the migration process, when customers move their namespace from either Exchange 2007 or 2010 to 2013, HTTP connections start proxying through 2013 to the legacy Exchange Servers and some users will experience failures. The potentially affected workloads are:

  • AutoDiscover
  • Exchange Web Services (Free/Busy)
  • ActiveSync
  • OWA
  • Outlook

Test or newly created mailboxes may not be affected, since their users are typically members of fewer groups.

Resolution

The cause of this is the age-old problem of token bloat: users who are members of too many groups end up with overly large authentication tokens.

The fix is to implement the changes in the below Microsoft KB article:

“HTTP 400 Bad Request” error when proxying HTTP requests from Exchange Server 2013 to a previous version of Exchange Server
https://support.microsoft.com/en-us/kb/2988444

The interesting thing in this scenario is that the issue was not experienced on the legacy version of Exchange &, even if you look at the tokens themselves, they may not seem overly large. It seems the process of proxying Exchange traffic is much more sensitive to this issue. Also, in a recent case that went to Microsoft, even increasing the values to something higher than your current header sizes may not have the desired effect. In our case we had to set the MaxRequestBytes & MaxFieldLength values to exactly match the values in the Microsoft KB (65536 decimal).
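For reference, here is a minimal sketch of the registry change the KB describes, run on the Exchange 2013 servers doing the proxying (both values are DWORDs set to 65536 decimal, and a reboot is required for http.sys to pick up the change):

# KB2988444: set MaxFieldLength & MaxRequestBytes to exactly 65536 (decimal)
$path = "HKLM:\SYSTEM\CurrentControlSet\Services\HTTP\Parameters"
New-ItemProperty -Path $path -Name MaxFieldLength -Value 65536 -PropertyType DWord -Force
New-ItemProperty -Path $path -Name MaxRequestBytes -Value 65536 -PropertyType DWord -Force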

For further reading, please see the below articles.

Complementary Articles

“HTTP 400 – Bad Request (Request Header too long)” error in Internet Information Services (IIS)
https://support.microsoft.com/en-us/kb/2020943

How to use Group Policy to add the MaxTokenSize registry entry to multiple computers
https://support.microsoft.com/en-us/kb/938118

 

Additional Note

As an FYI, another issue I commonly see when namespaces get transitioned to 2013 is authentication popups when connections proxy to the legacy Exchange Servers. Please see the below KB for that issue:

Outlook Anywhere users prompted for credentials when they try to connect to Exchange Server 2013
https://support.microsoft.com/en-us/kb/2990117

I also blogged about it here:
https://exchangemaster.wordpress.com/2014/10/30/exchange-2010-outlook-anywhere-users-receiving-prompts-when-proxied-through-exchange-2013/

Exchange 2010 Outlook Anywhere users receiving prompts when proxied through Exchange 2013


Background

I was working with a customer who had Exchange 2010 & was in the process of migrating to Exchange 2013. As part of their migration process they pointed their Exchange 2010 Outlook Anywhere namespace (let’s call it mail.contoso.com) to Exchange 2013 in DNS. At this point all of their Outlook Anywhere clients should have been connecting to Exchange 2013 & then being proxied to Exchange 2010. While this was somewhat working, they also immediately noticed users were randomly being prompted for credentials, resulting in a negative user experience.

Sometimes the prompts occurred when connecting to Public Folders, while other times they came from mail or directory connections from Outlook to Exchange.

Resolution

When I was approached with this issue/symptom it sounded familiar. After a search through my OneNote I realized I previously had a discussion with some people I know from Microsoft Support regarding this issue. Turns out this issue was recently addressed via http://support2.microsoft.com/kb/2990117 “Outlook Anywhere users prompted for credentials when they try to connect to Exchange Server 2013”.

This is actually an IIS issue in Server 2008 R2 (the operating system Exchange 2010 was installed on) that’s resolved by a hotfix. After installing the hotfix & rebooting, the issue was resolved & their users no longer received the prompts.

 

 

Legacy Public Folder remnants in Exchange 2013 cause “The Microsoft Exchange Administrator has made a change…” prompt


Background

I usually refrain from writing posts on issues I haven’t been able to fully reproduce in my lab, but enough people seem to be having this issue that it’s worth spreading the word in case another person finds themselves afflicted by it. I’ve seen this issue happen in two different environments & then found out via the forums that several other people have run into it as well.

Issue

I was working with a customer who migrated from Exchange 2007 to Exchange 2013. After decommissioning the 2007 servers, all of the Exchange 2013 mailbox users started getting the infamous “The Microsoft Exchange Administrator has made a change that requires you quit and restart Outlook” prompt.

This seemed odd because Exchange 2013 was supposed to all but eliminate those prompts. While it did eliminate the prompts when the RPC endpoint (the Server Name field in Outlook) changed, there are still other scenarios that can result in this prompt (please see the reference links at the bottom of this post for a detailed history). One such scenario relates to the Public Folder Hierarchy.

In this customer’s scenario, I determined that the “PublicFolderDatabase” attribute on every Exchange 2013 Mailbox Database was set to a value resembling the screenshot below:

[Screenshot: the PublicFolderDatabase attribute on a Mailbox Database pointing to a deleted AD object]
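If you want to check for this condition yourself, the attribute is exposed on the database objects via the Exchange Management Shell; a deleted object typically shows up with a “…\0ADEL:<GUID>” style value:

# Check whether any Mailbox Database still points at a (possibly deleted) Public Folder Database
Get-MailboxDatabase | Format-List Name,PublicFolderDatabase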

In this case, the decommissioning of Exchange 2007 & its Legacy Public Folders was not done correctly (the same issue probably would have occurred had it been 2010). The Public Folder Database was showing up as a deleted object in AD. The result was that the Outlook clients were trying to access Public Folder information but were reacting in a way that resulted in the frequent prompt to restart Outlook.

The resolution in this case was to drill down to the properties of the Mailbox Database in ADSIEDIT & set the value of “msExchHomePublicMDB” to be blank. Afterwards, a restart of the Information Store Service resolved the issue.
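If you’d rather script it than click through ADSIEDIT, here is a sketch using the Active Directory module; the distinguished name below is a placeholder you’d replace with your own database object’s DN:

# Clear the stale msExchHomePublicMDB value on a Mailbox Database object
Import-Module ActiveDirectory
$dbDN = "CN=DB01,CN=Databases,CN=...,CN=Configuration,DC=contoso,DC=com"  # placeholder DN
Set-ADObject -Identity $dbDN -Clear msExchHomePublicMDB

# Restart the Information Store so the change takes effect
Restart-Service MSExchangeIS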

Additional Info

Not long after this issue, I was contacted by a consultant I know who encountered the exact same issue. After an improperly performed Exchange 2007 migration, the Exchange 2013 mailbox users were getting prompted to restart Outlook. That environment also had Mailbox Databases pointed at a deleted object for their default Public Folder Database. Clearing the value & restarting the Information Store Service also resolved their issue.

After hearing this I went online to see if any others were encountering this issue. I found the below two forum posts:

Reference A

Reference B

I then tried to reproduce this in my own environment but could not. Manually deleting the Exchange 2007 server object from AD as well as manually deleting the Public Folder Database object did leave the 2013 Mailbox Databases pointing to the ghosted objects, but I did not receive the prompts. It appears there’s a particular chain of events that causes this issue, and even though I could not recreate it in my lab, people certainly seem to be running into it in the wild. If you start receiving these prompts, I suggest checking that your attributes are not also pointed at ghosted objects.

Note: I was also informed that you could leave yourself in this scenario by incorrectly performing a migration from Legacy Public Folders to Modern Public Folders.

During the migration, you run the “Set-Mailbox <PublicFolderMailboxName> -PublicFolder -IsExcludedFromServingHierarchy:$True” command to prevent the Modern Public Folders from serving the Hierarchy requests while you’re moving data over; when you eventually complete the migration you should run “Set-Mailbox <PublicFolderMailboxName> -PublicFolder -IsExcludedFromServingHierarchy:$False” to allow it to serve the Hierarchy requests. If you do not run this command then you may receive the same prompts.
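To verify where you stand, you can list your Public Folder mailboxes & confirm at least one of them is allowed to serve the hierarchy:

# A value of False means the mailbox is allowed to serve hierarchy requests
Get-Mailbox -PublicFolder | Format-List Name,IsExcludedFromServingHierarchy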

Additional References

http://blogs.msdn.com/b/aljackie/archive/2013/11/14/outlook-and-rpc-end-point-the-microsoft-exchange-administrator-has-made-a-change-that-requires-you-quit-and-restart-outlook.aspx

http://blogs.technet.com/b/exchange/archive/2011/01/24/obviating-outlook-client-restarts-after-mailbox-moves.aspx

http://blogs.technet.com/b/exchange/archive/2012/05/30/rpc-client-access-cross-site-connectivity-changes.aspx

Mails Stuck In The Draft Folder


Today I came across another interesting mail flow issue, where all messages were stuck in the Drafts folder for every user sending from OWA. As you can imagine, mail flow was completely broken; none of the users could send any mail internally or externally.

The customer had troubleshot it for over 12 hours, and had gone as far as reinstalling the operating system and Exchange 2013 with the /RecoverServer switch, but the issue remained.

When I started looking at the issue, I went through a series of basic transport troubleshooting steps for an Exchange 2013 multirole server, such as checking all transport-related services, possible back pressure issues, and the state of all server components. Of course, there was nothing wrong with any of them.
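For anyone wanting to run through the same basics, a quick sanity-check sequence might look like this (the server name is a placeholder):

# Verify required services are running for each installed role
Test-ServiceHealth

# Look for any server components in an Inactive state
Get-ServerComponentState -Identity "EX01"

# Inspect the transport queues & their last errors
Get-Queue | Format-Table Identity,Status,MessageCount,LastError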

Running out of ideas, I checked the settings of the Send Connector, just to make sure there was nothing out of the ordinary. I saw this in the Send Connector properties:

[Screenshot: Send Connector properties showing the external DNS lookups option checked]

 

There are not many reasons for an Exchange server to use an external DNS server for lookups. For this environment, it certainly was not needed either.

I unchecked the box and restarted the Transport service to speed up the process, but the issue remained.

I then ran Get-TransportService | FL *DNS* to make sure we didn’t have any external DNS settings configured.

[Screenshot: Get-TransportService output showing the ExternalDNSServers setting populated]

Aha! The external DNS server setting was indeed set. I ran a few tests with nslookup, and that DNS server did not respond to any queries. That was most likely the reason mail was not flowing.

To remove it, you have to run Set-TransportService -ExternalDNSAdapterEnabled $true -ExternalDNSServers $null.
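If you have multiple transport servers, a quick way to find any others with the same stale setting, clear it, & bounce transport (the server name is a placeholder):

# Find transport servers that still have external DNS servers configured
Get-TransportService | Where-Object { $_.ExternalDNSServers.Count -gt 0 } | Format-List Name,ExternalDNS*

# Clear the setting on the affected server & restart transport
Set-TransportService -Identity "EX01" -ExternalDNSAdapterEnabled $true -ExternalDNSServers $null
Restart-Service MSExchangeTransport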

After restarting the Transport service, all of the messages in the Drafts folder were gone. Mail flow was restored!

All Exchange 2013 Servers become unusable with permissions errors


Overview

The title might sound a bit scary but this one was actually a pretty easy fix. It’s a lesson in not digging yourself into a deeper hole than you’re already in while troubleshooting. I wish I’d learned this lesson 10 years ago 🙂

Scenario

The customer was unable to log in to OWA, the EAC, or the Exchange Management Shell on any Exchange 2013 SP1 server in their environment. The errors varied quite a bit; when logging into OWA they would get:

“Something went wrong…

A mailbox could not be found for NT AUTHORITY\SYSTEM.”

When trying to open EMS you would receive a wall of red text, essentially complaining about a 500 Internal Server Error from IIS.

In the Application log I would see an MSExchange BackEndRehydration Event ID 3002 error stating that “NT AUTHORITY\SYSTEM does not have token serialization permission”.

Something definitely seemed to be wrong with Active Directory, as this was occurring on all 3 of the customer’s Exchange 2013 servers, one of which was a DC (more on that later).

Resolution

So one of the first questions I like to ask customers is “when was the last time this was working?” After a bit of investigation I found that the customer had recently been trying, unsuccessfully, to create a DAG from their 3 Exchange 2013 SP1 servers. They could get two of the nodes to join but the 3rd would not (the one that was also a DC). The customer thought it was a permissions issue, so they had been “making some changes in AD” to try to resolve it. I asked if those changes were documented; the silence was my answer… 🙂

However, this current issue was affecting all of the Exchange 2013 servers & not just the one that was also a DC, so I was a bit perplexed as to what could’ve caused it.

So a bit of time on Bing searching for token serialization errors brought me to MS KB2898571. The KB stated that if the Exchange Server computer account was a member of a restricted group, then the Token Serialization Permission would be set to Deny for it. These restricted groups are:

  • Domain Admins
  • Schema Admins
  • Enterprise Admins
  • Organization Management

The KB mentioned running gpresult /scope computer /r on the Exchange servers to see if they were showing as members of any of the restricted groups (see the article for further detail & screenshots of the commands). I ran this command on all 3 Exchange 2013 servers & it showed their computer accounts were all members of the Domain Admins group. In Active Directory Users & Computers I looked at each Exchange Server computer account (on the Member Of tab) & unfortunately there were no direct memberships in the restricted groups, so I had to search the membership chain of each common group that the servers were members of. The common groups that all Exchange Server computer accounts were members of were:

  • Domain Computers
  • Exchange Install Domain Servers
  • Exchange Servers
  • Exchange Trusted Subsystem
  • Managed Availability Servers

Eventually I found that the Exchange Install Domain Servers group had been added as a member of the Domain Admins group during the customer’s troubleshooting efforts to get all their servers added as DAG members. I removed the Exchange Install Domain Servers group from the Domain Admins group & then rebooted all of the Exchange servers. After the reboots the issues went away & the customer was able to access OWA/EMS.
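For anyone needing to trace a similar membership chain, here is a sketch using the Active Directory module (nested groups show up with an objectClass of “group”):

# List direct members of Domain Admins & spot any nested groups
Import-Module ActiveDirectory
Get-ADGroupMember -Identity "Domain Admins" | Format-Table Name,objectClass

# Remove the offending nested group, then reboot the Exchange servers
Remove-ADGroupMember -Identity "Domain Admins" -Members "Exchange Install Domain Servers" -Confirm:$false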

Now this is where I had to explain to the customer that it is not supported for an Exchange server that is also a Domain Controller to be a member of a Failover Cluster/DAG. This was why they were having such a hard time adding their Exchange server/DC as a member of their DAG.

Conclusion

I have a saying that I came up with called “troubleblasting”, i.e. “John doesn’t troubleshoot, he troubleblasts!” It started out as just a cheesy joke amongst colleagues back in college, but I’ve started to realize just how dangerous it can be. It’s the state you can get into when you’re desperate, past the point of documenting anything you’re doing out of frustration, & just throwing anything you can up against the wall to see what sticks & resolves your issue. Sometimes it can work out for you, but sometimes it can leave you in a state where you’re worse off than when you started. Let this be a lesson to take a breath, re-state what you’re trying to accomplish, & ask whether what you’re doing is really the right thing given the situation. In this case, an environment was brought to its knees because a bit of pre-reading on supportability was not done beforehand & a permission change adversely affected all Exchange 2013 servers.

If you can make it to Exchange Connections in Las Vegas this September, I’ll be presenting a session on “Advanced troubleshooting procedures & tools for Exchange 2013”. Hopefully I can share some tips/tools from the field that have proven useful & can keep you from resorting to the “Troubleblasting Cannon of Desperation” 🙂

Exchange 2013 SP1 Breaks Hub Transport service


I had an issue last night where the on-call phone woke me up at 2 AM. I suspect we might see this often as Exchange admins start applying Exchange Server 2013 SP1.

After installing Exchange 2013 SP1, the MSExchange Transport service hangs at “Starting”, then eventually crashes with a couple of event IDs:

“Event ID 1046, MSExchange TransportService

Worker process with process ID 17836 requested the service to terminate with an unhandled exception.”

“Event ID 4999, MSExchange Common

Watson report about to be sent for process id: 2984, with parameters: E12IIS, c-RTL-AMD64, 15.00.0847.032, MSExchangeTransport, M.Exchange.Net, M.E.P.WorkerProcessManager.HandleWorkerExited, M.E.ProcessManager.WorkerProcessRequestedAbnormalTerminationException, 5e2b, 15.00.0847.030.”

The only way to get the Transport service to start is to disable all receive connectors and reboot the server. Does that sound familiar? My colleague Andrew Higginbotham wrote this article a few weeks ago. Although it was a different issue, custom receive connectors on a multirole server are the key.

In my case, this was also a multirole server (CAS and Mailbox on one box). On a multirole server the Hub Transport service listens on TCP port 2525, and the Front End Transport service listens on TCP port 25. There were two custom receive connectors that had been created with the Hub Transport role, both listening on TCP port 25. I’m not sure why they hadn’t had external mail flow issues before now, but this configuration sure knows how to get your attention by breaking the Transport service.

If we disabled both custom receive connectors, the Transport service started fine. So we went ahead and changed the transport role from HubTransport to FrontendTransport on both connectors with the Set-ReceiveConnector PowerShell cmdlet, then re-enabled them to test. The Hub Transport service stayed up without issue. Of course, we also rebooted the server to make sure the issue was fixed.
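For reference, here is a sketch of that connector change; the connector identity is a placeholder for your own custom connector:

# Disable the custom connector so the Transport service can start
Set-ReceiveConnector -Identity "EX01\Custom Relay" -Enabled $false

# Move the connector to the Front End Transport service, which should own TCP 25 on a multirole server
Set-ReceiveConnector -Identity "EX01\Custom Relay" -TransportRole FrontendTransport

# Re-enable the connector & restart transport to confirm it stays up
Set-ReceiveConnector -Identity "EX01\Custom Relay" -Enabled $true
Restart-Service MSExchangeTransport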

 

 

Edit: Microsoft has released the following KB addressing the issue:

http://support.microsoft.com/kb/2958036