OOF messages not being sent in Exchange 2013 CU5 Environment


Scenario

Customer had a single server Exchange 2013 SP1 environment that had been migrated from Exchange 2003 over time (double hop migration). The customer installed CU5 & about a month later they noticed Out of Office messages were not being sent for users who had it enabled. The environment had only been running a short while before updating to CU5 so at the time the customer didn’t correlate the issue happening with updating to CU5 (more on this later).

Issue

In this scenario, Mailbox-A would enable their Out of Office, Mailbox-B would send Mailbox-A an email, Mailbox-A would receive the email message but Mailbox-B would never receive an OOF message.

In addition to this, if you looked in the queues you would see the original email message (not the OOF message) queued even though it had already been successfully delivered.

1

 

The LastError on this message stated “432 4.2.0 STOREDRV.Deliver; Agent transient failure during message resubmission [Agent: Mailbox Rules Agent]” To make matters worse, this message would eventually timeout & the original sender (Mailbox-B in my above scenario) would receive an NDR containing “#< #4.4.7 smtp;550 4.4.7 QUEUE.Expired; message expired> #SMTP#” which would lead them to the false belief that the original message was never delivered. If Mailbox-A turned off their OOF then the message in the queue would eventually be removed without an NDR.

This situation not only prevented the customer from utilizing Out Of Office as it was intended to be used, but it also caused NDRs to be generated to anyone sending them emails while OOF was enabled.

Troubleshooting

This was a really perplexing case as I threw just about everything I had in my bag of tricks at it. Before relenting & eventually escalating to Microsoft I performed the following:

  • Using Get-TransportAgent to verify no 3rd party agents were causing issues
  • Tested with new mailboxes on new mailbox databases
  • Set all Connector & Diagnostic logging to Verbose
  • Recreated the default receive connectors
  • Enabled Pipeline Tracing for the Transport as well as the Mailbox Transport services to see the messages at each stage of Transport
  • Installed a second Exchange server in the environment & tested with new database/users
  • Recreated all of the System Mailboxes
  • Re-ran setup /PrepareAD

All tests resulted in the same behavior. In fact, I found that I could recreate the issue without even enabling OOF. If I created a mailbox rule to have the system send an email to the original sender (effectively functioning like an OOF) I would experience the same behavior. However, an OOF is really just a mailbox rule, which would explain the error we’re seeing in the queue regarding the “Mailbox Rules Agent”. So obviously something was not allowing the rules to fire & my money was on either permissions or some goofy object in Active Directory (like an invalid character in a LegacyDN or the Organization name) that was causing the Rules Agent to choke when it tried to generate the email message.

I was able to get to Microsoft Tier 3 Support (a benefit of being a Premier Partner) & they were equally perplexed. As was expected, they had me run both an ExTrace as well as an IDNA Trace. In short, each of these tools gives the Microsoft debuggers a deep look at what’s going on within the individual processes. I’ve looked at the output of these traces a few times myself but it tends to be a little too deep for a non-debugger to make use of. Needless to say, if you’ve reached this point with Microsoft Support then you’ve certainly got an interesting case.

Resolution

So what was the eventual resolution? Well it was permissions but it still left me scratching my head. First off, here’s the fix:

We opened ADSIEDIT & navigated to the Default Receive Connector of the Exchange Server, went to the Security tab, selected the ExchangeLegacyInterop security group. By default, there is a Deny entry (seen below) for the “Accept Organization Headers” permission.

2

In our case, we unchecked Deny for this permission & after restarting the MSExchange Transport service the issue went away; we then started receiving OOF messages as expected. Microsoft Support explained that this permission is used when the system needs to issue the “MAIL FROM” SMTP verb; which is required when generating OOF messages or similar Rules.

I was excited to figure out the cause of the issue but was also curious why this would be affecting us since as far as I understand, only Exchange 2000/2003 servers were ever members of the ExchangeLegacyInterop security group. This group was used to enable mail flow between the legacy servers & Exchange 07/10.

Upon inspection, this customer’s Exchange 2013 servers had been made members of this group. This security group had zero members in my Greenfield Exchange 2013 environment so I suspected the customer added them there for some odd reason in the past.

Oddly enough, the Microsoft Engineer told me they had a lab environment that was also migrated from Exchange 2003 & it also had this group populated with the 2013 servers. So after asking around to several colleagues, every answer I got back was that their ExchangeLegacyInterop security group was empty. So what does this tell us? Either both environments had someone modify this group manually at some time (for whatever strange reason), or there is a particular set of circumstances that could lead to a 2013 environment having the 2013 servers as members of this group. So if it happens to you then hopefully this article will be of some use to you.

Lastly, the thing I find most bizarre about this situation is that I was unable to reproduce this issue in my SP1 lab. After adding my three 2013 SP1 servers to the ExchangeLegacyInterop security group (forcing replication & rebooting each server to be sure) I could not reproduce the issue. However, after installing CU5 on one of the servers (and thereby updating the Active Directory Schema) the issue occurred on all 3 servers. So apparently something about the CU5 schema change triggers this to be an issue. However, everything I have seen has proven that no 2013 servers should be members of this group to begin with so I’d say it’s not a bug but more a series of unfortunate events :)

Note: MS Support told me that they had another case like this but in that situation, the Accept Organization Headers permission had been denied to the Authenticated Users group on the receive connectors. Not sure how that happened either but it just proves this permission is important for the system to have.

Legacy Public Folder remnants in Exchange 2013 cause “The Microsoft Exchange Administrator has made a change…” prompt


Background

I usually refrain from writing posts on issues where I haven’t been able to fully reproduce them in my lab but enough people seem to be having this issue that it would be good to spread the word should another person find themselves afflicted by it. I’ve seen this issue happen in two different environments & then found out via the forums that several other people have run into it as well.

Issue

I was working with a customer who migrated from Exchange 2007 to Exchange 2013. After decommissioning the 2007 servers, all the Exchange 2013 mailboxes started getting the infamous “The Microsoft Exchange Administrator has made a change that requires you quit and restart Outlook” prompt.

This seemed odd because Exchange 2013 was supposed to all but eliminate those prompts. While it did eliminate the prompts when the RPC Endpoint (Server Name field in Outlook) changed, there are still other scenarios that could result in this prompt (please see reference links at bottom of post for a detailed history). One such thing relates to the Public Folder Hierarchy.

In this customer’s scenario, I determined that the “PublicFolderDatabase” attribute on every Exchange 2013 Mailbox Database was set to a value resembling the screenshot below:

Admin

In this case, the decommissioning of Exchange 2007 & its Legacy Public Folders was not done correctly (same issue probably would have occurred if it were 2010). The Public Folder Database was showing up as a deleted object in AD. So the result was that the Outlook clients were trying to access Public Folder information but were reacting in a way that resulted in the frequent prompt to restart Outlook.

The resolution in this case was to drill down to the properties of the Mailbox Database in ADSIEDIT & set the value of “msExchHomePublicMDB” to be blank. Afterwards, a restart of the Information Store Service resolved the issue.

Additional Info

Not long after this issue, I was contacted by a Consultant I know who encountered the exact same issue. After an improperly performed Exchange 2007 migration, the Exchange 2013 mailboxes were getting prompted to restart Outlook. That environment also had Mailbox Databases that were pointed to a deleted object for their default Public Folder Database. Clearing the value & restarting the Information Store Service also resolved their issue.

After hearing this I went online to see if any others were encountering this issue. I found the below two forum posts

Reference A

Reference B

I then tried to reproduce this in my own environment but could not. Manually deleting the Exchange 2007 Server object from AD as well as manually deleting the Public Folder Database object did leave the 2013 Mailbox Databases pointing to the ghosted objects, but I did not receive the prompts. It appears there’s a particular chain of events that causes this issue but even though I could not recreate them in my lab, it certainly seems like people are running into the issue in the wild. If you start receiving these prompts then I suggest looking to make sure your attributes are not also pointed to ghosted objects.

Note: I was also informed that you could leave yourself in this scenario by incorrectly performing a migration from Legacy Public Folders to Modern Public Folders.

During the migration, you run the “Set-Mailbox <PublicFolderMailboxName> –PublicFolder –IsExcludedFromServingHierarchy:$True” command to prevent the Modern Public Folders from serving the Hierarchy requests while you’re moving data over; when you eventually complete the migration you should run “Set-Mailbox <PublicFolderMailboxName> –PublicFolder –IsExcludedFromServingHierarchy:$False” to allow it to serve the Hierarchy requests. If you do not run this command then you may receive the same prompts.

Additional References

http://blogs.msdn.com/b/aljackie/archive/2013/11/14/outlook-and-rpc-end-point-the-microsoft-exchange-administrator-has-made-a-change-that-requires-you-quit-and-restart-outlook.aspx

http://blogs.technet.com/b/exchange/archive/2011/01/24/obviating-outlook-client-restarts-after-mailbox-moves.aspx

http://blogs.technet.com/b/exchange/archive/2012/05/30/rpc-client-access-cross-site-connectivity-changes.aspx

Mails Stuck In The Draft Folder


Today, I came cross another interesting mail flow issue, where all mails stuck in Draft folders for all users when they are using OWA. You can imagine that mail flow was broken, that non of users can send any mails internally or externally.

Customer has troubleshot it for over 12 hours, and has gone as far as re-install operating system and Exchange 2013 server with /RecoverServer switch, but issue remains.

When I started looking at the issue, I went through series of basic transport troubleshooting steps for Exchange 2013 multirole server, such as checking all transport related services, possible back pressure issue, and state of all server components. Of course, there is nothing wrong with them.

Running out of ideas, I checked settings of send connector, just to make sure there is nothing out of ordinary. I see this in Send Connector properties,

Image

 

There are not many reasons for any Exchange server to use External DNS server for lookups out there. For this environment, it certainly is not needed as well.

I unchecked the box, and restart transport service to speed up the process, but issue remans.

I then run get-TransportService | fl *dns*, to make sure that we don’t have any external DNS settings configured.

   Image

  Ah ha! External DNS server setting is set. I run few tests with nslookup, the DNS server did not respond to any queries. So that’s probably the reason why that mails are not flowing.

  To remove it, you have to run Set-TransportService -ExternalDNSAdapterEnabled $true -ExternalDNSServers $null.

  After restarting the transport service, all mails in the Draft folder are gone. Mail flow is restored!

Exchange 2010 SP3 installation fails on SBS 2011


I had an interesting issue with Exchange 2010 SP3 installation on a SBS 2011 server last night. Installation fails on the Hub Transport Server Role with following errors.

sbs 2011 upgrade sp3 error

 

This made me scratching my head. Why is it trying to remove existing certificate that is used by Exchange? It’s also the default SMTP certificate, that’s why setup was not able to remove it.

After investing further, I see this line in the PowerShell script,

Write-ExchangeSetupLog -Info “Removing default Exchange Certificate”;
Get-ExchangeCertificate | where {$_.FriendlyName.ToString() -eq “Microsoft Exchange”} | Remove-ExchangeCertificate

So it’s trying to remove default Exchange certificate that was created during the initial installation, that has friendly name “Microsoft Exchange”.

I’m thinking, there is no way the Godaddy certificate has Friendly Name “Microsoft Exchange”. After looking at the certificate properties, it is indeed the problem. The Friendly Name is showing “Microsoft Exchange”, instead of mail.domain.com.

In order for us to install SP3, we have to use SBS console to import a temporary certificate, so it updates “LeafCertThumbPrint” property in this registry key,

“HKEY_LOCAL_MACHINE\Software\Microsoft\SmallBusinessServer\Networking”

 Note: you can also update the registry manually with one of thumbprint from existing certificate that is already imported.

Exchange 2010 SP3 installs fine after the cert change.  Since we didn’t export the existing GoDaddy certificate before running SP3 setup, it was removed by the setup. In order for Exchange OA and Activesync clients  to continue function,  we have issue a new certificate request with proper Friendly Name, then import the new certificate. You can also reuse the existing certificate on GoDaddy’s website by using “Re-Key” option, but you might end up with a certificate without private key. To repair the missing private key, you can run following command
   certutil –repairstore my <serial number>

 

 

All Exchange 2013 Servers become unusable with permissions errors


Overview

The title might sound a bit scary but this one was actually a pretty easy fix. It’s a lesson in not digging yourself into a deeper hole than you’re already in during troubleshooting. I wish I would’ve had this lesson 10yrs ago :)

Scenario

The customer was unable to login to OWA, EAC, or Exchange Management Shell on any Exchange 2013 SP1 server in their environment. The errors varied quite a bit, when logging into OWA they would get:

“Something went wrong…

A mailbox could not be found for NT AUTHORITY\SYSTEM.”

When trying to open EMS you would receive a wall of red text which would essentially be complaining about receiving a 500 internal server error from IIS.

In the Application logs I would see an MsExchange BackEndRehydration Event ID 3002 error stating that “NT AUTHORITY\SYSTEM does not have token serialization permission”.

Something definitely seemed to be wrong with Active Directory as this was occurring on all 3 of the customers Exchange 2013 servers; one of which was a DC (more on that later).

Resolution

So one of the 1st questions I like to ask of customers is “when was the last time this was working?” After a bit of investigation I was able to find out that the customer had recently been trying unsuccessfully to create a DAG from his 3 Exchange 2013 SP1 servers. They could get two of the nodes to join but the 3rd would not (the one that was also a DC). The customer thought it was a permissions issue so they had been “making some changes in AD” to try to resolve them. I asked if those changes were documented; the silence was my answer….. :)

However, this current issue was affecting all Exchange 2013 servers & not just the one that’s also a DC so I was a bit perplexed as to what could’ve caused this.

So a bit of time on Bing searching for Token Serialization errors brought me to MS KB2898571. The KB stated that if the Exchange Server computer account was a member of a restricted group then Token Serialization Permissions would be set to Deny for it. These Restricted Groups are:

  • Domain Admins
  • Schema Admins
  • Enterprise Admins
  • Organization Management

The KB mentioned running gpresult /scope computer /r on the Exchange servers to see if they were showing as members of any of the restricted groups (see article for further detail & screenshots of the commands). I ran this command on all 3 Exchange 2013 servers & it showed their Computer accounts were all members of the Domain Admins group. In Active Directory Users & Computers I looked at each Exchange Server Computer account (on the Member Of tab) & unfortunately there were no direct ACL assignments so I had to search the membership chain of each common group that the servers were members of. The common groups that all Exchange Server Computer accounts were members of were:

  • Domain Computers
  • Exchange Install Domain Servers
  • Exchange Servers
  • Exchange Trusted Subsystem
  • Managed Availability Servers

Eventually I found that the Exchange Install Domain Servers group had been added as a member of the Domain Admins group during the customers troubleshooting efforts to get all their servers added as DAG members. I removed the Exchange Install Domain Servers group as a member of the Domain Admins group & then rebooted all of the Exchange servers. After the reboots the issues went away & the customer was able to access OWA/EMS.

Now this is where I had to explain to the customer that it was not supported to have an Exchange Server that was also a Domain Controller as a member of a Failover Cluster/DAG. This was why they were having such a hard time adding their Exchange server/DC as a member of their DAG.

Conclusion

I have a saying that I came up with called “troubleblasting”. i.e. “John doesn’t troubleshoot, he troubleblasts!” It started out as just a cheesy joke amongst colleagues back in college but I’ve started to realize just how dangerous it can be. It’s that state you can sometimes get into when you’re desperate, past the point of documenting anything you’re doing out of frustration, & just throwing anything you can up against the wall to see what sticks & resolves your issue. Sometimes it can work out for you but sometimes it can leave you in a state where you’re worse off than when you started. Let this be a lesson to take a breath, re-state what you’re trying to accomplish, & if what you’re doing is really the right thing given the situation. In this case, an environment was brought to its knees because a bit of pre-reading on supportability was not done beforehand & a permission change adversely affected all Exchange 2013 servers.

If you can make it to Exchange Connections in Las Vegas this September, I’ll be presenting a session on “Advanced troubleshooting procedures & tools for Exchange 2013”. Hopefully I can share some tips/tools from the field that have proven useful & can keep you from resorting to the “Troubleblasting Cannon of Desperation” :)

AD Certificate Services not starting due to database in Dirty Shutdown


Background

I had a customer running SBS (Small Business Server) 2011, which runs Exchange 2010, who needed to renew their SSL Certificate as it had recently expired. I have quite a bit of experience with SBS since we have a large Support customer base running it & while it can be a pain to troubleshoot because of so many moving pieces (AD/Exchange/SharePoint/WSUS/SQL/RD Gateway all on one box) there are a few cool features. One of these features is the “Setup your Internet Address” wizard. Because SBS is also its own Certificate Authority the wizard will generate a certificate for you & assign it to Exchange/IIS/RD Gateway. It will also configure all the Exchange virtual directories for you as well as create a certificate install package for you to deploy to non-domain joined systems so your Outlook Anywhere clients will trust your CA.

 

Issue

However, when going to re-run the wizard to renew the certificate I received an error regarding the Active Directory Certificate Service not running. The System event logs had a 7024 event from “Service Control Manager” stating “The Active Directory Certificate Services service terminated with the service-specific error %%939523546”.

So we were unable to request a new certificate & the customer was hoping to avoid purchasing a third-party certificate since they had been working fine (for an extremely small shop) like this for several years.

 

Solution

After researching the error, I found that the error code given pointed to the Certificate Authority database being corrupted. So I navigated to C:\Windows\System32\Certlog & I found an old friend; an ESE (Extensible Storage Engine) database file.  If you didn’t know already, ESE isn’t just used for Exchange.

AD Certificate Services, DHCP (C:\Windows\System32\dhcp\dhcp.mdb), & Active Directory itself (C:\Windows\NTDS\ntds.dit) all use ESE databases.

The caveat however is that instead of ESEUTIL, you should use ESENTUTL to work with them.

(Additional references 1 & 2)

So I ran esentutl /mh <CA Name>.edb to view the header of the database file & found that it was in a Dirty Shutdown. I then tried to run a Recovery against the database by running Esentutl /r edb but this failed.

If this were an Exchange database then this would be where I would try to restore from a backup. Unfortunately this customer did not have a backup of their CA database file (I think a lot of customers would fall into this category) so I had to move onto running a Repair which is the dreaded “/P”.

Microsoft Support offers strict guidance around running a “/p” on Exchange (like performing a Defrag or a Mailbox Move followed by an Integrity Check/Mailbox Repair immediately after having to run a /p; Also, it should be considered a LAST resort) but no such guidance exists for Certificate Services since it is a much MUCH simpler database structure. But a ‘/P” is almost always a destructive action, with associated data loss, so if you have a backup you should always pursue that option first

I ran esentutl /p <CA Name>.edb & after it completed I was then able to start the Active Directory Certificate Services Service. All the proper data (including Issued Certificates & Templates) were still there & after re-running the SBS “Setup your Internet Address” wizard the customer now had a renewed certificate.

 

 

 

Bad NIC Settings Cause Internal Messages to Queue with 451 4.4.0 DNS query failed (nonexistent domain)


Overview:

I’ve come across this with customers a few times now & it can be a real head scratcher. However, the resolution is actually pretty simple.

 

Scenario:

Customer has multiple Exchange servers in the environment, or has just installed a 2nd Exchange server into the environment. Customer is able to send directly out & receive in from the internet just fine but is unable to send email to/through another internal Exchange server.

This issue may also manifest itself as intermittent delays in sending between internal Exchange servers.

In either scenario, messages will be seen queuing & if you run a “Get-Queue –Identity QueueID | Formal-List” you will see a “LastError” of “451 4.4.0 DNS query failed. The error was: SMTPSEND.DNS.NonExistentDomain; nonexistent domain”.

 

Resolution:

This issue can occur because the Properties of the Exchange Server’s NIC have an external DNS server listed in them. Removing the external DNS server/servers & leaving only internal (Microsoft DNS/Active Directory Domain Controllers in most customer environments) DNS Servers; followed by restarting the Microsoft Exchange Transport Service should resolve the issue.

 

Summary:

The Default Configuration of an Exchange Server is to use the local Network Adapter’s DNS settings for Transport Service lookups.

(FYI: You can alter this in Exchange 07/10 via EMS using the Set-TransportServer command or in EMC>Server Configuration>Hub Transport>Properties of Server. Or in Exchange 2013 via EMS using the Set-TransportService command or via EAC>Servers>Edit Server>DNS Lookups. Using any of these methods, you can have Exchange use a specific DNS Server.)

Because the default behavior is to use the local network adapter’s DNS settings, Exchange was finding itself using external DNS servers for name resolution. Now this seemed to work fine when it had to resolve external domains/recipients but a public DNS server would likely have no idea what your internal Exchange servers (i.e. Ex10.contoso.local) resolve to.The error we see is due to the DNS server responding, but it just not having the A record for the internal host that we require. If the DNS server you had configured didn’t exist or wasn’t reachable you would actually see slightly different behavior (like messages sitting in “Ready” status in their respective queues).

 

An Exchange server, or any Domain-joined server for that matter, should not have its NICs DNS settings set to an external/ISPs DNS server (even as secondary). Instead, they should be set to internal DNS servers which have all the necessary records to discover internal Exchange servers.

 

References

http://support.microsoft.com/kb/825036

http://technet.microsoft.com/en-us/library/bb124896(v=EXCHG.80).aspx

“The DNS server address that is configured on the IP properties should be the DNS server that is used to register Active Directory records.”

http://technet.microsoft.com/en-us/library/aa997166(v=exchg.80).aspx

http://exchangeserverpro.com/exchange-2013-manually-configure-dns-lookups/

http://thoughtsofanidlemind.com/2013/03/25/exchange-2013-dns-stuck-messages/