Customer had a single server Exchange 2013 SP1 environment that had been migrated from Exchange 2003 over time (double hop migration). The customer installed CU5 & about a month later they noticed Out of Office messages were not being sent for users who had it enabled. The environment had only been running a short while before updating to CU5 so at the time the customer didn’t correlate the issue happening with updating to CU5 (more on this later).
In this scenario, Mailbox-A would enable their Out of Office, Mailbox-B would send Mailbox-A an email, Mailbox-A would receive the email message but Mailbox-B would never receive an OOF message.
In addition to this, if you looked in the queues you would see the original email message (not the OOF message) queued even though it had already been successfully delivered.
The LastError on this message stated “432 4.2.0 STOREDRV.Deliver; Agent transient failure during message resubmission [Agent: Mailbox Rules Agent]” To make matters worse, this message would eventually timeout & the original sender (Mailbox-B in my above scenario) would receive an NDR containing “#< #4.4.7 smtp;550 4.4.7 QUEUE.Expired; message expired> #SMTP#” which would lead them to the false belief that the original message was never delivered. If Mailbox-A turned off their OOF then the message in the queue would eventually be removed without an NDR.
This situation not only prevented the customer from utilizing Out Of Office as it was intended to be used, but it also caused NDRs to be generated to anyone sending them emails while OOF was enabled.
This was a really perplexing case as I threw just about everything I had in my bag of tricks at it. Before relenting & eventually escalating to Microsoft I performed the following:
- Using Get-TransportAgent to verify no 3rd party agents were causing issues
- Tested with new mailboxes on new mailbox databases
- Set all Connector & Diagnostic logging to Verbose
- Recreated the default receive connectors
- Enabled Pipeline Tracing for the Transport as well as the Mailbox Transport services to see the messages at each stage of Transport
- Installed a second Exchange server in the environment & tested with new database/users
- Recreated all of the System Mailboxes
- Re-ran setup /PrepareAD
All tests resulted in the same behavior. In fact, I found that I could recreate the issue without even enabling OOF. If I created a mailbox rule to have the system send an email to the original sender (effectively functioning like an OOF) I would experience the same behavior. However, an OOF is really just a mailbox rule, which would explain the error we’re seeing in the queue regarding the “Mailbox Rules Agent”. So obviously something was not allowing the rules to fire & my money was on either permissions or some goofy object in Active Directory (like an invalid character in a LegacyDN or the Organization name) that was causing the Rules Agent to choke when it tried to generate the email message.
I was able to get to Microsoft Tier 3 Support (a benefit of being a Premier Partner) & they were equally perplexed. As was expected, they had me run both an ExTrace as well as an IDNA Trace. In short, each of these tools gives the Microsoft debuggers a deep look at what’s going on within the individual processes. I’ve looked at the output of these traces a few times myself but it tends to be a little too deep for a non-debugger to make use of. Needless to say, if you’ve reached this point with Microsoft Support then you’ve certainly got an interesting case.
So what was the eventual resolution? Well it was permissions but it still left me scratching my head. First off, here’s the fix:
We opened ADSIEDIT & navigated to the Default Receive Connector of the Exchange Server, went to the Security tab, selected the ExchangeLegacyInterop security group. By default, there is a Deny entry (seen below) for the “Accept Organization Headers” permission.
In our case, we unchecked Deny for this permission & after restarting the MSExchange Transport service the issue went away; we then started receiving OOF messages as expected. Microsoft Support explained that this permission is used when the system needs to issue the “MAIL FROM” SMTP verb; which is required when generating OOF messages or similar Rules.
I was excited to figure out the cause of the issue but was also curious why this would be affecting us since as far as I understand, only Exchange 2000/2003 servers were ever members of the ExchangeLegacyInterop security group. This group was used to enable mail flow between the legacy servers & Exchange 07/10.
Upon inspection, this customer’s Exchange 2013 servers had been made members of this group. This security group had zero members in my Greenfield Exchange 2013 environment so I suspected the customer added them there for some odd reason in the past.
Oddly enough, the Microsoft Engineer told me they had a lab environment that was also migrated from Exchange 2003 & it also had this group populated with the 2013 servers. So after asking around to several colleagues, every answer I got back was that their ExchangeLegacyInterop security group was empty. So what does this tell us? Either both environments had someone modify this group manually at some time (for whatever strange reason), or there is a particular set of circumstances that could lead to a 2013 environment having the 2013 servers as members of this group. So if it happens to you then hopefully this article will be of some use to you.
Lastly, the thing I find most bizarre about this situation is that I was unable to reproduce this issue in my SP1 lab. After adding my three 2013 SP1 servers to the ExchangeLegacyInterop security group (forcing replication & rebooting each server to be sure) I could not reproduce the issue. However, after installing CU5 on one of the servers (and thereby updating the Active Directory Schema) the issue occurred on all 3 servers. So apparently something about the CU5 schema change triggers this to be an issue. However, everything I have seen has proven that no 2013 servers should be members of this group to begin with so I’d say it’s not a bug but more a series of unfortunate events :)
Note: MS Support told me that they had another case like this but in that situation, the Accept Organization Headers permission had been denied to the Authenticated Users group on the receive connectors. Not sure how that happened either but it just proves this permission is important for the system to have.