Bad NIC Settings Cause Internal Messages to Queue with 451 4.4.0 DNS query failed (nonexistent domain)


Overview:

I’ve come across this with customers a few times now & it can be a real head scratcher. However, the resolution is actually pretty simple.

 

Scenario:

The customer has multiple Exchange servers in the environment, or has just installed a second Exchange server. They are able to send directly out to & receive in from the internet just fine, but they are unable to send email to/through another internal Exchange server.

This issue may also manifest itself as intermittent delays in sending between internal Exchange servers.

In either scenario, messages will be seen queuing & if you run “Get-Queue -Identity QueueID | Format-List” you will see a “LastError” of “451 4.4.0 DNS query failed. The error was: SMTPSEND.DNS.NonExistentDomain; nonexistent domain”.
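
If you're not sure which queues are affected, a quick way to spot them from the Exchange Management Shell (a rough sketch using standard queue properties) is:

Get-Queue | Where-Object { $_.LastError -like "*NonExistentDomain*" } |
    Format-List Identity, Status, MessageCount, NextHopDomain, LastError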

 

Resolution:

This issue can occur because the Exchange server's NIC has an external DNS server listed in its TCP/IP properties. Removing the external DNS server(s) & leaving only internal DNS servers (Microsoft DNS/Active Directory Domain Controllers in most customer environments), then restarting the Microsoft Exchange Transport service, should resolve the issue.
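
Once the NIC is pointing only at internal DNS servers, a restart of the transport service (from an elevated shell) forces it to pick up the change:

# Restart transport so it re-reads the adapter's DNS settings
Restart-Service MSExchangeTransport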

 

Summary:

The Default Configuration of an Exchange Server is to use the local Network Adapter’s DNS settings for Transport Service lookups.

(FYI: You can alter this in Exchange 07/10 via EMS using the Set-TransportServer command or in EMC>Server Configuration>Hub Transport>Properties of Server. Or in Exchange 2013 via EMS using the Set-TransportService command or via EAC>Servers>Edit Server>DNS Lookups. Using any of these methods, you can have Exchange use a specific DNS Server.)
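
For example, if you'd rather have transport ignore the adapter entirely & always query specific internal DNS servers, a sketch for Exchange 2013 might look like this (the server name & IP addresses are placeholders; Exchange 2007/2010 use Set-TransportServer with the same parameters):

# Disable use of the NIC's DNS settings & hand transport an explicit list of internal DNS servers
Set-TransportService EX13-01 -InternalDNSAdapterEnabled $false -InternalDNSServers 10.0.0.10,10.0.0.11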

Because the default behavior is to use the local network adapter's DNS settings, Exchange was finding itself using external DNS servers for name resolution. This seemed to work fine when it had to resolve external domains/recipients, but a public DNS server will likely have no idea what your internal Exchange servers (e.g. Ex10.contoso.local) resolve to. The error we see is due to the DNS server responding, but simply not having the A record for the internal host that we require. If the DNS server you had configured didn't exist or wasn't reachable, you would actually see slightly different behavior (like messages sitting in “Ready” status in their respective queues).

 

An Exchange server, or any domain-joined server for that matter, should not have its NIC's DNS settings pointed at an external/ISP's DNS server (even as a secondary). Instead, they should be set to internal DNS servers which hold all the necessary records to discover internal Exchange servers.

 

References:

http://support.microsoft.com/kb/825036

http://technet.microsoft.com/en-us/library/bb124896(v=EXCHG.80).aspx

“The DNS server address that is configured on the IP properties should be the DNS server that is used to register Active Directory records.”

http://technet.microsoft.com/en-us/library/aa997166(v=exchg.80).aspx

http://exchangeserverpro.com/exchange-2013-manually-configure-dns-lookups/

http://thoughtsofanidlemind.com/2013/03/25/exchange-2013-dns-stuck-messages/

 

Unable to log on to O365 via ADFS – ADFSAppPool stops (a.k.a. I had a bad day)


Environment:
Customer using Exchange Online/Office 365 with no Exchange servers on-prem. Two ADFS 2.0 servers running on Server 2008 R2, enabling them to logon to Exchange Online via SSO (Single Sign On).

Issue:
After rebooting the two ADFS servers post Windows Updates, the customer could no longer log in to OWA & would receive a “503 Service Unavailable” error message from IIS on the two ADFS servers.

Background:
I have to hang my head in shame with this one as I really should have figured it out sooner. Initial troubleshooting showed that the ADFSAppPool was stopped in IIS. It would start, but as soon as you tried accessing it, it would stop again. There was nothing at all in the Application or ADFS logs in Event Viewer (more on this poor bit of troubleshooting on my part later). The ADFS service account it was running under looked ok; the App Pool would start & so would the ADFS service (both running under this account), so it seemed not to be a credential issue (at least I got that part right). I even went as far as to reinstall ADFS & IIS on the non-primary ADFS server in the event it was something in IIS. I was clearly out-classed on this seemingly simple issue.

Resolution:
Because the customer was down & I was scratching my head, I decided to escalate the issue to Microsoft, at which point they resolved it in about 5 minutes.

Now before I say the fix I’d just like to say I consider myself a good troubleshooter. I’ve been troubleshooting all manner of Microsoft, Cisco, etc technologies for more than a decade & made a pretty successful career out of it. I even managed to pass both the MCM 2010 & MCSM 2013 lab exams on the 1st attempt; but today was not my day. I spent over 2 hrs on this & I broke the cardinal rule of troubleshooting; I overlooked the simple things. Like many of us do I started digging a hole of deep troubleshooting, expecting this to be an incredibly complex issue; I was looking at SPN’s, SQL Permissions, checking settings in Azure, etc. I should have just looked back up in the sky instead of trying to dig a hole a mile deep but only 3 ft wide, because for some idiotic reason I chose to overlook the System Event logs….

I suppose once I saw nothing in the Application or ADFS logs I just moved on quickly to the next possibility, but in a few short minutes the Microsoft Engineer checked the System logs & saw Event 5021 from IIS stating that the service account did not have Batch Logon Rights (more on the event here). This led him to look at Group Policy settings & sure enough, there was a GPO allowing only the Domain Admins group to log on as a batch job (Reference 1 & 2). It seems this setting took effect after the ADFS servers were rebooted post Windows Updates. Not sure how the GPO got there, as this solution had been working for 2 years beforehand, but it certainly was ruining our day today. After the GPO was modified to allow the ADFS service account to log on as a batch job, the issue was resolved after some service restarts.
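
For what it's worth, the check that would have saved me a couple of hours is a one-liner (a sketch using the built-in Get-WinEvent cmdlet & the event ID from this case):

# Look for the app pool "Batch Logon rights" failures (Event 5021) in the System log
Get-WinEvent -FilterHashtable @{LogName='System'; Id=5021} -MaxEvents 5 | Format-List TimeCreated, Message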

Moral of the story:
Never overlook the obvious!
It’s the best advice I can give to anyone, anywhere, who has to troubleshoot anything. I’d like to say this is the 1st time this has happened to me, but it’s not. Overlooking typos, not checking to see if a network cable is plugged in, not checking to see if a service is started… It happens to the best of us. I suppose overlooking the simple solution is just part of the human condition… or at least whatever condition I have… :)

Exchange 2013 SP1 Breaks Hub Transport service


I had an issue last night that woke me up at 2 AM via the on-call phone. I feel we might see this often as Exchange admins start applying Exchange Server 2013 SP1.

After installing Exchange 2013 SP1, the MSExchange Transport service hangs at “Starting”, then eventually crashes with a couple of event IDs.

“Event ID 1046, MSExchange TransportService

Worker process with process ID 17836 requested the service to terminate with an unhandled exception. “

“Event ID 4999, MSExchange Common

Watson report about to be sent for process id: 2984, with parameters: E12IIS, c-RTL-AMD64, 15.00.0847.032, MSExchangeTransport, M.Exchange.Net, M.E.P.WorkerProcessManager.HandleWorkerExited, M.E.ProcessManager.WorkerProcessRequestedAbnormalTerminationException, 5e2b, 15.00.0847.030. “

The only way to get the Transport service to start is to disable all receive connectors and reboot the server. Does it sound familiar? My colleague Andrew Higginbotham wrote this article a few weeks ago. Although it was a different issue, custom receive connectors on a multi-role server are the key.

In my case, this is also a multi-role server (CAS and Mailbox on one box). The Hub Transport service listens on TCP port 2525, and Frontend Transport listens on TCP port 25. There are two custom receive connectors that were created with the Hub Transport role, both listening on TCP port 25. I’m not sure why they haven’t had an external mail flow issue by now, but it sure knows how to get your attention by breaking the transport service.

If we disable both custom receive connectors, the transport service starts fine. So we went ahead and changed the transport role from Hub Transport to Frontend Transport on both connectors with the Set-ReceiveConnector PowerShell cmdlet, then re-enabled them to test. The Hub Transport service stays up without issue. Of course, we also rebooted the server to make sure the issue was fixed.
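
Roughly, the change looked like this (a sketch; the connector identity is a placeholder for your own custom connectors):

# Move the custom connector from the Hub Transport service to Frontend Transport, then re-enable it
Set-ReceiveConnector "SERVER01\Custom Relay" -TransportRole FrontendTransport
Set-ReceiveConnector "SERVER01\Custom Relay" -Enabled $true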

Going to be in Austin TX for MEC 2014? Here’s some cool stuff to do while you’re here!


If you plan on attending MEC (Microsoft Exchange Conference) 2014 in Austin TX March 31st-April 2nd, you won’t be hard pressed to find cool things to do while you’re here. Austin’s a great city, in which all of us who contribute to this blog live. For that reason I decided to make this document containing a list of places to visit, eat, drink, & in general have a good time. Jedi, Ron, & myself will all be attending MEC as well as hanging around the downtown area throughout the week so please feel free to say hello if you see us. Also, if you haven’t booked your conference pass for MEC yet, I highly recommend it as it’ll be a great educational/networking experience; just like MEC in Orlando was.

List of Cool Austin Activities
MEC Activities

Since MEC registration went live on Oct 1st & since I’ve lived in Austin for a while, I thought I’d share some things to do while you all are in town.  I know many people will either be coming early or staying late since they have a long way to travel & they might be looking for additional activities around the Austin area. Feel free to send this along to your friends/colleagues. I imagine it’s better to make your plans now since we have plenty of time. 

Incorrectly Adding New Receive Connector Breaks Exchange 2013 Transport


I feel the concepts surrounding this issue have been mentioned already via other sources (1 2) but I’ve seen at least 5 recent cases where our customers were being adversely impacted by this issue; so it’s worth describing in detail.

Summary:

After creating new Receive Connectors on multi-role Exchange 2013 servers, customers may encounter mail flow/transport issues within a few hours or days, with symptoms such as:

  • Sporadic inability to connect to the server over port 25
  • Mail stuck in the Transport Queue both on the 2013 servers in question but also on other SMTP servers trying to send to/through it
  • NDRs being generated due to delayed or failed messages

This happens because the Receive Connector was incorrectly created (which is very easy to do), resulting in two services both trying to listen on port 25 (the Microsoft Exchange FrontEnd Transport Service & the Microsoft Exchange Transport Service). The resolution to this issue is to ensure that you specify the proper “TransportRole” value when creating the Receive Connector either via EAC or Shell. You can also edit the Receive Connector after the fact using Set-ReceiveConnector.

Detailed Description:

Historically, Exchange Servers listen on & send via port 25 for SMTP traffic as it’s the industry standard. However, you can listen/send on any port you choose as long as the parties on each end of the transmission agree upon it.

Exchange 2013 brought a new Transport Architecture & without going into a deep dive, the Client Access Server (CAS) role runs the Microsoft Exchange FrontEnd Transport Service which listens/sends on port 25 for SMTP traffic. The Mailbox Server role has the Microsoft Exchange Transport Service which is similar to the Transport Service in previous versions of Exchange & also listens on port 25. There are two other Transport Services (MSExchange Mailbox Delivery & Mailbox Submission) but they aren’t relevant to this discussion.

So what happens when both of these services reside on the same server (like when deploying Multi-Role; which is my recommendation)? In this scenario, the Microsoft Exchange FrontEnd Transport Service listens on port 25, since it is meant to handle inbound/outbound connections with public SMTP servers (which expect to use port 25). Meanwhile, the Microsoft Exchange Transport Service listens on port 2525. Because this service is used for intra-org communications, all other Exchange 2013 servers in the Organization know to send using 2525 (however, 07/10 servers still use port 25 to send to multi-role 2013 servers, which is why Exchange Server Authentication is enabled by default on your default FrontEndTransport Receive Connectors on a Multi-Role box; in case you were wondering).

So when you create a new Receive Connector on a Multi-Role Server, how do you specify which service will handle it? You do so by using the -TransportRole switch via the Shell or by selecting either “Hub Transport” or “FrontEnd Transport” under “Role” when creating the Receive Connector in the EAC.
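
For example, creating a relay connector that should be handled by the FrontEnd Transport service might look something like this (a hedged sketch; the name, server, & IP range are placeholders & you would still configure relay permissions to suit your scenario):

New-ReceiveConnector -Name "Relay-Devices" -Server EX13-01 -TransportRole FrontendTransport `
    -Usage Custom -Bindings "0.0.0.0:25" -RemoteIPRanges "10.0.0.50-10.0.0.60"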

The problem is there’s nothing keeping you from creating a Receive Connector of Role “Hub Transport” (which it defaults to) that listens on port 25 on a Multi-Role box. What you then have is two different services trying to listen on port 25. This actually works temporarily, due to some .NET magic that I’m not savvy enough to understand, but regardless, eventually it will cause issues. Let’s go through a demo.

Demo:

Here’s the output of Netstat on a 2013 Multi-Role box with default settings. You’ll see MSExchangeFrontEndTransport.exe is listening on port 25 & EdgeTransport.exe is listening on 2525. These processes correspond to the Microsoft Exchange FrontEnd Transport & Microsoft Exchange Transport Services respectively.

[Screenshot: netstat output on a default multi-role server, showing MSExchangeFrontEndTransport.exe listening on port 25 & EdgeTransport.exe listening on 2525]
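
If you want to run the same check without digging through raw netstat output, a rough PowerShell equivalent (assuming Server 2012 or later, where Get-NetTCPConnection is available) is:

# Show which process owns each SMTP listener
Get-NetTCPConnection -State Listen -LocalPort 25,2525 |
    Select-Object LocalPort, @{Name='Process'; Expression={ (Get-Process -Id $_.OwningProcess).ProcessName }}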

Now let’s create a custom Receive Connector, as if we needed it to allow a network device to Anonymously Relay through Exchange (the most common scenario where I’ve seen this issue arise). Notice in the first screenshot, you’ll see the option to specify which Role should handle this Receive Connector. Also notice how Hub Transport is selected by default, as is port 25.

[Screenshots: the EAC New Receive Connector wizard; note the Role selection, with Hub Transport & port 25 selected by default]

After adding this Receive Connector, see how the output of Netstat differs. We now have two different processes listening on the same port (25).

[Screenshot: netstat output after adding the connector, showing two different processes listening on port 25]

So there’s a simple fix to this. Just use Shell (there’s no GUI option to edit the setting after it’s been created) to modify the existing Receive Connector to be handled by the MSExchange FrontEndTransport Service instead of the MSExchange Transport Service. Use the following command:

Set-ReceiveConnector Test-Relay -TransportRole FrontEndTransport

[Screenshot: output of the Set-ReceiveConnector command]

I recommend you restart both Transport Services afterwards.
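
From an elevated shell, that's simply (the service names below are the defaults on a multi-role 2013 server):

Restart-Service MSExchangeFrontEndTransport, MSExchangeTransport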

Outlook 2013 Security Update breaks Out of Office for Exchange 2007 Mailboxes


Issue

In the past two weeks I’ve had two customer environments come to me with the same issue: random Outlook 2013 clients are unable to configure their Out of Office settings. Specifically, they would get an error message when trying to open their OOF settings. Some users had no issue while others got the error. Both environments were Exchange 2007 on the latest Service Pack/Update Rollup & had a few users running Outlook 2013, only some of whom were having this issue.

Resolution

After some searching online I found this thread pointing to a recent (November 12th 2013) Outlook Security Update as the culprit. In my case, after uninstalling KB2837618, rebooting the client, & re-creating the Outlook profile (a profile repair did not work), Out of Office started working again.

In fact, after reading the full KB it appears this is a known issue:

  • If you are using Outlook to connect to a Microsoft Exchange Server 2007 mailbox:
    • You receive an error message that resembles either of the following when you try to configure Automatic Replies (Out of Office): 
      Your automatic reply settings cannot be displayed because the server is currently unavailable. Try again later. 
    • You cannot retrieve Free/Busy data for calendar scheduling.
    • Add-ins that use the Account.SmtpAddress property no longer work.

    These features rely on an underlying Autodiscover technology. After you install this security update, Autodiscover may fail for Exchange 2007 configurations. Therefore, Outlook features that rely on Autodiscover will also stop working.

    Microsoft is researching this problem and will post more information in this article when it becomes available.

It appears the reason only random Outlook 2013 clients were experiencing the issue was because not all of them were up to date via Microsoft Update. Of course most Exchange folks will never read all the release notes of KBs pushed to Outlook via Microsoft Update ahead of time. I suppose the answer to an actual admin (which I am not) who complains will be “What, you don’t use WSUS & test every update for Office?” :D

Update:

As Jim Morris pointed out in the comments below, the issue is addressed in http://support.microsoft.com/kb/2850061

Connecting to Lync Online


Issue: when connecting to Lync Online, you may receive one of the following errors:

“The ‘New-CsOnlineSession’ command was found in the module ‘LyncOnlineConnector’, but
the module could not be loaded. For more information, run ‘Import-Module LyncOnlineConnector’.”

and

“Unable to discover PowerShell endpoint URI
At C:\Program Files\Common Files\Microsoft Lync Server
2013\Modules\LyncOnlineConnector\LyncOnlineConnectorStartup.psm1”

 

Let’s deal with the first one:

First, ensure you have downloaded and installed the PowerShell module for Lync Online.

Next, and this is the weird part: set the PowerShell execution policy to “Unrestricted”. It seems there is an issue with the modules loading; we were able to discover this by comparing a PowerShell session on one system that worked with one that did not. (If you are security conscious, you may want to set this back when you are done.)
The specific command is Set-ExecutionPolicy Unrestricted
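
In other words (assuming RemoteSigned was your original policy; adjust if yours differed):

Set-ExecutionPolicy Unrestricted
# ...and once you're finished, consider setting it back:
Set-ExecutionPolicy RemoteSigned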

For the last error, ensure you have configured your Lyncdiscover records in your external DNS; this is needed for PowerShell to detect and connect to your online Lync environment. If you are in a hybrid configuration, you may be pointing this to on-prem Lync and not Lync Online. You can find your specific configuration by clicking on your domain in the Office 365 admin center.
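
A quick way to confirm the record exists & see where it points (contoso.com is a placeholder for your own SIP domain; Resolve-DnsName needs Windows 8/Server 2012 or later, otherwise nslookup works just as well):

Resolve-DnsName lyncdiscover.contoso.com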

Finally, the syntax to connect to Lync Online is:

$LiveCred = Get-Credential
$LyncSession = New-CsOnlineSession -Credential $LiveCred
Import-PSSession $LyncSession