Failures when proxying HTTP requests from Exchange 2013 to a previous Exchange version


Overview

I’ve seen this issue a few times over the past months & most recently this past week with a customer. Luckily there’s a fairly simple fix to the issue published by Microsoft, but realizing not everyone remembers every Microsoft KB that gets released I thought I’d shine a spotlight on this one.

Scenario

As part of the migration process, when customers move their namespace from either Exchange 2007 or 2010 to 2013, HTTP connections start proxying through 2013 to the legacy Exchange Servers and some users will experience failures. The potentially affected workloads are:
AutoDiscover
Exchange Web Services (Free/Busy)
ActiveSync
OWA
Outlook

Test or new mailboxes may not be affected.

Resolution

The cause of this is the age-old problem of Token Bloat: affected users are members of too many groups, so their authentication tokens (and the HTTP request headers that carry them) are too large.

The fix is to implement the changes in the below Microsoft KB article:

“HTTP 400 Bad Request” error when proxying HTTP requests from Exchange Server 2013 to a previous version of Exchange Server
https://support.microsoft.com/en-us/kb/2988444

The interesting thing in this scenario is that the issue was not experienced in the legacy version of Exchange & even if you look at the tokens themselves, they may not seem overly large. It seems that the process of proxying Exchange traffic is much more sensitive to this issue. Also, in a recent case that went to Microsoft, setting the values to something other than the recommended ones did not have the desired effect, even though the values we chose were larger than our current headers. In our case we had to set the MaxRequestBytes & MaxFieldLength values to exactly match the values in the Microsoft KB (65536 decimal).
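
For reference, the registry change can be scripted roughly as follows. This is a minimal sketch, assuming the two DWORD values live under the standard HTTP.sys Parameters key & that it is run from an elevated PowerShell session on the servers the KB tells you to change. Confirm the target servers & values against the KB itself before using it.

```powershell
# Sketch only: set the HTTP.sys header limits to the values from KB 2988444.
# Assumption: run elevated on each server the KB says needs the change.
$path = 'HKLM:\SYSTEM\CurrentControlSet\Services\HTTP\Parameters'

foreach ($name in 'MaxFieldLength', 'MaxRequestBytes') {
    # -Force overwrites the value if it already exists
    New-ItemProperty -Path $path -Name $name -PropertyType DWord -Value 65536 -Force | Out-Null
}

# Confirm what is now in the registry
Get-ItemProperty -Path $path | Select-Object MaxFieldLength, MaxRequestBytes
```

HTTP.sys only reads these values when it starts, so plan for a reboot (or a restart of the HTTP service & everything that depends on it) before testing again.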

For further reading, please see the below articles.

Complementary Articles

“HTTP 400 – Bad Request (Request Header too long)” error in Internet Information Services (IIS)
https://support.microsoft.com/en-us/kb/2020943

How to use Group Policy to add the MaxTokenSize registry entry to multiple computers
https://support.microsoft.com/en-us/kb/938118

 

Additional Note

As an FYI, another issue I commonly see when namespaces get transitioned to 2013 is authentication popups when connections proxy to the legacy Exchange Servers. Please see the below KB for that issue:

Outlook Anywhere users prompted for credentials when they try to connect to Exchange Server 2013
https://support.microsoft.com/en-us/kb/2990117

I also blogged about it here
https://exchangemaster.wordpress.com/2014/10/30/exchange-2010-outlook-anywhere-users-receiving-prompts-when-proxied-through-exchange-2013/


Remember the basics when working with Dynamic Distribution Groups (I didn’t)


Overview:

I recently had a customer come to me with a simple issue of mail not being received in his Exchange 2013 environment when sending to a Dynamic Distribution Group he had just created. Well it certainly seemed like an easy issue to track down (which it technically was) but unfortunately I was a little too confident in my abilities & made the age-old mistake of overlooking the basics. Hopefully others can avoid that mistake after giving this a read.

Scenario:

Create a Dynamic Distribution Group named TestDL#1 whose membership is defined by a Universal Security Group named TestSecurityGroup using the following command in shell:

New-DynamicDistributionGroup -Name "TestDL#1" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"}

Note: This command places the Dynamic DL object into the default Users OU & also sets the msExchDynamicDLBaseDN to the Users OU’s Distinguished Name (CN=Users,DC=ASH,DC=NET). This will become important later.

I can verify the membership of this group by running:

$var = Get-DynamicDistributionGroup "TestDL#1"

Get-Recipient -RecipientPreviewFilter $var.RecipientFilter

In my case, the members show up correctly as John, Bob, Sam, & Dave. However, if I send emails to this group nobody gets them. When looking at message tracking, the recipients show as {} (see below screenshot).

[Screenshot 1: message tracking output with an empty recipient list]

Now here’s the really interesting part. My security group, as well as my users are in the OU=End_Users,OU=Company_Users,DC=ASH,DC=NET Organizational Unit. However (as mentioned before in my Note), my Dynamic DL is in the CN=Users,DC=ASH,DC=NET Organizational Unit. Now if I move my users into the Users OU, then they receive the email & show up as valid recipients.

[Screenshot 2: message tracking output after moving the users into the Users OU]

Now no matter which OU I move my Dynamic Distribution Group (TestDL#1) to, this behavior is the same.

For instance, if I had run the below command instead, I never would have noticed an issue because the Dynamic DL would’ve been created in the same OU as the users & the Security Group.

New-DynamicDistributionGroup -Name "TestDL#1" -OrganizationalUnit "ash.net/Company_Users/End_Users" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"}

The last head scratcher is if I move the actual AD Security Group (TestSecurityGroup) that I’m using to filter against to a different OU, I get the same behavior (no emails).

So it would seem that the solution is to always place the Dynamic Distribution Group into the same OU as the Security Group itself & ALL of its members.

This seemed crazy so I had to assume I wasn’t creating the filter correctly. It was at this point I pinged some colleagues of mine to see where I was going wrong.

Tip: Always get your buddies to peer review your work. A second set of eyes on an issue usually goes a long way to figuring things out.

Solution:

As it turned out, there were two things I failed to understand about this issue.

  1. When you create a Dynamic Distribution Group, by default, the RecipientContainer setting for that group is set to the OU where the DDG is placed. This means that because I initially did not specify the OU for the DDG to be placed in, it was placed in the Users OU (CN=Users,DC=ASH,DC=NET). So when Exchange was performing its query to determine membership, it could only see members that were in the Users OU. So the solution in my scenario would be to use the -RecipientContainer parameter when creating the DDG & specify the entire domain.

EX: New-DynamicDistributionGroup -Name "TestDL#1" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"} -RecipientContainer "ASH.NET"

This one was particularly embarrassing because the answer was clearly in the TechNet article for the New-DynamicDistributionGroup cmdlet.

  2. The other thing I didn’t realize was the reason my DDG broke when moving the Security Group I was filtering against. It was breaking because I specified the Security Group using its Distinguished Name, which included the OU it resided in (CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET). So by moving the group I was making my query come up empty. Now the first thing I thought of was whether I could specify the group using the common name or the GUID instead. Unfortunately, you cannot because of an AD limitation:

“MemberOfGroup filtering requires that you supply the full AD distinguished name of the group you’re trying to filter against. This is an AD limitation, and it happens because you’re really filtering this calculated back-link property from AD, not the simple concept of “memberOf” that we expose in Exchange.”

So the important thing to remember here is to either not move the Security Group you’re filtering against, or if you move it, to update your filter.
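
To put both points together, an existing group can be corrected in place rather than re-created. The below is a rough sketch using the names from my lab (TestDL#1, TestSecurityGroup, ASH.NET). It assumes the Exchange Management Shell & that Get-Group can find the Security Group by name. Note that the preview at the end is scoped to the DDG’s RecipientContainer, which is the check I should have run in the first place.

```powershell
# Look up the security group's current DN so the filter matches wherever the group now lives
$groupDn = (Get-Group -Identity "TestSecurityGroup").DistinguishedName

# Widen the search scope to the whole domain and refresh the filter with the current DN
Set-DynamicDistributionGroup -Identity "TestDL#1" `
    -RecipientContainer "ASH.NET" `
    -RecipientFilter "MemberOfGroup -eq '$groupDn'"

# Preview membership using BOTH the filter and the container
$ddg = Get-DynamicDistributionGroup -Identity "TestDL#1"
Get-Recipient -RecipientPreviewFilter $ddg.RecipientFilter -OrganizationalUnit $ddg.RecipientContainer
```

If the scoped preview comes back empty while the unscoped one lists members, the RecipientContainer is the likely culprit.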

Thanks go to MVPs Tony Redmond & Tony Murray for pointing these two important facts out to me.

Conclusion:

As I found out, a strong foundational knowledge of Active Directory is key to being a strong Exchange Admin/Consultant/Support Engineer. But even when you feel confident in your abilities for a given topic, don’t be afraid to ask people you trust. You might find out you’re either a bit rusty or not as knowledgeable as you thought you were :)

Bad NIC Settings Cause Internal Messages to Queue with 451 4.4.0 DNS query failed (nonexistent domain)


Overview:

I’ve come across this with customers a few times now & it can be a real head scratcher. However, the resolution is actually pretty simple.

 

Scenario:

Customer has multiple Exchange servers in the environment, or has just installed a 2nd Exchange server into the environment. Customer is able to send directly out & receive in from the internet just fine but is unable to send email to/through another internal Exchange server.

This issue may also manifest itself as intermittent delays in sending between internal Exchange servers.

In either scenario, messages will be seen queuing & if you run “Get-Queue -Identity QueueID | Format-List” you will see a “LastError” of “451 4.4.0 DNS query failed. The error was: SMTPSEND.DNS.NonExistentDomain; nonexistent domain”.

 

Resolution:

This issue can occur because the properties of the Exchange Server’s NIC have an external DNS server listed in them. Removing the external DNS server(s) & leaving only internal DNS servers (Microsoft DNS/Active Directory Domain Controllers in most customer environments), followed by restarting the Microsoft Exchange Transport Service, should resolve the issue.
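
A quick way to check for this from the shell is sketched below. It assumes the Exchange Management Shell on the affected server. The WMI query is used because it also works on older OS versions, & the LastError wildcard is simply matched against the error text quoted above.

```powershell
# List the DNS servers configured on every IP-enabled adapter
Get-WmiObject Win32_NetworkAdapterConfiguration -Filter "IPEnabled = TRUE" |
    Select-Object Description, DNSServerSearchOrder

# Find queues currently reporting the DNS failure
Get-Queue | Where-Object { $_.LastError -like "*DNS query failed*" } |
    Format-List Identity, Status, MessageCount, NextHopDomain, LastError

# After removing the external DNS servers from the NIC, restart transport
Restart-Service MSExchangeTransport
```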

 

Summary:

The Default Configuration of an Exchange Server is to use the local Network Adapter’s DNS settings for Transport Service lookups.

(FYI: You can alter this in Exchange 07/10 via EMS using the Set-TransportServer command or in EMC>Server Configuration>Hub Transport>Properties of Server. Or in Exchange 2013 via EMS using the Set-TransportService command or via EAC>Servers>Edit Server>DNS Lookups. Using any of these methods, you can have Exchange use a specific DNS Server.)
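
As a rough example of the Exchange 2013 shell method, the below tells transport to stop using the adapter’s DNS settings for internal lookups & use a fixed list instead. The server name (EX13) & the 10.0.0.x addresses are placeholders, not values from a real environment. On Exchange 2007/2010 the equivalent cmdlet is Set-TransportServer.

```powershell
# Point internal transport lookups at specific internal DNS servers.
# "EX13" and the IP addresses are placeholders.
Set-TransportService -Identity "EX13" `
    -InternalDNSAdapterEnabled $false `
    -InternalDNSServers 10.0.0.10, 10.0.0.11

# Review the resulting DNS settings for the transport service
Get-TransportService -Identity "EX13" |
    Format-List Name, InternalDNSAdapterEnabled, InternalDNSServers, ExternalDNSAdapterEnabled, ExternalDNSServers
```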

Because the default behavior is to use the local network adapter’s DNS settings, Exchange was finding itself using external DNS servers for name resolution. Now this seemed to work fine when it had to resolve external domains/recipients but a public DNS server would likely have no idea what your internal Exchange servers (i.e. Ex10.contoso.local) resolve to. The error we see is due to the DNS server responding but not having the A record for the internal host that we require. If the DNS server you had configured didn’t exist or wasn’t reachable you would actually see slightly different behavior (like messages sitting in “Ready” status in their respective queues).

 

An Exchange server, or any domain-joined server for that matter, should not have its NIC’s DNS settings set to an external/ISP’s DNS server (even as secondary). Instead, they should be set to internal DNS servers which have all the necessary records to discover internal Exchange servers.

 

References

http://support.microsoft.com/kb/825036

http://technet.microsoft.com/en-us/library/bb124896(v=EXCHG.80).aspx

“The DNS server address that is configured on the IP properties should be the DNS server that is used to register Active Directory records.”

http://technet.microsoft.com/en-us/library/aa997166(v=exchg.80).aspx

http://exchangeserverpro.com/exchange-2013-manually-configure-dns-lookups/

http://thoughtsofanidlemind.com/2013/03/25/exchange-2013-dns-stuck-messages/

 

Common Support Issues with Transport Agents


This is a fairly basic post but it happens enough that I’d like to call out the basics of troubleshooting it. I’ve seen many cases over time where mail flow is either halted or becomes sluggish due to a third-party transport agent (I actually saw 3 instances of this happening this past month which prompted this post).

Examples of Transport Agents could be Anti-Virus software, Anti-Spam software, DLP software, agents which add disclaimers to email messages, or email archiving solutions. I won’t call out specific vendors as I don’t think there’s necessarily anything wrong with any particular one. Sometimes an install of a piece of software just becomes corrupted or there’s some unforeseen incompatibility between the third-party software & Exchange; or some other software in the environment. However, sometimes the Agent can indeed have a bug which needs to be addressed with the vendor.

Anyways, here are the ways in which I’ve seen these issues manifest themselves:

  • Messages stuck in the Submission queue
  • A delay in SMTP response (when you telnet to the Exchange Server over port 25, it takes longer than expected for the server’s SMTP banner to be displayed)
  • Messages are slow to flow through the transport pipeline (general slow delivery)
  • Microsoft Exchange Transport Service will not start or repeatedly crashes

To highlight more recent examples, last week I had a colleague come to me saying he had two Exchange 2010 Hub/CAS boxes, with the same config, yet one of them would have a slower connection when he would telnet to it; the banner would take at least 20 seconds to be displayed. This also caused the hardware load balancer’s health checks to mark the server as down. Each server had the same Anti-V/Anti-SPAM software installed, yet only one was showing the symptoms. For testing purposes he “disabled” the third-party software using its management interface but the issue persisted.

However, after running a “Get-TransportAgent” on the server, the Transport Agent still showed as being “Enabled”. This demonstrates a point I frequently make with customers, that disabling Anti-Virus software rarely serves as a useful troubleshooting step (even file-based Anti-V). This is because the TransportAgent is typically still enabled. For file-based Anti-Virus, even with the Services disabled there is usually still a network filter driver that is sitting on the TCP/IP stack which could be causing issues (only an uninstall of the 3rd-party product removes it).

Bottom-line, an uninstall is still the best method to remove potentially problematic Anti-V/Anti-SPAM/Anti-Malware software. So in this case the issue was a bad/corrupted install of the product on that server.
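
If you want to confirm what transport is actually loading before going down the uninstall route, something like the below works as a quick check. The agent name is hypothetical. Keep in mind this only takes the agent out of the transport pipeline; it does nothing about filter drivers or a corrupted install.

```powershell
# Show every installed transport agent and whether transport still loads it
Get-TransportAgent | Format-Table Identity, Enabled, Priority -AutoSize

# "Contoso AV Agent" is a hypothetical name; use an Identity from the output above
Disable-TransportAgent -Identity "Contoso AV Agent"

# Agent changes only take effect after the transport service restarts
Restart-Service MSExchangeTransport
```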

Another scenario (also Exchange 2010) was where messages were stuck in the Submission Queue for extended periods of time. The Application Logs were filled with Event 1050 MSExchange Extensibility events which were stating the installed agent was taking an unusual amount of time to process an event; thus causing the delay in transport (Reference 1 2 3).

After running Get-TransportAgent I was actually greeted by an error message saying it was unable to access a file located in the “C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\agents” directory. This is where the files associated with your Transport Agents are stored. So again, the issue was a corrupted install of the product. Reinstalling the software resolved the issue.

So nothing fancy about this one. Just check Event Viewer for Transport events or use process of elimination if you’re experiencing any of the symptoms above. Having worked with Microsoft Support many times in the past, they will almost always ask you to remove third-party components such as Anti-V if they are unable to pinpoint the issue to its source; so save yourself some time & rule it out first.
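
As a starting point for the Event Viewer check, the below pulls the last day’s worth of the 1050 events mentioned earlier. It is only a sketch: the one-day window is arbitrary, & the source filter is a loose wildcard based on the “MSExchange Extensibility” source name from my case.

```powershell
# Pull recent agent-latency events (Event ID 1050) from the Application log
Get-EventLog -LogName Application -After (Get-Date).AddDays(-1) |
    Where-Object { $_.EventID -eq 1050 -and $_.Source -like "MSExchange*" } |
    Select-Object TimeGenerated, Source, Message |
    Format-List
```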

I know some people work for companies where this is like pulling teeth but it’s always going to be a battle between usability & security. If your management requires you spend 40 hours on the phone working with a vendor or Microsoft before finally being told you’re going no further until removing the third-party component then I give you my best & suggest you get the coffee started. We all know the most important acronym in IT is CYA after all 😉

For great reading on Exchange Transport Agents see MCM/MCSM/MVP Brian Reid’s two posts on the topic

Creating a Simple Exchange Server Transport Agent

Exchange 2013 Transport Agents

ActiveSync Synching Folders but not Mail


Issue

One of our smaller customers running Exchange 2010 SP3 UR2 was having an issue with one particular mailbox being unable to download mail items via ActiveSync on any device. The odd thing was that the folder structure would come down but no mail items would be synched. The customer said it was working fine until about a week previously.

Troubleshooting

Looking through Event Viewer in the Application logs led me to the following events from “MSExchangeIS Mailbox Store”:
10030

A mismatch was detected between a view of a folder and the actual contents of the folder. The mismatched item was ignored.

Attempts may be made to rebuild the view, but if this message continues to persist for this mailbox, moving the mailbox to a different database may resolve the issue.

Database: Mailbox Database

Folder: [MBX:John Smith][AllItems]

MsgHeader ID: 1110-1E6B08

Folder ID: 1110-3DA14B

View ID: 1110-3DA582

View Name: 1110-3DA14B +A-D-T301c

Document ID: 294529

Function: EcPopulateInitialMsgViewTable(Search)

Followed by:

10031

A folder view which previously experienced consistency issues has been deleted and will be rebuilt the next time it is needed.

Database: Mailbox Database

Folder: [MBX:John Smith][AllItems]

MsgHeader ID: 1110-1E6B08

Folder ID: 1110-3DA14B

View ID: 1110-3DA582

View Name: 1110-3DA14B +A-D-T301c

Function: EcAgeOutOneView

After seeing these events I came to the conclusion that there was logical corruption in this user’s Mailbox preventing ActiveSync from pulling the mail items down. So I immediately went to the handy replacement for ISINTEG, “New-MailboxRepairRequest”. (Reference1 Reference2)

So in this case I ran the following command:

New-MailboxRepairRequest -Mailbox John.Smith -CorruptionType FolderView,ProvisionedFolder,AggregateCounts,SearchFolder

The command lets you know the request was created but not much more than that. To view the logs on Mailbox Repair Requests you need to head back to the Application Log in Event Viewer (Reference )
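
Rather than scrolling through Event Viewer, the relevant entries can also be pulled from the shell. This is a rough sketch that filters only on the event IDs I saw in this case (10047, 10048 & 10062); the four-hour window is an arbitrary assumption.

```powershell
# Repair-request events seen in this case: 10047 (started),
# 10062 (corruptions detected) and 10048 (completed)
$ids = 10047, 10048, 10062

Get-EventLog -LogName Application -After (Get-Date).AddHours(-4) |
    Where-Object { $ids -contains $_.EventID } |
    Sort-Object TimeGenerated |
    Format-List TimeGenerated, EventID, Message
```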

We can see the below entries in the log:

10047

Mailbox level online integrity check for request ec853fb3-1999-4911-9782-5170a31a37cb started:

Database=Mailbox Database

Mailbox=4F1B824D-5C81-477E-B40B-418C888109F3

Flags=Detect, Fix

Tasks=SearchFolder, View, AggregateCount, ProvisionedFid

10062

Corruptions detected during online integrity check for request ec853fb3-1999-4911-9782-5170a31a37cb

Mailbox:4F1B824D-5C81-477E-B40B-418C888109F3 (John Smith)

Database:Mailbox Database

Corruption          Is Fixed FID         Property              Resolution

“Folder View”, Yes, “1110-1E6B0C (Inbox)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0C (Inbox)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0E (Sent Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0E (Sent Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0E (Sent Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0E (Sent Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0E (Sent Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0F (Deleted Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0F (Deleted Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B0F (Deleted Items)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B17 (Drafts)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6B1A (Tasks)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-1E6D67 (Junk E-Mail)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-3DA14B (AllItems)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-3DA14B (AllItems)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-3DA14B (AllItems)”, 0x00000001, “Delete the corrupted view”

“Folder View”, Yes, “1110-3DA14B (AllItems)”, 0x00000001, “Delete the corrupted view”

10048

Online integrity check for request ec853fb3-1999-4911-9782-5170a31a37cb completed successfully.

We definitely found corruption so I had the user try the sync again and it worked!…….partly…..   😦

We were able to download mail items but whenever we tried replying to a message we were met with an error message. At this point I was pretty lost in terms of the logging available to me so I used an old trick, manually deleting the Device from underneath the User object in AD using ADSIEDIT (obligatory warning about using ADSIEDIT with care)

I opened ADSIEDIT (Start>Run>adsiedit.msc) & navigated to the Domain Partition (Default Naming Context). I then drilled down to the user object in question. Underneath the object you’ll find a container called CN=ExchangeActiveSyncDevices & underneath that you’ll find the various devices associated with that user.

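[Screenshots: ADSIEDIT showing the CN=ExchangeActiveSyncDevices container & the device objects beneath the user object]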

Even if you use TestExchangeConnectivity.com’s ActiveSync test you’ll see an entry for that “device” listed here.

In my case I deleted each of these devices & had the user delete their profiles & re-create them. Unfortunately, we were getting the same errors regarding the inability to reply to messages.

Final Solution

So at this point I had but one reasonable option left, and it had been staring me in the face since I first saw the 10030 Event ID.

“Attempts may be made to rebuild the view, but if this message continues to persist for this mailbox, moving the mailbox to a different database may resolve the issue.”

So I created a new mailbox database as well as a move request for this mailbox & set the Bad Item Limit to 50 (since I expected further corruption that I didn’t catch before with the Repair Request). I checked the status of the move with the below command:

Get-MoveRequest | Get-MoveRequestStatistics -IncludeReport | fl

Once the move completed I was able to see that it had skipped 14 items because they were corrupted. It’s my assumption that there were other issues that were resolved from the move because a move is essentially Exchange copying all the mailbox data into an entirely new mailbox.
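
For completeness, the move itself would look roughly like the below. The target database name (DB02) is a placeholder for the new database, & the mailbox identity matches the one used in the repair request earlier.

```powershell
# "DB02" is a placeholder for the newly created database
New-MoveRequest -Identity "John.Smith" -TargetDatabase "DB02" -BadItemLimit 50

# Check progress and, once finished, review what was skipped
Get-MoveRequestStatistics -Identity "John.Smith" -IncludeReport |
    Format-List DisplayName, Status, PercentComplete, BadItemsEncountered, Report
```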

After this, the user was finally able to get full functionality out of their ActiveSync devices. Issue resolved!

The Full Story

After resolving the issue I was contacted for I began to ask additional questions about the environment to try & get a better idea of what could cause this type of corruption (admittedly, something I should have done from the very beginning as you’ll soon find out). I found that the customer had gone through a series of hardware issues which resulted in them ultimately running an ESEUTIL /P on their Exchange database. Upon hearing this, not only did things make a bit more sense but I realized it was time for a bit of lecturing.

Now I could spend a whole article detailing the ins & outs of ESEUTIL as well as proper database recovery practices but to be frank: ESEUTIL /P SHOULD ALWAYS BE A LAST RESORT! It is a hard repair which essentially whacks everything out of the JET database that it doesn’t understand as valid data, in an effort to get it to mount. Ideally, if a customer’s database were in a Dirty Shutdown state & a Soft Recovery (ESEUTIL /R) failed, then the next step would be to restore the .EDB database file from backup & replay existing Transaction Logs to get the database to a current state (many Exchange backup solutions do this).

I’ve only ever had to run a /P for customers who did not have a backup & whose only other recovery option would be manually backing up Outlook Cached mode to .PST (the ugliest of all options). Environments like these are an excellent example of customers who would be great for Office 365 because they don’t have the IT Staff to maintain a proper backup practice. Unfortunately, many individuals find themselves with a database that won’t mount & ESEUTIL /P is the first thing that turns up in their search results so they run it haphazardly. 9 out of 10 times, the database will mount & you won’t really lose much data. However, I’ve also seen a 150GB .EDB database file go down to 60GB after running a /P because an entire table or similar got whacked out of the database because it was corrupted. Bottom line, /P IS A LAST RESORT!!!!!

Back to our ActiveSync Issue. There’s one other thing that should be noted after running a /P on a database. It leaves your database in an UNSUPPORTED configuration; at least for the time being. The official word from Microsoft Support is that as soon as you run an ESEUTIL /P you should immediately run an Offline Defrag on the database (ESEUTIL /D; essentially creating a new database) & then run a New-MailboxRepairRequest on all mailboxes in it. So this really shines some light on the customer situation above. They performed a /P but performed neither of the above procedures on the database. To be honest, few customers do because the Offline Defrag is so time consuming (5-10GB/hr depending on HW) & requires downtime. This customer suffered the consequences because while their database would mount after running the /P, they still had logical corruption in the database. It just chose to adversely affect ActiveSync in this case.

Now this is where my personal practices slightly differ from those of MS Support (use at your own risk). Starting in Exchange 2010, mailbox moves are Online. So what I do is immediately create move requests for all mailboxes on the database in question to another database. The mere process of moving a mailbox should remove corruption (as seen above) & it has the benefit of allowing your users to work while the move takes place. Once the mailboxes have been moved I then run New-MailboxRepairRequest against all the mailboxes.
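
A rough sketch of that approach is below. DB01 (the database that had the /P run against it) & DB02 (a healthy target) are placeholder names. Note that Get-Mailbox on its own won’t return arbitration or archive mailboxes, so this is not a complete evacuation of the database.

```powershell
# "DB01" = the repaired database, "DB02" = a healthy target; both are placeholders
Get-Mailbox -Database "DB01" -ResultSize Unlimited |
    New-MoveRequest -TargetDatabase "DB02" -BadItemLimit 50

# Keep an eye on the moves until nothing is left in flight
Get-MoveRequest -TargetDatabase "DB02" | Get-MoveRequestStatistics |
    Format-Table DisplayName, Status, PercentComplete -AutoSize

# Then scan every mailbox on the target database
New-MailboxRepairRequest -Database "DB02" -CorruptionType FolderView, ProvisionedFolder, AggregateCounts, SearchFolder
```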

This isn’t always the best method, it’s just the one I use when the customer is really concerned about getting back up as soon as possible (RTO vs RPO). I’ve also seen cases where one bad database causes Store.exe to crash & bring down every other database on that server; so in that case the Offline Defrag is required. Either way, the Microsoft Support method should be your 1st choice. The important thing is to take this as a lesson of what can happen if /P is run in ignorance. It’s not the only way logical corruption can occur but I’ve seen it as the culprit more than once.

 

Additional Reference:

http://www.paulhite.com/2013/05/repairing-mailbox-corruption-in.html

Once again, Unchecking IPv6 on a NIC Breaks Exchange 2013


Background:

It seems like this sentiment has been preached widely but yet I still see customers do this. In fact I’m writing this today because earlier this week I had a customer whose Information Store Service, as well as the Exchange Transport Services, on Exchange 2013 would not start. Then earlier today a coworker actually did this in a lab which caused the same issue.

Summary:

Let’s start off with this: the Exchange Server Product Team performs zero testing or validation on systems with IPv6 disabled. So that right there should be a good indicator that you’re trailblazing on your own in the land of Exchange (bring a flashlight, it’s dark & scary).

So I’m going to cover two very different things here:

  • Unchecking IPv6 on the NIC adapter (BAD)
  • Properly Disabling IPv6 in the registry (Ok but not recommended by MS)

Unchecking Method (BAD):

Let’s first talk about un-checking IPv6 on your NIC adapters. The problem with this is while the OS still thinks it can & should be using IPv6, the NIC is unable to do so which leads to communications issues. An easy way to test that your OS is still trying to use IPv6 is to ping localhost after you have unchecked IPv6 on your NIC & rebooted. You’ll see that you still get an IPv6 response. I actually did a write-up about this topic on the Sysadmin community on Reddit awhile back which you can find here. As a side note, check out the Exchange community a colleague & I moderate on reddit here.

While doing this has always caused sporadic issues with Exchange, Exchange 2013 seems to be even more sensitive in this regard. Since RTM, I’ve seen half a dozen Exchange 2013 issues that were resolved by re-checking IPv6 on the NIC adapter & rebooting. Here’s what I’ve seen so far:

  • Having IPv6 unchecked when performing an Exchange 2013 install will result in a failed/incomplete installation, which means performing a messy cleanup operation before you can continue.
  • Microsoft Exchange Active Directory Topology Service may not start if the Exchange 2013 server is also a Domain Controller and IPv6 has been unchecked. The solution is to re-check it & reboot the server.
  • Microsoft Exchange Transport Service as well as the Microsoft Exchange Frontend Transport, Microsoft Exchange Transport Submission, & Microsoft Exchange Transport Delivery services may not start if IPv6 has been unchecked on the NIC adapter of an Exchange 2013 Server.
  • Microsoft Exchange Information Store Service may not start if IPv6 has been unchecked on an Exchange 2013 Server.
  • NEW – See MVP Michael Van Horenbeeck’s post on how this can break the Hybrid Configuration Wizard
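
To check whether IPv6 has been unchecked on any adapter without clicking through each NIC’s properties, something like the below can be used. It assumes Windows Server 2012 or later (the NetAdapter cmdlets aren’t available on 2008 R2), & the adapter name “Ethernet” is a placeholder.

```powershell
# Enabled = False means the IPv6 box has been unchecked on that adapter
Get-NetAdapterBinding -ComponentID ms_tcpip6 |
    Format-Table Name, DisplayName, Enabled -AutoSize

# Re-check IPv6 on an adapter ("Ethernet" is a placeholder name), then reboot
Enable-NetAdapterBinding -Name "Ethernet" -ComponentID ms_tcpip6
```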

Disabling IPv6 in the Registry:

I started this post saying that MS does no testing or validation for systems with IPv6 disabled in ANY WAY. However, some customers may actually have reasons for disabling IPv6. I’m actually interested in hearing them but I also know some customers are very adamant about it. There actually was an issue in the past where Outlook Anywhere wouldn’t work in certain scenarios with IPv6 enabled but this should not be a problem with a fully updated Exchange Server (reference).

I’ll also say that I personally have never had any issues with properly disabling IPv6 in the registry using this method. You basically add a DisabledComponents DWORD value to the registry (under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters) with a value of 8 F’s (ffffffff) & then reboot the server. After this point IPv6 should be fully disabled. I’ve also spoken with a couple Microsoft Support Engineers who have also said that they have personally never seen any issues with disabling it this way, with Windows or Exchange. However, in my opinion you should have a good reason for doing so (and saying you don’t like IPv6 is NOT a good reason).
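
If you do go this route, the registry change can be made from an elevated prompt roughly as shown below. This is a sketch using the value described above. Microsoft’s guidance on DisabledComponents lists 0xFF as the value for disabling IPv6 & notes that 0xFFFFFFFF can add a delay at boot, so check the current KB before settling on a value.

```powershell
# Sketch only: disable IPv6 components via the registry, then reboot.
# The value below is the one used in this post (eight F's).
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters" /v DisabledComponents /t REG_DWORD /d 0xffffffff /f

# Confirm the value that was written
reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters" /v DisabledComponents
```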

Lastly, I’d like to add that if you’re utilizing iSCSI on your Exchange server, there should be no issues with unchecking IPv6 on your iSCSI NICs if you choose to do so. The article was specifically in relation to NICs connected to your production/public/MAPI networks. As usual, follow your SAN vendor’s best practices when configuring iSCSI NICs.

Also, here’s a shameless plug for the ExchangeServer subreddit (http://www.reddit.com/r/exchangeserver) which I help moderate (username=ashdrewness). There’s always people such as myself answering questions on there.

Quick Exchange 2013 DAG Setup Guide


Background:

Had a co-worker ask for some basic DAG setup instructions in Exchange 2013 so I wrote a quick little guide. This covers the high points around creating the DAG as well as configuring the DAG member NICs & networks.

Step 1 – Pre-Stage DAG Computer Account
Reference. When deploying a DAG on Exchange Servers running Windows Server 2012 you need to pre-stage the DAG computer account. The above link points to the official TechNet article for doing this but here are the basics of it:

  • Create a Computer Account in AD with the name of the DAG. For example, DAG-A.
  • Disable the Computer Account.
  • In Active Directory Users & Computers click View>Advanced Features. Go to the Computer Account & select Properties>Security tab.
  • From here you have two options; either Grant the Exchange Trusted Subsystem Full Control permissions to the DAG Computer Account or give the Computer Account of the first node you plan to join to the DAG Full Control permissions over the DAG Computer Account Object.
  • Reference2
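
The same steps can be done from the shell if you prefer. The below is a minimal sketch, assuming the ActiveDirectory module is available & using a made-up domain (contoso.com / CONTOSO) & the default Computers container; it takes the first of the two permission options (Exchange Trusted Subsystem).

```powershell
Import-Module ActiveDirectory

# Create the DAG computer account in a disabled state (the path is a placeholder)
New-ADComputer -Name "DAG-A" -Path "CN=Computers,DC=contoso,DC=com" -Enabled $false

# Grant Exchange Trusted Subsystem Full Control (GA = generic all) on the new object
dsacls "CN=DAG-A,CN=Computers,DC=contoso,DC=com" /G "CONTOSO\Exchange Trusted Subsystem:GA"
```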

Step 2 – Configure DAG NICs
Reference. Exchange 2013 performs automatic DAG network configuration depending on how the NICs are configured. This means if the NICs are configured correctly then you should not have to manually collapse the DAG Networks post DAG Setup. Upon adding the nodes to the DAG, it looks for the following properties on the NICs & makes a decision based on them:

  • NIC Binding Order
  • Default Gateway Present
  • Register DNS Checked

The DAG needs to separate MAPI/Public networks from Replication networks. This enables the DAG to properly utilize a network that the administrator has provisioned for Replication traffic & to only use the MAPI/Public networks for Replication if the Replication networks are down.

You want your MAPI/Public NICs to be at the top of the binding order in the OS & any Replication, Management, Backup, or iSCSI networks at the bottom of the binding order. This is a core Windows Networking best practice as well as what the DAG looks for when trying to determine which NICs will be associated with the MAPI/Public DAG Networks.

The DAG also looks for the presence of a Default Gateway on the MAPI/Public network NIC. Going along with another Windows Networking best practice, you should only have 1 Default Gateway configured in a Windows OS. If you have additional networks with different subnets on the DAG nodes then you would need to add static routes on each of the nodes using NETSH. More on this later.

Finally, NIC Properties>IPv4 Properties>Advanced>DNS>Register this connection’s addresses in DNS should be unchecked on all adapters except for the MAPI/Public NICs. This means all Replication, iSCSI, dedicated backup or management NICs should have this option unchecked. Again, this is a Windows Networking best practice but is vital for proper Automatic DAG Network Configuration in Exchange 2013.
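
The three checks above can be turned into a small pre-flight checklist. The following Python is only a sketch: the Nic class, its field names & the sample adapters are all hypothetical, & it does not query Windows or reproduce Exchange's internal logic. It simply encodes the rules described in this step so you can sanity-check a planned NIC layout on paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nic:
    name: str                       # hypothetical adapter label, e.g. "MAPI" or "REPL"
    role: str                       # "mapi", or anything else (replication, iscsi, backup, mgmt)
    binding_position: int           # 1 = top of the binding order
    default_gateway: Optional[str]  # None when no gateway is configured
    registers_dns: bool             # "Register this connection's addresses in DNS"


def check_nics(nics: List[Nic]) -> List[str]:
    """Return a list of checklist violations; an empty list means none were found."""
    problems = []
    mapi = [n for n in nics if n.role == "mapi"]
    others = [n for n in nics if n.role != "mapi"]

    # MAPI/Public NICs should sit above every other NIC in the binding order.
    if mapi and others:
        if max(n.binding_position for n in mapi) > min(n.binding_position for n in others):
            problems.append("MAPI NIC is not at the top of the binding order")

    # Only one default gateway per server, and it belongs on the MAPI NIC.
    gateways = [n for n in nics if n.default_gateway]
    if len(gateways) != 1:
        problems.append(f"expected exactly 1 default gateway, found {len(gateways)}")
    for n in others:
        if n.default_gateway:
            problems.append(f"{n.name}: non-MAPI NIC has a default gateway")
        if n.registers_dns:
            problems.append(f"{n.name}: DNS registration should be unchecked")
    for n in mapi:
        if not n.registers_dns:
            problems.append(f"{n.name}: MAPI NIC should register in DNS")
    return problems


node = [
    Nic("MAPI", "mapi", 1, "192.168.1.254", True),
    Nic("REPL", "replication", 2, None, False),
    Nic("iSCSI", "iscsi", 3, None, True),  # deliberately misconfigured for the demo
]
for line in check_nics(node):
    print(line)
```

With the sample data, the only finding is the iSCSI adapter that still has DNS registration enabled.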

Step 3 – Configure Routing if Needed (optional depending on DAG design)
If your DAG spans subnets & you’re using dedicated Replication networks then they should be on their own subnet isolated from the MAPI/Public network. A common setup for a network such as this might be:

Site-Austin:
MAPI Network 192.168.1.0/24; Default Gateway 192.168.1.254
Replication Network 10.0.1.0/24; Default Gateway $Null

Site-Houston:
MAPI Network 192.168.2.0/24; Default Gateway 192.168.2.254
Replication Network 10.0.2.0/24; Default Gateway $Null

Now with the above configuration you would have some form of routing taking place between the two MAPI subnets. You would also have routing between the two Replication subnets. However, because you should only have 1 Default Gateway configured per server, DAG nodes in each site would be unable to communicate with each other over the Replication networks. This is where static routes come into play. You would run the following commands on the nodes to allow them to ping across to each other between the 10.0.1.x & 10.0.2.x networks (in the below example, REPL is the name of each node’s Replication NIC):

On Nodes in Site-Austin: netsh interface ip add route 10.0.2.0/24 "REPL" 0.0.0.0

On Nodes in Site-Houston: netsh interface ip add route 10.0.1.0/24 "REPL" 0.0.0.0

This is the preferred format for this command. There are some references to using the local interface IP instead of 0.0.0.0 but the format I use above is what is recommended by the Windows Networking Team. Reference.

“According to our Networking Development Groups, the recommendation actually is that on-link routes should be added with a 0.0.0.0 entry for the next hop, not with the local address (particularly because the local address might be deleted) and with the interface specified.”

This all assumes there is physical routing in place between the two subnets, like a Router, layer 3 Switch, or a shared virtual network in Hyper-V/ESX.
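
If you have more than two sites, writing these routes by hand gets error-prone. Below is a rough Python sketch that prints the netsh command each site's nodes would need, in the same format used above. The site names & subnets come from the example layout; the function name & the default "REPL" NIC name are my own assumptions, so adjust them to your environment.

```python
import ipaddress

# Replication subnets per site, taken from the example layout in this step.
REPLICATION_SUBNETS = {
    "Site-Austin": "10.0.1.0/24",
    "Site-Houston": "10.0.2.0/24",
}


def static_route_commands(subnets, nic_name="REPL"):
    """Map each site to the netsh commands its nodes need to reach the other sites."""
    commands = {}
    for site in subnets:
        routes = []
        for other_site, cidr in subnets.items():
            if other_site == site:
                continue  # nodes already have an on-link route to their own subnet
            network = ipaddress.ip_network(cidr)  # raises ValueError on a bad prefix
            routes.append(
                f'netsh interface ip add route {network} "{nic_name}" 0.0.0.0'
            )
        commands[site] = routes
    return commands


for site, routes in static_route_commands(REPLICATION_SUBNETS).items():
    for route in routes:
        print(f"{site}: {route}")
```

The script only builds the command strings; you would still run them on each node yourself & confirm connectivity afterwards.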

Verify connectivity between nodes over these 10.0.x.x networks using Tracert or Pathping. Note that these steps are only required if your DAG spans subnets & has replication networks in different subnets. While it technically should work, it is not recommended to stretch subnets for DAG Networks across the WAN.

It should also be noted that there should be no routing between the MAPI Networks & the Replication Networks. They should be on isolated networks that have no contact with each other.

Also, Microsoft’s guidance is no greater than 500ms of round-trip latency between DAG nodes when you have DAG members across latent network connections. It’s important to realize that you should not set your expectations around this number alone. You could easily have a connection over 500ms & not experience copy queues if you have only 20 mailboxes with low usage profiles. Alternatively, you could have a connection with only 50ms of round-trip latency but see high copy queues if you have thousands of high-usage mailboxes & limited bandwidth. Just know that this number is not the be-all and end-all.

Step 4 – Create DAG & Add Nodes
This part is pretty straightforward & you can use the EAC to do it. Just remember to give the DAG an IP address in every MAPI subnet where you have DAG nodes. So in our scenario above you would give the DAG 2 IP addresses: one in the 192.168.1.0/24 subnet & another in the 192.168.2.0/24 subnet.
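
A quick way to double-check the "one DAG IP per MAPI subnet" rule before you create the DAG is sketched below in Python. The subnets are from the scenario above, but the .50 addresses & the function name are made up for illustration.

```python
import ipaddress


def uncovered_subnets(mapi_subnets, dag_ips):
    """Return the MAPI subnets that contain none of the DAG's IP addresses."""
    addresses = [ipaddress.ip_address(ip) for ip in dag_ips]
    missing = []
    for cidr in mapi_subnets:
        network = ipaddress.ip_network(cidr)
        if not any(addr in network for addr in addresses):
            missing.append(cidr)
    return missing


mapi_subnets = ["192.168.1.0/24", "192.168.2.0/24"]

# Hypothetical DAG addresses, one per site: nothing is missing.
print(uncovered_subnets(mapi_subnets, ["192.168.1.50", "192.168.2.50"]))

# Forgetting the Houston address leaves that subnet uncovered.
print(uncovered_subnets(mapi_subnets, ["192.168.1.50"]))
```

An empty list means every MAPI subnet has a DAG IP; anything returned is a subnet you still need to assign an address in.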

Step 5 – Manually configure DAG Networks if needed
Reference. If you have dedicated management networks, dedicated backup networks, or iSCSI NICs then you have to perform some manual steps after your DAG is set up. These networks should be ignored by both the DAG & the cluster. To do this we must first enable Manual DAG Network Configuration, which is disabled by default. We then need to configure the iSCSI or similar network to be ignored. Perform the following steps:

  • Get-DatabaseAvailabilityGroup
  • Set-DatabaseAvailabilityGroup <DAGName> -ManualDagNetworkConfiguration:$True
  • Get-DatabaseAvailabilityGroupNetwork
  • Set-DatabaseAvailabilityGroupNetwork <iSCSI/Backup/Mgmt NetworkName> -IgnoreNetwork:$True

Finally, let’s validate everything. Run the below command:

Get-DatabaseAvailabilityGroupNetwork | Format-List Identity,ReplicationEnabled,IgnoreNetwork

Verify that the iSCSI/Backup/Mgmt networks have IgnoreNetwork set to True (the MAPI & Replication networks should have this set to False). Also verify that the Replication Networks have ReplicationEnabled set to True. Finally, verify that the MAPI network has ReplicationEnabled set to False. This prevents the MAPI network from being used for Replication by default. It can still be used for Replication if all other possible replication paths go down.
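
If you export that output & want to check it programmatically, the expectations above boil down to a small lookup table. The Python below is only a sketch under assumptions: the network names & role mapping are hypothetical sample data that you would replace with your own, & the dictionaries stand in for whatever export format you use. Only the three property names come from the command above.

```python
# Expected (ReplicationEnabled, IgnoreNetwork) per role, as described above.
# None means the value is not checked for that role.
EXPECTED = {
    "mapi": (False, False),
    "replication": (True, False),
    "ignored": (None, True),  # iSCSI/Backup/Mgmt networks
}


def validate_dag_networks(networks, roles):
    """Compare each network's settings with the expectation for its role."""
    problems = []
    for net in networks:
        name = net["Identity"]
        role = roles.get(name)
        if role is None:
            problems.append(f"{name}: no expected role defined")
            continue
        want_repl, want_ignore = EXPECTED[role]
        if net["IgnoreNetwork"] != want_ignore:
            problems.append(f"{name}: IgnoreNetwork should be {want_ignore}")
        if want_repl is not None and net["ReplicationEnabled"] != want_repl:
            problems.append(f"{name}: ReplicationEnabled should be {want_repl}")
    return problems


# Hypothetical network names and values for illustration only.
roles = {"MAPI-Net": "mapi", "Repl-Net": "replication", "iSCSI-Net": "ignored"}
networks = [
    {"Identity": "MAPI-Net", "ReplicationEnabled": True, "IgnoreNetwork": False},
    {"Identity": "Repl-Net", "ReplicationEnabled": True, "IgnoreNetwork": False},
    {"Identity": "iSCSI-Net", "ReplicationEnabled": False, "IgnoreNetwork": True},
]
for line in validate_dag_networks(networks, roles):
    print(line)
```

In the sample data the MAPI network still has replication enabled, so that is the one line reported.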

References:
http://technet.microsoft.com/en-us/library/ff367878.aspx

http://technet.microsoft.com/en-us/library/dd298065(v=exchg.150).aspx

http://blogs.technet.com/b/scottschnoll/archive/2012/10/01/storage-high-availability-and-site-resilience-in-exchange-server-2013-part-2.aspx

http://blogs.technet.com/b/askcore/archive/2009/05/26/active-route-gets-removed-on-windows-2008-failover-cluster-ip-address-offline.aspx

http://technet.microsoft.com/en-us/library/dd298008(v=exchg.141).aspx