Jetstress – Too Many IOPS?


Symptom:
Customer reported Jetstress failures with the message, “Fail – The test has 1.05381856535713 Average Database Page Fault Stalls/sec. This should be not higher than 1.” The customer had recently purchased multiple servers to be used in an Exchange DAG, and these Jetstress failures were halting the project. What was unique about this deployment was that the customer was using all local SSD storage for the solution.

Analysis:
I asked the customer to provide their Jetstress configuration XML file as well as their Jetstress Results HTML file. As soon as I saw the results file I knew what the issue was, but only because I’ve had discussions with fellow Exchange MCMs, MVPs, and Microsoft employees who had encountered this same odd behavior in the past. In short, Jetstress was generating TOO many IOPS, as seen in the below output:

To the untrained eye this may not be anything special, but I had to do a double take the first time I saw this. Jetstress was generating over 25,000 IOPS on this system. Also impressive was the performance of the hardware, as there actually weren’t any disk latency issues from an IO read/write perspective:

As you can see from the above screenshot, database and log read/write latency (msec) was still fairly low for the extremely high amount of IOPS being generated. Yet our issue was with Database Page Fault Stalls/Sec, which should always remain below 1 (shown below):

Background:
Let’s spend some time covering how Jetstress is meant to behave, as well as the proper strategy for effectively utilizing Jetstress. Your first step in working with Jetstress should be to download the Jetstress Field Guide where most of this information is held.

The primary purpose of Jetstress is to ensure a storage solution can adequately deliver the amount of IOPS needed for a particular Exchange design BEFORE Exchange is installed on the hardware. You should use the Exchange Sizing Calculator to properly size the environment; using inputs such as # of mailboxes, avg. messages sent/received per day, and avg. message size. Once completed, on the “Role Requirements” tab you will find a value called “Total Database Required IOPS” (per Server) which tells you the amount of IOPS each server must be able to deliver for your solution.  The value for “Total Log Required IOPS” can be ignored as Log IO represents sequential IO which is very easy on the disk subsystem.

With the per server Database Required IOPS value in hand, your job now is to get Jetstress to generate at least that amount of IOPS while delivering passing latency values. This is where some customers get confused due to improper expectations. They may think that Jetstress should always pass no matter what parameters they configure for it. I can tell you that I can make any storage solution fail Jetstress if I crank the thread count high enough, so it actually takes a bit of “under-the-hood” understanding to use Jetstress effectively.

Jetstress generates IO based on a global thread count, with each thread expected to generate 30-60 IOPS. Simply put, if I run a Jetstress test with 2 threads, I would expect it to generate ~120 IOPS on well-performing hardware. Therefore, if my Exchange calculator stated I needed to achieve 1,200 IOPS on a server, I would begin by running a quick 15-minute test with 20 threads (20 x 60 = 1,200). If that test generated at least 1,200 IOPS and passed, I would then run a 24-hour test with 20 threads to ensure there are no demons hiding in my hardware that only a long stress test can uncover. If that test passes then I’m technically in the clear and can proceed with the next phase of the Exchange project, making sure to keep my calculator files, Jetstress configuration XML files, and Jetstress result HTML files in a safe location for potential future reference.
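The thread math above can be sketched in a few lines of PowerShell (a rough planning aid only; the 60 IOPS-per-thread figure is the optimistic end of the 30-60 range, and the 1,200 IOPS target is just the example from this paragraph):

```powershell
# Target from the sizing calculator ("Total Database Required IOPS" per server)
$targetIops = 1200

# Jetstress threads generate roughly 30-60 IOPS each on healthy hardware;
# plan with the optimistic figure, then adjust after a short 15-minute test.
$iopsPerThread = 60

$threads = [math]::Ceiling($targetIops / $iopsPerThread)
"Start with $threads threads (~$($threads * $iopsPerThread) IOPS expected)"
```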

You could spend an hour reading the Jetstress Field Guide, but what I’ve just covered is the short version. Get it to pass with the amount of IOPS you need and you’re in the clear. Of course it can often be much more complex than that. You may need to tweak the thread count, update hard drive firmware, or correct a controller caching setting (see my post here for the correct settings) to achieve a pass. Auto-tuning actually makes this process a bit simpler, as it tries to determine the maximum number of threads (which, as stated, is directly proportional to generated IOPS) the system can handle. However, it can lead to some confusion, as people may focus too much on the maximum number of IOPS a system can deliver instead of the number of IOPS they actually need. While that’s certainly valuable information for future planning or even hardware repurposing, you shouldn’t stall your project trying to squeeze every last bit of IOPS out of the system if you’re already easily hitting your IOPS target and passing the latency tests.

As someone who works for a hardware vendor, I’ll often get pulled into an escalation where a customer has opened Jetstress, cranked the thread count up to 50, it fails, and they’re pointing the finger at the hardware. However, based on what we already discussed, a thread count of 50 would mean 3,000 IOPS. This is fine if the storage purchased with the system can support it. A single 7.2K NL SAS drive can achieve ~55 IOPS for Exchange database workloads, so if a customer has 20 single-disk RAID 0 drives in a system, the math tells us they can’t expect much more than 1,100 IOPS (20 x 55 = 1,100). It gets a bit more complicated when using RAID, as you must consider which disks are actually in play (e.g. in a 10-disk RAID 10 you only get to count 5 disks toward performance, since the other 5 are used for mirroring) as well as the write penalties of RAID 5 or RAID 6. The bottom line is that you need a realistic understanding of the number of IOPS the hardware you purchased can actually achieve, as we all must adhere to the performance laws of rotational media.
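As a rough sketch of that spindle math (the figures are the illustrative ones from above, not vendor specifications):

```powershell
# ~55 IOPS per 7.2K NL SAS disk for Exchange database workloads (illustrative)
$iopsPerDisk = 55
$totalDisks  = 10

# In a RAID 10 set only half the spindles count toward performance;
# the mirrored half is consumed by the duplicate writes.
$dataDisks = $totalDisks / 2

"A $totalDisks-disk RAID 10 tops out around $($dataDisks * $iopsPerDisk) database IOPS"
```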

Resolution:
Coming back to the issue at hand, now that we understand how Jetstress is supposed to work, we can see why the results were troubling. Jetstress actually should NOT be generating that many IOPS when using 35 threads. 35 threads should generate ~2,100 IOPS, not 25,000 IOPS (35 threads x 60 expected IOPS per thread = 2,100). More IOPS is not a good thing here because, if nothing else, Jetstress is supposed to be predictable in the amount of IO it generates. So why did this system generate so many IOPS, and why did it fail? The short answer is that Jetstress doesn’t play well with SSD drives; it will always try to generate more IOPS per thread than expected on SSD storage. As I am not a Jetstress developer I can’t explain why this occurs, but after several colleagues have also seen this issue I can at least confirm it happens and provide a workaround. In our customer’s case, they manually specified 1 thread, which generated ~2,200 IOPS and passed without any Page Fault Stall errors. This was still far more IOPS than 1 thread should be generating, but it achieved their calculator IOPS requirements and allowed them to continue with their project. As for why the test was failing with Page Fault Stalls even though actual disk latency was fine, I can only speculate. Because a Page Fault Stall is an Exchange-related operation (related to querying disk for a database page) and not a pure disk latency measurement, I wonder whether Jetstress was ever designed to generate that many IOPS. I’ve never personally seen it run with more than a few thousand IOPS, so it’s possible the Jetstress application itself couldn’t handle it.

I hope this cleared up some confusion around how to effectively utilize Jetstress. Also, if you happen to come across this specific issue I’d be interested in hearing about it in the comments.

Misconfigured receive connector breaks voicemail delivery


Symptoms

In a Lync and Exchange UM environment (version doesn’t particularly matter in this case), voicemail messages were not being delivered. The voicemail folder on Exchange (C:\Program Files\Microsoft\Exchange Server\V15\UnifiedMessaging\voicemail) was filling up with hundreds of .txt (header files) and .wav (voicemail audio files).

Resolution

This issue is not necessarily new (Reference1 Reference2), but it didn’t immediately come up in search results. I also wanted to spend more time discussing why this issue happened and why it’s important to understand receive connector scoping.

This issue was caused by incorrectly modifying a receive connector on Exchange. Specifically, a custom connector used for application relay was modified so that, instead of containing only the individual IP addresses needed for relay (EX: printers/copiers/scanners/3rd-party applications requiring relay), the entire IP subnet was included in the Remote IP Ranges scoping. This ultimately meant that instead of Lync/Exchange UM using the default receive connectors (which have the required “Exchange Server Authentication” enabled), they were using the custom application relay connector (which did not have Exchange Server Authentication enabled).

This resulted in the voicemail messages sitting in the voicemail folder and errors (Event ID 1423/1446/1335) being thrown in the Application log. The errors state that processing failed for the messages:

The Microsoft Exchange Unified Messaging service on the Mailbox server encountered an error while trying to process the message with header file “C:\Program Files\Microsoft\Exchange Server\V15\UnifiedMessaging\voicemail\<string>.txt”. Error details: “Microsoft.Exchange.UM.UMCore.SmtpSubmissionException: Submission to the Hub Transport server failed. The operation will be retried. —> Microsoft.Exchange.Net.ExSmtpClient.UnexpectedSmtpServerResponseException: Unexpected SMTP server response. Expected: 220, actual: 500, whole response: 500 5.3.3 Unrecognized command

It’s also possible that the voicemail messages will eventually be deleted due to having failed processing too many times (EventID 1335):

The Microsoft Exchange Unified Messaging service on the Mailbox server encountered an error while trying to process the message with header file “C:\Program Files\Microsoft\Exchange Server\V15\UnifiedMessaging\voicemail\<string>.txt”. The message will be deleted and the “MSExchangeUMAvailability: % of Messages Successfully Processed Over the Last Hour” performance counter will be decreased. Error details: “Microsoft.Exchange.UM.UMCore.ReachMaxProcessedTimesException: This message has reached the maximum processed count, “6”.

Unfortunately, once you see the message above (Event ID 1335) the message cannot be recovered. When UM states the message will be deleted, it will in fact be deleted with no chance of recovery. If the issue had been going on for several days and this folder were part of your daily backup sets, you could technically restore the files and paste them into the current directory, where they would be processed. However, if you did not have a backup then these voicemails would be permanently lost.

Note: Certain failed voicemail messages can be found in the “C:\Program Files\Microsoft\Exchange Server\V15\UnifiedMessaging\badvoicemail” directory. However, as our failure was a permanent failure related to Transport, they did not get moved to the badvoicemail directory and instead were permanently deleted.

Background

I wanted to further explain how this issue happened, and hopefully clear up confusion around receive connector scoping. In our scenario, someone left a voicemail for an Exchange UM-enabled mailbox which was received and processed by Exchange. The header and audio files for this voicemail message were temporarily stored in the “C:\Program Files\Microsoft\Exchange Server\V15\UnifiedMessaging\voicemail” directory on the Exchange UM server. Our scenario involved Exchange 2013, but the same general logic would apply to Exchange 2007/2010/2016. UM would normally submit these voicemail messages to transport using one of the default Receive Connectors which would have “Exchange Server Authentication” enabled. These messages would then be delivered to the destination mailbox.

Our failure was a result of the UM services being directed to a Receive Connector which did not have the necessary authentication enabled on it (the custom relay connector which only had Anonymous authentication enabled). Under normal circumstances, this issue would probably be detected within a few hours (as users began complaining of not receiving voicemails) but in our case the change was made before the holidays and was not detected until this week (another reason to avoid IT changes before a long holiday). This resulted in the permanent Event 1335 failure noted above and the loss of the voicemail. Since this failure occurs before reaching transport, Safety Net will not be any help.

So let’s turn our focus to Receive Connector scoping, and specifically to the RemoteIPRange parameter. Remote IP Ranges define which incoming (remote) IP addresses a connector is responsible for handling. Depending on the local listening port, local listening IP address, and RemoteIPRange configuration of each Receive Connector, the Microsoft Exchange Frontend Transport service and Microsoft Exchange Transport service will route incoming connections to the correct Receive Connector. The chosen connector then handles the connection according to its configured authentication methods, permission groups, etc. A Receive Connector must have a unique combination of local listening port, local listening IP address, and remote IP range (RemoteIPRange) configuration. This means you can have multiple Receive Connectors with the same listening IP address and port (25, for instance) as long as each of their RemoteIPRange configurations is unique. You could also have the same RemoteIPRange configuration on multiple Receive Connectors if the port or listening IP differs, and so on.
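A quick way to audit this on a server is to list each connector’s bindings, remote ranges, and authentication settings from the Exchange Management Shell (the server name below is a placeholder):

```powershell
# List every Receive Connector on the server, with the local bindings it
# listens on and the remote IP ranges it is scoped to handle.
Get-ReceiveConnector -Server "EXCH01" |
    Format-List Name, Bindings, RemoteIPRanges, AuthMechanism, PermissionGroups
```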

The default Receive Connectors all have a default RemoteIPRange of 0.0.0.0-255.255.255.255 (all IPv4 addresses) and ::-ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff (all IPv6 addresses). The rule for processing RemoteIPRange configurations is that the most accurate configuration is used. Say I have two Receive Connectors in the below configuration:

Name: Default Receive Connector
Local Listening IP and Port (Bindings): 192.168.1.10:25
RemoteIPRange: 0.0.0.0-255.255.255.255

 

Name: ApplicationRelayConnector
Local Listening IP and Port (Bindings): 192.168.1.10:25
RemoteIPRange: 192.168.1.55

With this configuration, if an inbound connection on port 25 destined for 192.168.1.10 is created from 192.168.1.55, then ApplicationRelayConnector would be used and its settings would apply. If an inbound connection to 192.168.1.10:25 came from 192.168.1.200, then Default Receive Connector would be used instead.

The below image was taken from the “Troubleshooting Transport” chapter of the Exchange Server Troubleshooting Companion, an eBook co-authored by Paul Cunningham and myself. It’s a great visual aid for understanding which Receive Connector will accept which connection from a given remote IP address. The chapter also contains great tips for troubleshooting connectors, mail flow, and Exchange in general.

1-um

So in my customer’s specific scenario, instead of defining individual IP addresses on their custom application relay receive connector, they defined the entire internal IP subnet (192.168.1.0/24). As a result, not only did the internal devices needing to relay hit the custom application relay connector, but the Exchange server itself and the Lync server did as well, breaking Exchange Server Authentication. As a best practice, you should always use individual IP addresses when configuring custom application relay connectors so that you do not inadvertently break other Exchange communications. If this customer had multiple Exchange servers, this change would also have broken Exchange server-to-server port 25 communications.
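To correct a connector like this one, the scope can be set back to individual hosts (a sketch; the connector name and addresses are examples, and note that -RemoteIPRanges replaces the existing list rather than appending to it):

```powershell
# Scope the relay connector to only the hosts that actually need to relay,
# instead of the entire internal subnet.
Set-ReceiveConnector "EXCH01\ApplicationRelayConnector" `
    -RemoteIPRanges "192.168.1.55", "192.168.1.56", "192.168.1.60"
```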

Unable to Recreate Exchange Virtual Directory


Issue

A customer of mine recently had an issue where their Exchange 2013 OWA Virtual Directory was missing in IIS. When attempting to recreate the vDir we encountered the below error message:

“An error occurred while creating the IIS virtual directory ‘IIS://ServerName/W3SVC/1/ROOT/OWA’”

1

To resolve this error I needed to resort to using a long-lost tool from the days of old: the IIS 6 Resource Kit.

Note: This blog post could also be relevant if the OWA (or any other) vDir needed to be recreated and you encountered the same error upon recreation.

Resolution

Back in the days of Exchange 2003, the IIS Resource Kit, or more specifically the Metabase Explorer, could be used when recreating a Virtual Directory. Fortunately, the Metabase Explorer tool still works with IIS 8.

Download Link for the IIS 6 Resource Kit

The error encountered above was a result of the IIS Metabase still holding remnants of a past instance of the OWA Virtual Directory, which was preventing the New-OwaVirtualDirectory cmdlet from completing successfully. It’s important to understand that an Exchange Virtual Directory really lives in two places: Active Directory and IIS. When running the Get-OwaVirtualDirectory cmdlet (or the equivalent commands for other Virtual Directories), you’re really querying Active Directory. For example, the OWA Virtual Directories for both the Default Web Site and the Exchange Back End website in my lab are located in the following location in AD (via ADSIEDIT):

2

So if a vDir is missing in IIS but present in AD, you’ll likely need to first remove it using the Remove-*VirtualDirectory cmdlet; otherwise it will generate an error stating it already exists. In my customer’s scenario, I had to do this beforehand, as the OWA vDir was present in AD but missing in IIS.

This brought us to the state we were in at the beginning of this post; receiving the above error message. The OWA vDir was no longer present in AD nor in the Default Web Site, but when trying to recreate it using New-OwaVirtualDirectory we received the above error message.

Tip: Use Get-*VirtualDirectory with the –ShowMailboxVirtualDirectories parameter to view the Virtual Directories on both web sites. For example:

3
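As a concrete sketch of that tip (the server name is a placeholder):

```powershell
# Show the OWA vDirs on both the Default Web Site and the Exchange Back End
# website; without -ShowMailboxVirtualDirectories only the front-end
# directories are returned.
Get-OwaVirtualDirectory -Server "EXCH01" -ShowMailboxVirtualDirectories |
    Format-Table Name, Server, InternalUrl
```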

The solution was to install the IIS 6 Resource Kit and use Metabase Explorer to delete the ghosted vDir. When installing the Resource Kit, select Custom Install, uncheck all features except for Metabase Explorer 1.6, and proceed with the installation. Once it finishes, it may require you to add the .NET Framework 3.5 feature.

When you open the tool on the Exchange Server in question, navigate to the below tree structure and delete the old OWA Virtual Directory by right-clicking it and selecting Delete. When completed, the OWA vDir should no longer be present (as seen below).

4

You should now be able to successfully execute the New-OwaVirtualDirectory Cmdlet. It’s always a bit nostalgic seeing a tool of days gone by still able to save the day. I’d like to thank my co-worker John Dixon for help with this post. When I can’t figure something out in Exchange/IIS (or anything really) he’s who I lean on for help.

Quick method to determine installed version of .NET Framework


Edit: This excellent post by MVP Michel de Rooij details the steps for upgrading the .NET version and Exchange Cumulative Updates in the proper order.

Due to recent issues with unsupported versions of .NET being installed on Exchange servers, as well as the fact that Exchange Server requires specific versions of .NET to be installed (Exchange Server 2013 System Requirements & Exchange Server 2016 System Requirements), there is a need to quickly query the installed version of .NET on Exchange servers. I have also been involved in several Exchange support escalations where updating the Exchange servers from .NET 4.5.1 to 4.5.2 resolved CPU performance issues.

Fortunately, my coworker and fellow Exchange MCM Mark Henderson wrote this quick and easy way to query the currently installed version of .NET.

PowerShell Query Method

To query the local Registry using PowerShell, execute the below command in an elevated PowerShell session.

(Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' -Name Release).Release

You can then use the table below to reference the installed version of .NET. For instance, if the returned value is 379893, then .NET 4.5.2 is installed.

 

 

Version                                                                                 Value of the Release DWORD
.NET Framework 4.5                                                                      378389
.NET Framework 4.5.1 installed with Windows 8.1                                         378675
.NET Framework 4.5.1 installed on Windows 8, Windows 7 SP1, or Windows Vista SP2        378758
.NET Framework 4.5.2                                                                    379893
.NET Framework 4.6 installed with Windows 10                                            393295
.NET Framework 4.6 installed on all other Windows OS versions                           393297
.NET Framework 4.6.1 installed on Windows 10                                            394254
.NET Framework 4.6.1 installed on all other Windows OS versions                         394271
.NET Framework 4.6.1 installed on all other Windows OS versions (with required hotfix)  394294
.NET Framework 4.6.2 installed on Windows 10 Anniversary Update                         394802
.NET Framework 4.6.2 installed on all other Windows OS versions                         394806
.NET Framework 4.7.0 installed on Windows 10 Creators Update                            460798
.NET Framework 4.7.0 installed on all other Windows OS versions                         460805
.NET Framework 4.7.1 installed on Windows 10 Fall Creators Update                       461308
.NET Framework 4.7.1 installed on all other Windows OS versions                         461310

Script Method

Copy the below text into a text file and rename the extension to .ps1. You can then execute the script and have it automatically tell you the installed version of .NET.

# Determine the installed version of the .NET 4 Framework by querying the
# Registry key HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full
# for the value of Release.
#
# Based on https://msdn.microsoft.com/en-us/library/hh925568(v=vs.110).aspx

$Netver = (Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' -Name Release).Release

If ($Netver -lt 378389)
{
    Write-Host ".NET Framework version OLDER than 4.5" -ForegroundColor Yellow
}
ElseIf ($Netver -eq 378389)
{
    Write-Host ".NET Framework 4.5" -ForegroundColor Red
}
ElseIf ($Netver -le 378675)
{
    Write-Host ".NET Framework 4.5.1 installed with Windows 8.1" -ForegroundColor Red
}
ElseIf ($Netver -le 378758)
{
    Write-Host ".NET Framework 4.5.1 installed on Windows 8, Windows 7 SP1, or Windows Vista SP2" -ForegroundColor Red
}
ElseIf ($Netver -le 379893)
{
    Write-Host ".NET Framework 4.5.2" -ForegroundColor Red
}
ElseIf ($Netver -le 393295)
{
    Write-Host ".NET Framework 4.6 installed with Windows 10" -ForegroundColor Red
}
ElseIf ($Netver -le 393297)
{
    Write-Host ".NET Framework 4.6 installed on all other Windows OS versions" -ForegroundColor Red
}
ElseIf ($Netver -le 394254)
{
    Write-Host ".NET Framework 4.6.1 installed on Windows 10" -ForegroundColor Red
}
ElseIf ($Netver -le 394271)
{
    Write-Host ".NET Framework 4.6.1 installed on all other Windows OS versions" -ForegroundColor Red
}
ElseIf ($Netver -le 394294)
{
    Write-Host ".NET Framework 4.6.1 installed on all other Windows OS versions (with required hotfix)" -ForegroundColor Red
}
ElseIf ($Netver -le 394802)
{
    Write-Host ".NET Framework 4.6.2 installed on Windows 10 Anniversary Update" -ForegroundColor Red
}
ElseIf ($Netver -le 394806)
{
    Write-Host ".NET Framework 4.6.2 installed on all other Windows OS versions" -ForegroundColor Red
}
ElseIf ($Netver -le 460798)
{
    Write-Host ".NET Framework 4.7.0 installed on Windows 10 Creators Update" -ForegroundColor Red
}
ElseIf ($Netver -le 460805)
{
    Write-Host ".NET Framework 4.7.0 installed on all other Windows OS versions" -ForegroundColor Red
}
ElseIf ($Netver -le 461308)
{
    Write-Host ".NET Framework 4.7.1 installed on Windows 10 Fall Creators Update" -ForegroundColor Red
}
ElseIf ($Netver -le 461310)
{
    Write-Host ".NET Framework 4.7.1 installed on all other Windows OS versions" -ForegroundColor Red
}

 

References:

How to: Determine Which .NET Framework Versions Are Installed

 

Emails from scanner to Exchange 2013 being sent as separate attachment


Scenario

After switching from hosted email to Exchange 2013 on-premises, a customer noticed that when using scan-to-email functionality, the .PDF files it created were not showing up as expected. Specifically, instead of receiving an email with the .PDF attachment of the scanned document, they were receiving the entire original message as an attachment (which then contained the .PDF).

When the scanner was configured to send to an external recipient (Gmail in this case), the issue did not occur & the message was formatted as expected. The message was still being relayed through Exchange; it was just the recipient that made the difference. See the below screenshots for examples of each:

What the customer was seeing (incorrect format)

A

What the customer expected to see (correct format)

B

This may not seem like a big issue but it resulted in users on certain mobile devices not being able to view the attachments properly.

Troubleshooting Steps

There were a couple of references on the MS forums to similar issues with older versions of 2013, but this server was updated. My next path was to see if there were any Transport Agents installed that could’ve been causing these messages to be modified. I used many of the steps in my previous blog post “Common Support Issues with Transport Agents”, including disabling two 3rd-party agents & restarting the Transport Service; the issue remained.

My next step was to disable both of the customer’s Transport Rules (Get-TransportRule | Disable-TransportRule); one managed attachment size while the other appended a disclaimer to all emails. This worked! By process of elimination I was able to determine it was the disclaimer rule causing the messages to be modified.
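The process of elimination can be scripted loosely like this (the rule names are examples from this scenario):

```powershell
# Disable every Transport Rule, confirm the scanned message now arrives
# correctly, then re-enable rules one at a time until the symptom returns.
Get-TransportRule | Disable-TransportRule -Confirm:$false

Enable-TransportRule "Attachment Size Rule"   # example name - test again
Enable-TransportRule "Disclaimer Rule"        # example name - symptom returns here
```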

Resolution

Looking through the settings of the rule, the first thing that caught my eye was the Fallback Option of “Wrap”. Per this article from fellow MVP Pat Richard, Wrap causes Exchange to attach the original message & then generate a new message with the disclaimer in it (which sounds like our issue).

C

However, making this change did not fix the issue, much to my bewilderment. There seemed to be something about the format of the email that Exchange did not like, probably caused by the formatting/encoding the scanner was using.

Ultimately, the customer was fine with simply adding an exception to the Transport Rule stating to not apply the rule to messages coming from the scanner sender email address.

D
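In shell form, the exception looks roughly like this (the rule name and sender address are examples):

```powershell
# Exempt the scanner's sending address from the disclaimer rule so its
# messages pass through unmodified.
Set-TransportRule "Disclaimer Rule" -ExceptIfFrom "scanner@contoso.com"
```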

 

Remember the basics when working with Dynamic Distribution Groups (I didn’t)


Overview:

I recently had a customer come to me with a simple issue: mail was not being received in his Exchange 2013 environment when sending to a Dynamic Distribution Group he had just created. It certainly seemed like an easy issue to track down (which it technically was), but unfortunately I was a little too confident in my abilities & made the age-old mistake of overlooking the basics. Hopefully others can avoid that mistake after giving this a read.

Scenario:

Create a Dynamic Distribution Group named TestDL#1 whose membership is defined by a Universal Security Group named TestSecurityGroup using the following command in shell:

New-DynamicDistributionGroup -Name "TestDL#1" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"}

Note: This command places the Dynamic DL object into the default Users OU & also sets the msExchDynamicDLBaseDN to the Users OU’s Distinguished Name (CN=Users,DC=ASH,DC=NET). This will become important later.

I can verify the membership of this group by running:

$var = Get-DynamicDistributionGroup "TestDL#1"

Get-Recipient -RecipientPreviewFilter $var.RecipientFilter

In my case, the members show up correctly as John, Bob, Sam, & Dave. However, if I send emails to this group, nobody gets them. When looking at message tracking, the recipients show as {} (see below screenshot):

1

Now here’s the really interesting part. My security group, as well as my users, are in the OU=End_Users,OU=Company_Users,DC=ASH,DC=NET Organizational Unit. However (as mentioned in my Note above), my Dynamic DL is in the CN=Users,DC=ASH,DC=NET container. Now, if I move my users into the Users OU, they receive the email & show up as valid recipients.

2

Now no matter which OU I move my Dynamic Distribution Group (TestDL#1) to, this behavior is the same.

For instance, if I had run the below command instead, I never would have noticed an issue because the Dynamic DL would’ve been created in the same OU as the users & the Security Group.

New-DynamicDistributionGroup -Name "TestDL#1" -OrganizationalUnit "ash.net/Company_Users/End_Users" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"}

The last head-scratcher: if I move the actual AD Security Group (TestSecurityGroup) that I’m filtering against to a different OU, I get the same behavior (no emails).

So it would seem that the solution is to ensure you always place the Dynamic Distribution Group into the same OU where ALL of your Security Group members are as well as the security group itself is.

This seemed crazy so I had to assume I wasn’t creating the filter correctly. It was at this point I pinged some colleagues of mine to see where I was going wrong.

Tip: Always get your buddies to peer review your work. A second set of eyes on an issue usually goes a long way to figuring things out.

Solution:

As it turned out, there were two things I failed to understand about this issue.

  1. When you create a Dynamic Distribution Group, by default the RecipientContainer setting for that group is set to the OU where the DDG is placed. This means that because I initially did not specify an OU for the DDG, it was placed in the Users container (CN=Users,DC=ASH,DC=NET). So when Exchange performed its query to determine membership, it could only see members that were in that container. The solution in my scenario was to use the -RecipientContainer parameter when creating the group & specify the entire domain.

EX: New-DynamicDistributionGroup -Name "TestDL#1" -RecipientFilter {MemberOfGroup -eq "CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET"} -RecipientContainer "ASH.NET"

This one was particularly embarrassing because the answer was clearly in the TechNet article for the New-DynamicDistributionGroup cmdlet.
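For a group that already exists, the scope can be widened in place rather than recreating it (a sketch, assuming your Exchange version exposes -RecipientContainer on the Set- cmdlet):

```powershell
# Widen the recipient scope of the existing DDG to the entire domain.
Set-DynamicDistributionGroup "TestDL#1" -RecipientContainer "ASH.NET"
```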

  2. The other thing I didn’t realize was the reason my DDG broke when I moved the Security Group I was filtering against. It was breaking because I had specified the Security Group using its Distinguished Name, which included the OU it resided in (CN=TestSecurityGroup,OU=End_Users,OU=Company_Users,DC=ASH,DC=NET). So by moving the group I was making my query come up empty. The first thing I wondered was whether I could specify the group using the common name or the GUID instead. Unfortunately, you cannot, because of an AD limitation:

“MemberOfGroup filtering requires that you supply the full AD distinguished name of the group you’re trying to filter against. This is an AD limitation, and it happens because you’re really filtering this calculated back-link property from AD, not the simple concept of “memberOf” that we expose in Exchange.”

So the important thing to remember here is to either not move the Security Group you’re filtering against, or if you move it, to update your filter.

Thanks go to MVPs Tony Redmond & Tony Murray for pointing these two important facts out to me.

Conclusion:

As I found out, a strong foundational knowledge of Active Directory is key to being a strong Exchange admin/consultant/support engineer. But even when you feel confident in your abilities on a given topic, don’t be afraid to ask people you trust. You might find out you’re either a bit rusty or not as knowledgeable as you thought you were. 🙂

Mail Stuck In The Drafts Folder


Today I came across another interesting mail flow issue, where all mail was stuck in the Drafts folder for every user sending from OWA. As you can imagine, mail flow was completely broken; no user could send any mail internally or externally.

The customer had been troubleshooting it for over 12 hours, and had gone as far as reinstalling the operating system and Exchange 2013 with the /RecoverServer switch, but the issue remained.

When I started looking at the issue, I went through a series of basic transport troubleshooting steps for an Exchange 2013 multi-role server: checking all transport-related services, looking for a possible back pressure condition, and verifying the state of all server components. Of course, nothing was wrong with any of them.

Running out of ideas, I checked the settings of the Send Connector, just to make sure there was nothing out of the ordinary. I saw this in the Send Connector properties:

[Screenshot: Send Connector properties with the external DNS lookup option enabled]

 

There are not many good reasons for an Exchange server to use an external DNS server for lookups, and in this environment it certainly wasn’t needed.

I unchecked the box and restarted the transport service to speed up the process, but the issue remained.

I then ran Get-TransportService | fl *dns* to make sure we didn’t have any external DNS settings configured:

[Screenshot: Get-TransportService DNS settings showing ExternalDNSServers populated]

Ah ha! The ExternalDNSServers setting was populated. I ran a few tests with nslookup, and that DNS server did not respond to any queries. That was almost certainly why mail wasn’t flowing.

To remove it, run Set-TransportService -ExternalDNSAdapterEnabled $true -ExternalDNSServers $null. (Setting ExternalDNSAdapterEnabled back to $true restores the default behavior of using the network adapter’s DNS settings.)

After restarting the transport service, all the mail in the Drafts folders drained. Mail flow was restored!
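If you want to audit every transport server in the org for this condition at once, a quick sketch using the same cmdlets (the server name "EX01" is a placeholder):

```powershell
# List any transport servers that have explicit external DNS servers configured.
Get-TransportService | Where-Object { $_.ExternalDNSServers.Count -gt 0 } |
    Format-Table Name, ExternalDNSServers, ExternalDNSAdapterEnabled -AutoSize

# Clear the setting on an affected server, then bounce transport to pick it up.
Set-TransportService -Identity "EX01" -ExternalDNSAdapterEnabled $true -ExternalDNSServers $null
Restart-Service MSExchangeTransport
```

Checking all servers matters because the setting is per-server; one misconfigured transport server can break routing for the whole org intermittently, depending on which server handles a given message.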

Exchange 2010 SP3 installation fails on SBS 2011


I had an interesting issue with an Exchange 2010 SP3 installation on an SBS 2011 server last night. The installation failed on the Hub Transport server role with the following errors:

[Screenshot: SBS 2011 Exchange 2010 SP3 upgrade error on the Hub Transport role]

 

This made me scratch my head. Why was setup trying to remove an existing certificate that Exchange was using? It was also the default SMTP certificate, which is why setup was unable to remove it.

After investigating further, I saw this line in the setup’s PowerShell script:

Write-ExchangeSetupLog -Info "Removing default Exchange Certificate";
Get-ExchangeCertificate | where {$_.FriendlyName.ToString() -eq "Microsoft Exchange"} | Remove-ExchangeCertificate

So setup tries to remove the default Exchange certificate that was created during the initial installation, identified by the friendly name “Microsoft Exchange”.

My first thought was that there was no way the GoDaddy certificate had the friendly name “Microsoft Exchange”. But after looking at the certificate properties, that was indeed the problem: the friendly name was “Microsoft Exchange” instead of mail.domain.com.
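Before running the service pack, it’s worth checking which certificates setup would match. A minimal sketch using the standard certificate cmdlets (the second filter is the same one the SP3 setup script uses):

```powershell
# List all Exchange certificates with their friendly names and assigned services,
# to spot any third-party cert carrying the default "Microsoft Exchange" friendly name.
Get-ExchangeCertificate | Format-Table Thumbprint, FriendlyName, Services, Subject -AutoSize

# Anything this returns is what setup's removal step would delete.
Get-ExchangeCertificate | where {$_.FriendlyName.ToString() -eq "Microsoft Exchange"} |
    Format-List Thumbprint, Subject, Services
```

If a production certificate shows up in the second list, export it (with its private key) before running setup.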

In order to install SP3, we had to use the SBS console to import a temporary certificate, so that it would update the “LeafCertThumbPrint” property in this registry key:

“HKEY_LOCAL_MACHINE\Software\Microsoft\SmallBusinessServer\Networking”

Note: you can also update the registry manually with the thumbprint of an existing certificate that is already imported.

Exchange 2010 SP3 installed fine after the certificate change. Since we didn’t export the existing GoDaddy certificate before running SP3 setup, it was removed by setup. For Outlook Anywhere and ActiveSync clients to continue functioning, we had to issue a new certificate request with a proper friendly name and then import the new certificate. You can also reuse the existing certificate via the “Re-Key” option on GoDaddy’s website, but you might end up with a certificate without a private key. To repair the missing private key, run the following command:
   certutil -repairstore my <serial number>

 

 

Bad NIC Settings Cause Internal Messages to Queue with 451 4.4.0 DNS query failed (nonexistent domain)


Overview:

I’ve come across this with customers a few times now & it can be a real head scratcher. However, the resolution is actually pretty simple.

 

Scenario:

The customer has multiple Exchange servers in the environment, or has just installed a 2nd Exchange server into the environment. The customer is able to send directly out & receive in from the internet just fine, but is unable to send email to/through another internal Exchange server.

This issue may also manifest itself as intermittent delays in sending between internal Exchange servers.

In either scenario, messages will be seen queuing & if you run “Get-Queue –Identity QueueID | Format-List” you will see a “LastError” of “451 4.4.0 DNS query failed. The error was: SMTPSEND.DNS.NonExistentDomain; nonexistent domain”.

 

Resolution:

This issue can occur because the Properties of the Exchange Server’s NIC have an external DNS server listed in them. Removing the external DNS server/servers & leaving only internal (Microsoft DNS/Active Directory Domain Controllers in most customer environments) DNS Servers; followed by restarting the Microsoft Exchange Transport Service should resolve the issue.

 

Summary:

The Default Configuration of an Exchange Server is to use the local Network Adapter’s DNS settings for Transport Service lookups.

(FYI: You can alter this in Exchange 07/10 via EMS using the Set-TransportServer command or in EMC>Server Configuration>Hub Transport>Properties of Server. Or in Exchange 2013 via EMS using the Set-TransportService command or via EAC>Servers>Edit Server>DNS Lookups. Using any of these methods, you can have Exchange use a specific DNS Server.)
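For example, pinning the Transport service in 2013 to specific internal DNS servers from EMS might look like this (the server name and IP addresses are placeholders, not values from this customer’s environment):

```powershell
# Have transport use an explicit list of internal AD/DNS servers
# instead of inheriting whatever is on the local network adapter.
Set-TransportService -Identity "EX01" `
    -InternalDNSAdapterEnabled $false `
    -InternalDNSServers 10.0.0.10,10.0.0.11
Restart-Service MSExchangeTransport
```

This is a workaround more than a fix: the cleaner resolution remains correcting the NIC’s DNS settings, since other domain-joined functions (AD authentication, Kerberos, DC locator) also depend on them.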

Because the default behavior is to use the local network adapter’s DNS settings, Exchange was finding itself using external DNS servers for name resolution. This seemed to work fine when it had to resolve external domains/recipients, but a public DNS server would likely have no idea what your internal Exchange servers (i.e. Ex10.contoso.local) resolve to. The error we see is due to the DNS server responding, but simply not having the A record for the internal host we require. If the DNS server you had configured didn’t exist or wasn’t reachable, you would actually see slightly different behavior (like messages sitting in “Ready” status in their respective queues).

 

An Exchange server, or any domain-joined server for that matter, should not have its NIC’s DNS settings pointed at an external/ISP’s DNS server (even as secondary). Instead, they should be set to internal DNS servers which hold all the records necessary to locate internal Exchange servers.

 

References

http://support.microsoft.com/kb/825036

http://technet.microsoft.com/en-us/library/bb124896(v=EXCHG.80).aspx

“The DNS server address that is configured on the IP properties should be the DNS server that is used to register Active Directory records.”

http://technet.microsoft.com/en-us/library/aa997166(v=exchg.80).aspx

http://exchangeserverpro.com/exchange-2013-manually-configure-dns-lookups/

http://thoughtsofanidlemind.com/2013/03/25/exchange-2013-dns-stuck-messages/

 

Unable to logon to O365 via ADFS – ADFSAppPool stops (aka. I had a bad day)


Environment:
Customer using Exchange Online/Office 365 with no Exchange servers on-prem. Two ADFS 2.0 servers running on Server 2008 R2, enabling them to log on to Exchange Online via SSO (Single Sign-On).

Issue:
After rebooting the two ADFS servers following Windows Updates, the customer could no longer log in to OWA & would receive a “503 Service Unavailable” error from IIS on the two ADFS servers.

Background:
I have to hang my head in shame with this one, as I really should have figured it out sooner. Initial troubleshooting showed that the ADFSAppPool was stopped in IIS. It would start, but as soon as you tried accessing it, it would stop again. There was nothing at all in the Application or ADFS logs in Event Viewer (more on this poor bit of troubleshooting on my part later). The ADFS service account it was running under looked ok; the App Pool would start & so would the ADFS service (both running under this account), so it seemed not to be a credential issue (at least I got that part right). I even went as far as to reinstall ADFS & IIS on the non-primary ADFS server in case it was something in IIS. I was clearly out-classed by this seemingly simple issue.

Resolution:
Because the customer was down & I was scratching my head, I decided to escalate the issue to Microsoft, at which point they resolved it in about 5 minutes.

Now before I say the fix I’d just like to say I consider myself a good troubleshooter. I’ve been troubleshooting all manner of Microsoft, Cisco, etc technologies for more than a decade & made a pretty successful career out of it. I even managed to pass both the MCM 2010 & MCSM 2013 lab exams on the 1st attempt; but today was not my day. I spent over 2 hrs on this & I broke the cardinal rule of troubleshooting; I overlooked the simple things. Like many of us do I started digging a hole of deep troubleshooting, expecting this to be an incredibly complex issue; I was looking at SPN’s, SQL Permissions, checking settings in Azure, etc. I should have just looked back up in the sky instead of trying to dig a hole a mile deep but only 3 ft wide, because for some idiotic reason I chose to overlook the System Event logs….

I suppose once I saw nothing in the Application or ADFS logs I just moved on quickly to the next possibility, but in a few short minutes the Microsoft engineer checked the System log & saw Event 5021 from IIS stating that the service account did not have Batch Logon rights (more on the event here). This led him to look at Group Policy settings & sure enough, there was a GPO allowing only the Domain Admins group to log on as a batch job (Reference 1 & 2). It seems this setting took effect after the ADFS servers were rebooted post Windows Updates. Not sure how the GPO got there, as this solution had been working for 2 years beforehand, but it certainly was ruining our day. After the GPO was modified to allow the ADFS service account to log on as a batch job, the issue was resolved after some service restarts.
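If you want to check which accounts currently hold the “Log on as a batch job” right on a server (the setting behind Event 5021), a quick sketch from an elevated command prompt — the export path is just an example:

```bat
REM Export the effective user-rights assignments and look for SeBatchLogonRight,
REM the privilege behind the "Log on as a batch job" policy setting.
secedit /export /cfg C:\Temp\secpol.cfg /areas USER_RIGHTS
findstr /C:"SeBatchLogonRight" C:\Temp\secpol.cfg
```

The output lists SIDs rather than friendly names, but if your app-pool service account’s SID is missing from that line, you’ve found the same problem we had.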

Moral of the story:
Never overlook the obvious!
It’s the best advice I can give to anyone, anywhere, who has to troubleshoot anything. I’d like to say this is the 1st time this has happened to me, but it’s not. Overlooking typos, not checking whether a network cable is plugged in, not checking whether a service is started… it happens to the best of us. I suppose overlooking the simple solution is just part of the human condition… or at least whatever condition I have… 🙂