Jetstress – Too Many IOPS?


Symptom:
The customer reported Jetstress failures with the message, “Fail – The test has 1.05381856535713 Average Database Page Fault Stalls/sec. This should be not higher than 1.” The customer had recently purchased multiple servers to be used in an Exchange DAG, and these Jetstress failures were halting the project. What was unique about this deployment was that the customer was using all-local SSD storage for the solution.

Analysis:
I asked the customer to provide their Jetstress configuration XML file as well as their Jetstress Results HTML file. As soon as I saw the results file I knew what the issue was, but only because I’ve had discussions with fellow Exchange MCMs, MVPs, and Microsoft employees who had encountered this same odd behavior in the past. In short, Jetstress was generating TOO many IOPS, as seen in the below output:

To the untrained eye this may not be anything special, but I had to do a double take the first time I saw this. Jetstress was generating over 25,000 IOPS on this system. Also impressive was the performance of the hardware, as there actually weren’t any disk latency issues from an IO read/write perspective:

As you can see from the above screenshot, database and log read/write latency (msec) was still fairly low for the extremely high amount of IOPS being generated. Yet our issue was with Database Page Fault Stalls/Sec, which should always remain below 1 (shown below):

Background:
Let’s spend some time covering how Jetstress is meant to behave, as well as the proper strategy for effectively utilizing Jetstress. Your first step in working with Jetstress should be to download the Jetstress Field Guide where most of this information is held.

The primary purpose of Jetstress is to ensure a storage solution can adequately deliver the amount of IOPS needed for a particular Exchange design BEFORE Exchange is installed on the hardware. You should use the Exchange Sizing Calculator to properly size the environment, using inputs such as the number of mailboxes, average messages sent/received per day, and average message size. Once completed, on the “Role Requirements” tab you will find a value called “Total Database Required IOPS” (per Server), which tells you the amount of IOPS each server must be able to deliver for your solution. The value for “Total Log Required IOPS” can be ignored, as Log IO is sequential IO, which is very easy on the disk subsystem.

With the per server Database Required IOPS value in hand, your job now is to get Jetstress to generate at least that amount of IOPS while delivering passing latency values. This is where some customers get confused due to improper expectations. They may think that Jetstress should always pass no matter what parameters they configure for it. I can tell you that I can make any storage solution fail Jetstress if I crank the thread count high enough, so it actually takes a bit of “under-the-hood” understanding to use Jetstress effectively.

Jetstress generates IO based on a global thread count, with each thread meant to generate 30-60 IOPS. Simply put, if I run a Jetstress test with 2 threads, I would expect it to generate ~120 IOPS on well-performing hardware. Therefore, if my Exchange calculator stated I needed to achieve 1,200 IOPS on a server, I would begin with a quick 15-minute test with 20 threads (20 x 60=1,200). If that test generated at least 1,200 IOPS and passed, I would then run a 24-hour test with 20 threads to ensure there are no demons hiding in my hardware that only a long stress test can uncover. If that test passes then I’m technically in the clear and can proceed with the next phase of the Exchange project, making sure to keep my calculator files, Jetstress configuration XML files, and Jetstress result HTML files in a safe location for potential future reference.
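
The thread math above can be sketched in a few lines of PowerShell. This is only a rough starting-point estimate based on the 30-60 IOPS-per-thread rule of thumb; the function name Get-JetstressStartingThreads is hypothetical and not part of Jetstress.

```powershell
# Estimate a starting Jetstress thread count from a target IOPS figure.
# Assumption: each thread generates roughly 30-60 IOPS, as described above.
function Get-JetstressStartingThreads {
    param(
        [Parameter(Mandatory)] [int] $TargetIops,
        [int] $IopsPerThread = 60   # upper end of the 30-60 range
    )
    # Round up so the estimate never undershoots the target.
    [int][math]::Ceiling([double]$TargetIops / $IopsPerThread)
}

Get-JetstressStartingThreads -TargetIops 1200                     # 20
Get-JetstressStartingThreads -TargetIops 1200 -IopsPerThread 30   # 40
```

Treat the result as the thread count for the first short test, then adjust based on the IOPS Jetstress actually reports.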

You could spend an hour reading the Jetstress Field Guide but what I’ve just covered is the short version. Get it to pass with the amount of IOPS you need and you’re in the clear. Of course it can often be much more complex than that. You may need to tweak the thread count or update hard drive firmware or correct a controller caching setting (see my post here for the correct settings) to achieve a pass. Auto-tuning actually makes this process a bit simpler, as it tries to determine the maximum amount of threads (which as stated is directly proportional to generated IOPS) the system can handle. However, it can lead to some confusion as people may focus too much on the maximum amount of IOPS a system can deliver instead of focusing on the amount of IOPS you actually need. While it’s certainly a valuable piece of information to know for future planning or even hardware repurposing, you shouldn’t stall your project trying to squeeze every last bit of IOPS out of the system if you’re already easily hitting your IOPS target and passing latency tests.

As someone who works for a hardware vendor, I’ll often get pulled into an escalation where a customer has opened Jetstress, cranked the thread count up to 50, it fails, and they’re pointing the finger at the hardware. However, based on what we already discussed, a thread count of 50 would be 3,000 IOPS. This is fine if the storage purchased with the system can support it. A single 7.2K NL SAS drive can achieve ~55 IOPS for Exchange database workloads, so if a customer has 20 single-disk RAID 0 drives in a system, the math tells us they can’t expect to achieve much more than 1,100 IOPS (20 x 55=1,100). It gets a bit more complicated when using RAID and having to consider which disks are actually in play (EX: In a 10-disk RAID 10, you only get to factor in 5 disks in terms of performance, due to the other 5 being used for mirroring) as well as write penalties when using RAID 5 or RAID 6. The bottom line is that you need a realistic understanding of the amount of IOPS the hardware you purchased can actually achieve, as we all must adhere to the performance laws of rotational media.
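
To sanity-check a thread count against the hardware, the spindle math above can be sketched as follows. The function name Get-EstimatedDatabaseIops is hypothetical, the ~55 IOPS per disk figure is the 7.2K NL SAS estimate from the text, and the sketch deliberately ignores RAID 5/6 write penalties.

```powershell
# Rough IOPS ceiling for rotational media, using the simplifications in the text.
function Get-EstimatedDatabaseIops {
    param(
        [Parameter(Mandatory)] [int] $DiskCount,
        [int] $IopsPerDisk = 55,   # ~55 IOPS per 7.2K NL SAS disk for Exchange database IO
        [ValidateSet('SingleDiskRaid0', 'Raid10')]
        [string] $Layout = 'SingleDiskRaid0'
    )
    # In RAID 10 only half of the disks count toward performance (the rest mirror).
    $effectiveDisks = if ($Layout -eq 'Raid10') {
        [int][math]::Floor([double]$DiskCount / 2)
    } else {
        $DiskCount
    }
    [int]($effectiveDisks * $IopsPerDisk)
}

Get-EstimatedDatabaseIops -DiskCount 20                  # 1100
Get-EstimatedDatabaseIops -DiskCount 10 -Layout Raid10   # 275
```

If the estimate is well below the IOPS implied by the thread count you configured, a Jetstress failure says more about the test parameters than about the hardware.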

Resolution:
Coming back to the issue at hand, and now understanding how Jetstress is supposed to work, we can see why the results were troubling. Jetstress should NOT be generating that many IOPS when using 35 threads. 35 threads should generate ~2,100 IOPS (35 threads x 60 expected IOPS per thread=2,100), not 25,000 IOPS. More IOPS is not a good thing because, if nothing else, Jetstress is supposed to be predictable in terms of the amount of IO it generates.

So why did this system generate so many IOPS and why did it fail? The short answer is that Jetstress doesn’t play well with SSDs. It will always try to generate more IOPS per thread than expected on SSD storage. As I am not a Jetstress developer I can’t explain why this occurs, but since several colleagues have also seen this issue I can at least confirm it happens and provide a workaround. In our customer’s case, they manually specified 1 thread, which generated ~2,200 IOPS and passed without any Page Fault Stall errors. This was still far more IOPS than 1 thread should be generating, but it met their calculator IOPS requirements and allowed them to continue with their project.

As for why the test was failing with Page Fault Stalls even though actual disk latency was fine, I can only speculate. As a Page Fault Stall is an Exchange-related operation (related to querying disk for a database page) and not a pure disk latency operation, I wonder if Jetstress is even designed to generate that many IOPS. I’ve never personally seen it run with more than a few thousand IOPS, so it’s possible the Jetstress application itself couldn’t handle it.

I hope this cleared up some confusion around how to effectively utilize Jetstress. Also, if you happen to come across this specific issue I’d be interested in hearing about it in the comments.

Unable to Recreate Exchange Virtual Directory


Issue

A customer of mine recently had an issue where their Exchange 2013 OWA Virtual Directory was missing in IIS. When attempting to recreate the vDir we encountered the below error message:

“An error occurred while creating the IIS virtual directory ‘IIS://ServerName/W3SVC/1/ROOT/OWA’”

To resolve this error I needed to resort to a long-lost tool from the days of old, the IIS 6 Resource Kit.

Note: This blog post could also be relevant if the OWA (or any other) vDir needed to be recreated and you encountered the same error upon recreation.

Resolution

Back in the days of Exchange 2003, the IIS Resource Kit, or more specifically the Metabase Explorer, could be used when recreating a Virtual Directory. Fortunately, the Metabase Explorer tool still works with IIS 8.

Download Link for the IIS 6 Resource Kit

The error encountered above was a result of the IIS Metabase still holding remnants of a past instance of the OWA Virtual Directory, which was preventing the New-OwaVirtualDirectory cmdlet from successfully completing. It’s important to understand that an Exchange Virtual Directory is really located in two places: Active Directory and IIS. When running the Get-OwaVirtualDirectory cmdlet (or similar commands for other Virtual Directories), you’re really querying Active Directory. For example, the OWA Virtual Directories for both the Default Web Site and Exchange Back End website in my lab are located in the following location in AD (via ADSIEDIT):

So if a vDir is missing in IIS but present in AD, you’ll likely need to first remove it using the Remove-*VirtualDirectory cmdlet; otherwise the recreation will generate an error stating it already exists. In my customer’s scenario, I had to do this beforehand as the OWA vDir was present in AD but missing in IIS.

This brought us to the state we were in at the beginning of this post; receiving the above error message. The OWA vDir was no longer present in AD nor in the Default Web Site, but when trying to recreate it using New-OwaVirtualDirectory we received the above error message.

Tip: Use Get-*VirtualDirectory with the -ShowMailboxVirtualDirectories parameter to view the Virtual Directories on both web sites. For example:
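
A minimal sketch of that tip for OWA, run from the Exchange Management Shell. The server name EX01 is hypothetical, and the properties selected are simply the ones I find useful for telling the two web sites apart.

```powershell
# List the OWA vDirs on both the Default Web Site and the Exchange Back End site.
# 'EX01' is a hypothetical server name; substitute your own.
Get-OwaVirtualDirectory -Server 'EX01' -ShowMailboxVirtualDirectories |
    Format-List Identity, Server, MetabasePath
```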

The solution was to install the IIS 6 Resource Kit and use Metabase Explorer to delete the ghosted vDir. When installing the Resource Kit, select Custom Install, uncheck all features except for Metabase Explorer 1.6, and proceed with the installation. Once it finishes, it may require you to add the .NET Framework 3.5 Feature.

When you open the tool on the Exchange Server in question, navigate to the below tree structure and delete the old OWA Virtual Directory by right-clicking it and selecting Delete. When completed, the OWA vDir should no longer be present (as seen below).

You should now be able to successfully execute the New-OwaVirtualDirectory Cmdlet. It’s always a bit nostalgic seeing a tool of days gone by still able to save the day. I’d like to thank my co-worker John Dixon for help with this post. When I can’t figure something out in Exchange/IIS (or anything really) he’s who I lean on for help.
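
Putting the AD and IIS pieces together, the cleanup and recreation can be sketched as below. This is a sketch rather than a definitive procedure: EX01 is a hypothetical server name, the Remove step only applies if the vDir is still present in AD, and the ghosted Metabase entry still has to be deleted with Metabase Explorer before the New step will succeed.

```powershell
# Run from the Exchange Management Shell. 'EX01' is a hypothetical server name.

# 1. If the vDir still exists in AD but not in IIS, remove the AD object first.
Remove-OwaVirtualDirectory -Identity 'EX01\owa (Default Web Site)'

# 2. After deleting the ghosted entry in Metabase Explorer, recreate the vDir.
New-OwaVirtualDirectory -Server 'EX01' -WebSiteName 'Default Web Site'

# 3. Confirm the result on both web sites.
Get-OwaVirtualDirectory -Server 'EX01' -ShowMailboxVirtualDirectories |
    Format-List Identity, MetabasePath
```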

NIC DNS Registration and Exchange Servers


Symptom

I recently worked with a customer who had introduced an Exchange 2013 Server into an existing Exchange 2007 environment. The issue was that the 2013 Server was unable to send email anywhere, neither externally nor to other Exchange Servers. Executing the below command to view the status of the transport queues returned the below output:

Get-Queue <Queue Identity> | FL

Specifically, the error message you would receive is “4.4.0 DNS query failed. The error was: DNS query failed with error ErrorRetry”

This is a fairly common error indicating there is an issue contacting the DNS Server or Servers that Exchange is configured to use. ReferenceA ReferenceB

Resolution

However, in this case the issue was not obvious, unless you had already seen this issue before or knew a little bit about the health checks Exchange uses to ensure it’s healthy.

I remembered seeing a similar issue on a Reddit thread a while back, which led me to search for and find the Microsoft KB article titled “‘DNS query failed’ error when an email message is stuck in the Draft folder in an Exchange Server 2013 environment”.

This was the resolution in my scenario as well. To resolve the issue, I simply had to re-check the “Register this connection’s addresses in DNS” option on the IPv4 > Properties > Advanced > DNS tab on the primary NIC used for Active Directory communications. While you can uncheck this box on secondary NICs (such as for iSCSI, Replication, Backup, etc.), it should always remain checked on the MAPI/Primary NIC. I’ve also seen issues where having this unchecked on a 2013/2016 DAG node will result in Managed Availability-triggered database failovers.
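
The same setting can be checked and corrected from PowerShell instead of the GUI. This is a minimal sketch that assumes Windows Server 2012 or later (where the DnsClient module is available); the adapter name MAPI is hypothetical.

```powershell
# Show the DNS registration setting for every adapter. The checkbox in the GUI
# corresponds to the RegisterThisConnectionsAddress property.
Get-DnsClient | Format-Table InterfaceAlias, RegisterThisConnectionsAddress -AutoSize

# Re-enable registration on the primary NIC. 'MAPI' is a hypothetical adapter name.
Set-DnsClient -InterfaceAlias 'MAPI' -RegisterThisConnectionsAddress $true

# Optionally refresh the DNS registration right away.
Register-DnsClient
```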

Quick method to determine installed version of .NET Framework


Due to recent issues with unsupported versions of .NET being installed on Exchange servers, as well as the fact that Exchange Server requires specific versions of .NET to be installed (Exchange Server 2013 System Requirements & Exchange Server 2016 System Requirements), there is a need to quickly query the installed version of .NET on Exchange servers. I have also been involved in several Exchange support escalations where updating the Exchange servers from .NET 4.5.1 to 4.5.2 resolved CPU performance issues.

Fortunately, my coworker and fellow Exchange MCM Mark Henderson wrote this quick and easy way to query the currently installed version of .NET.

PowerShell Query Method

To query the local Registry using PowerShell, execute the below command in an elevated PowerShell session.

(Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' -Name Release).Release

You can then use the table below to reference the installed version of .NET. For instance, if the returned value is 379893, then .NET 4.5.2 is installed.

 

Version | Value of the Release DWORD
.NET Framework 4.5 | 378389
.NET Framework 4.5.1 installed with Windows 8.1 | 378675
.NET Framework 4.5.1 installed on Windows 8, Windows 7 SP1, or Windows Vista SP2 | 378758
.NET Framework 4.5.2 | 379893
.NET Framework 4.6 installed with Windows 10 | 393295
.NET Framework 4.6 installed on all other Windows OS versions | 393297
.NET Framework 4.6.1 installed on Windows 10 | 394254
.NET Framework 4.6.1 installed on all other Windows OS versions | 394271
.NET Framework 4.6.1 installed on all other Windows OS versions (With required Hotfix) | 394294
.NET Framework 4.6.2 installed on Windows 10 Anniversary Update | 394802
.NET Framework 4.6.2 installed on all other Windows OS versions | 394806
.NET Framework 4.7.0 installed on Windows 10 Creators Update | 460798
.NET Framework 4.7.0 installed on all other Windows OS versions | 460805

Script method

Copy the below text into a text file and rename the extension to .ps1. You can then execute this script and have it automatically tell you the installed version of .NET.

# Determine the version of .NET Framework 4.x by querying the Registry key
# HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full for the value of Release
#
# Based on https://msdn.microsoft.com/en-us/library/hh925568(v=vs.110).aspx
#
# Note: a Release value that falls between two known values is reported as the
# next listed release.

$Netver = (Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full' -Name Release).Release

If ($Netver -lt 378389)
{
    Write-Host ".NET Framework version OLDER than 4.5" -foregroundcolor yellow
}
ElseIf ($Netver -eq 378389)
{
    Write-Host ".NET Framework 4.5" -foregroundcolor red
}
ElseIf ($Netver -le 378675)
{
    Write-Host ".NET Framework 4.5.1 installed with Windows 8.1" -foregroundcolor red
}
ElseIf ($Netver -le 378758)
{
    Write-Host ".NET Framework 4.5.1 installed on Windows 8, Windows 7 SP1, or Windows Vista SP2" -foregroundcolor red
}
ElseIf ($Netver -le 379893)
{
    Write-Host ".NET Framework 4.5.2" -foregroundcolor red
}
ElseIf ($Netver -le 393295)
{
    Write-Host ".NET Framework 4.6 installed with Windows 10" -foregroundcolor red
}
ElseIf ($Netver -le 393297)
{
    Write-Host ".NET Framework 4.6 installed on all other Windows OS versions" -foregroundcolor red
}
ElseIf ($Netver -le 394254)
{
    Write-Host ".NET Framework 4.6.1 installed on Windows 10" -foregroundcolor red
}
ElseIf ($Netver -le 394271)
{
    Write-Host ".NET Framework 4.6.1 installed on all other Windows OS versions" -foregroundcolor red
}
ElseIf ($Netver -le 394294)
{
    Write-Host ".NET Framework 4.6.1 installed on all other Windows OS versions (With required Hotfix)" -foregroundcolor red
}
ElseIf ($Netver -le 394802)
{
    Write-Host ".NET Framework 4.6.2 installed on Windows 10 Anniversary Update" -foregroundcolor red
}
ElseIf ($Netver -le 394806)
{
    Write-Host ".NET Framework 4.6.2 installed on all other Windows OS versions" -foregroundcolor red
}
ElseIf ($Netver -le 460798)
{
    Write-Host ".NET Framework 4.7.0 installed on Windows 10 Creators Update" -foregroundcolor red
}
ElseIf ($Netver -le 460805)
{
    Write-Host ".NET Framework 4.7.0 installed on all other Windows OS versions" -foregroundcolor red
}
Else
{
    # Release values above 460805 are not covered by the table above.
    Write-Host ".NET Framework version NEWER than 4.7.0 (Release value $Netver)" -foregroundcolor yellow
}

 

References:

How to: Determine Which .NET Framework Versions Are Installed

 

Mailbox Anchoring affecting new deployments & upgrades


Update2 (March 1st 2016): Microsoft has released the following blog post, which states this behavior will be reverted/absent in 2013 CU12 and the RTM/CU1 versions of Exchange 2016: Remote PowerShell Proxying Behavior in Exchange 2013 CU12 and Exchange 2016

Update: Microsoft has released the following KB article to address this issue: “Cannot process argument transformation” error for cmdlets in Exchange Server 2013 with CU11

Note: This article should also apply when Exchange 2016 CU1 releases and includes Mailbox Anchoring (unless Microsoft makes a change to behavior before its release). So the scenario of installing the first Exchange 2016 server using CU1 bits into an existing environment would also apply.

Summary

It was announced in Microsoft’s recent blog post about Exchange Management Shell and Mailbox Anchoring that the way Exchange is managed will change going forward. Starting with Exchange 2013 CU11 (released 12/10/2015) and Exchange 2016 CU1 (soon to be released), an Exchange Management Shell session will be directed to the Exchange Server that hosts the connecting user’s mailbox. If the connecting user does not have a mailbox, an arbitration mailbox (specifically SystemMailbox{bb558c35-97f1-4cb9-8ff7-d53741dc928c}) will be used instead. In either case, if the mailbox is unavailable (because it’s on a database that’s dismounted or is on a legacy version of Exchange) then Exchange Management Shell will be inoperable.

Issue

While it has always been recommended to move system and Arbitration mailboxes to the newest version of Exchange as soon as possible, there is a scenario involving Exchange 2013 CU11 which has led to customer issues:

  • Existing Exchange 2010 Environment
  • The first version of Exchange 2013 installed into the environment is CU11
  • Upon installation, the Exchange Admin is unable to use Exchange Management Shell on Exchange 2013, which prevents the management of Exchange 2013 objects
  • The Exchange Admin may also be unable to access the Exchange Admin Center using traditional means

This is due to the new Mailbox Anchoring changes. If the Exchange Admin’s mailbox (or the Arbitration mailbox, if the Exchange Admin did not have a mailbox) was on Exchange 2013 then this issue would not exist. However, because this was the first Exchange 2013 server installed into the environment, and it was CU11, there was no way to prevent this behavior.
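
If you still have a working shell (for example the Exchange 2010 Management Shell), a quick way to see whether you are exposed is to check where the relevant mailboxes live. This is a rough sketch: ExchAdmin is a hypothetical admin account name, and the selected properties are just my choice of useful columns.

```powershell
# Where do the Arbitration mailboxes currently live?
Get-Mailbox -Arbitration | Format-Table Name, ServerName, Database -AutoSize

# Where does the admin's own mailbox live? 'ExchAdmin' is a hypothetical account.
Get-Mailbox -Identity 'ExchAdmin' | Format-List Name, ServerName, Database

# Which Exchange versions are present in the organization?
Get-ExchangeServer | Format-Table Name, AdminDisplayVersion -AutoSize
```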

This issue was first reported by Exchange MVP Ed Crowley, and yesterday a customer of mine also encountered the issue. The symptoms were mostly the same but the ultimate resolution was fairly straightforward.

Possible Resolutions

Resolution#1:

Attempt to connect to Exchange Admin Center on 2013 using the “Ecp/?ExchClientVer=15” string at the end of the URL (Reference). For Example:

I’ve heard mixed results using this method. When Ed Crowley encountered this issue, this URL worked, yet when I worked with my customer I was still unable to access EAC by using this method. However, it is worth an attempt. Once you’re connected to EAC, you can use it to move your Exchange Admin mailbox to 2013. However, should you not have a mailbox for your Exchange Admin account, this method may fail because there’s currently no way to move Arbitration Mailboxes via the EAC. So it’s recommended to create a mailbox for your Exchange Admin account using the EAC and then you’ll be able to connect via EMS.

Resolution#2:

Note: Using this method has a low probability of success, as Microsoft recommends using the newer version of Exchange to “pull” a mailbox from the older version. Based on feedback I’ve received from Microsoft Support, you may consider skipping this step and going straight to Resolution#3.

Use Exchange 2010 to attempt to move the Exchange Admin mailbox to a database on Exchange 2013. Historically, it’s been recommended to always use the newest version of Exchange to perform a mailbox move. In my experience this is hit or miss depending on the version you’re moving from and the version you’re moving to. However, it’s worth attempting:

Issue the below command using Exchange 2010 Management Shell to move the Exchange Admin’s mailbox to the Exchange 2013 server:

New-MoveRequest <AdminMailbox> -TargetDatabase <2013Database>

If the Exchange Administrator does not have a mailbox, then move the Arbitration mailboxes to Exchange 2013:

Get-Mailbox -Arbitration | New-MoveRequest -TargetDatabase <2013Database>

Resolution#3:

Connect to Exchange 2013 CU11 using Local PowerShell and manually load the Exchange modules:

  • On the Exchange 2013 CU11 Server, open a Windows PowerShell window as Administrator
  • Run the following command:
    • Add-PSSnapin Microsoft.Exchange.Management.PowerShell.SnapIn

At this point the local PowerShell module can be used to move the Exchange Admin’s mailbox to the Exchange 2013 server:

New-MoveRequest <AdminMailbox> -TargetDatabase <2013Database>

If the Exchange Administrator does not have a mailbox, then move the Arbitration mailboxes to Exchange 2013:

Get-Mailbox -Arbitration | New-MoveRequest -TargetDatabase <2013Database>

In addition, there have been reported issues with 2013 EMS still having connectivity issues even after the relevant mailboxes have been moved. A different Windows user with appropriate Exchange permissions (using a different Windows profile) will work fine however. It seems there are PowerShell cookies for the initial profile used which could still be causing problems. In this scenario, you may have to remove all listed cookies in the following registry key (Warning, edit the registry at your own risk. A backup of the registry is recommended before making modifications):

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\WSMAN\Client\ConnectionCookies
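
Before deleting anything under that key, it is worth backing it up and looking at what is there. The sketch below only exports and lists the contents; the backup location under %TEMP% is my own assumption, and the actual removal is left as a manual step.

```powershell
$key = 'HKCU:\Software\Microsoft\Windows\CurrentVersion\WSMAN\Client\ConnectionCookies'

if (Test-Path $key) {
    # Back up the key first. reg.exe uses the HKCU\ form without the drive colon.
    # The backup path is an assumption; pick any location you prefer.
    reg export 'HKCU\Software\Microsoft\Windows\CurrentVersion\WSMAN\Client\ConnectionCookies' "$env:TEMP\ConnectionCookies.reg" /y

    # List the value names stored directly on the key, then any subkeys.
    (Get-Item $key).Property
    Get-ChildItem $key
}
else {
    Write-Host 'No ConnectionCookies key found for the current user.'
}
```

Run it as the affected Windows user, since the key lives in that user's HKEY_CURRENT_USER hive.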

Summary

It should be noted that while this scenario involved Exchange 2013 CU11 being installed into an existing Exchange 2010 environment, it can affect other scenarios as well:

  • Exchange 2013 CU11 or newer being installed into an existing Exchange 2010 environment
  • Exchange 2013 CU11 or newer being installed into an existing Exchange 2007 environment
  • Exchange 2016 CU1 (when released) or newer being installed into an existing Exchange 2010 environment

So unless Microsoft changes the behavior of Mailbox Anchoring, this is a precaution that should be taken when installing the first Exchange 2013 CU11/2016 CU1 (when released) server into an existing environment.

 

Edit: This forum post also describes the issue. In it, the user experiences odd behavior with the 2013 servers not being displayed if you run a Get-ExchangeServer & other odd behavior. This is similar to what I experienced in some lab testing. Ultimately, the same resolution applies.

https://social.technet.microsoft.com/Forums/en-US/05897b40-0717-437d-90ca-d550e3226c2a/exchange-2013-cu-11-breaks-some-admin-accounts-?forum=exchangesvrdeploy

 

Web Management Service will not start and causes Exchange update to fail


Today I had an Exchange update issue that I’d never encountered before. The Exchange 2013 CU10 update failed, saying the Web Management Service could not be started. Attempts to manually start the service failed. The Application log pointed to an IIS-IISManager 1007 event saying the following:

“Unable to read the certificate with thumbprint ‘{thumbprint}’. Please make sure the SSL certificate exists and that is correctly configured in the Management Service page.”

The thumbprint it was listing was not found on the server, either using Get-ExchangeCertificate or the MMC certificate snap-in. A web search led me to the below article which resolved the issue. Normally, an Exchange server will have a certificate called “WMSvc-servername” (Friendly Name of WMSvc) and it will be bound in IIS to the Web Management Service, but in this case the certificate was missing. By binding another certificate to the service we were able to get the service to start and continue the Exchange Update. An alternative would be to request a new certificate for the purposes of this service.

https://technet.microsoft.com/en-us/library/cc735088(v=ws.10).aspx

Find the SSL certificate that the Web Management Service is using

To find the SSL certificate that the Web Management Service is using:

  1. Click Start, click Control Panel, and then click Administrative Tools.
  2. Right-click Internet Information Services (IIS) Manager and select Run as administrator.
  3. In the Connections pane, select the server that you want to manage.
  4. In Features View, double-click Management Service.
  5. Under SSL certificate ensure that a certificate is selected.
  6. Note the name of the certificate. By default, the name starts with “WMSvc”.
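
The same checks can be roughed out in PowerShell before opening IIS Manager. This sketch assumes the certificate lives in the local computer's Personal store; the thumbprint placeholder must be replaced with the value from the 1007 event.

```powershell
# Check the state of the Web Management Service (service name WMSvc).
Get-Service -Name WMSvc | Format-List Name, Status

# Look for a WMSvc certificate in the local computer's Personal store.
Get-ChildItem Cert:\LocalMachine\My |
    Where-Object { $_.FriendlyName -like 'WMSvc*' -or $_.Subject -like '*WMSvc*' } |
    Format-List Subject, FriendlyName, Thumbprint, NotAfter

# Check whether the thumbprint named in the event actually exists in the store.
# Replace the placeholder with the thumbprint from the IIS-IISManager 1007 event.
$thumbprint = '<thumbprint from event 1007>'
Test-Path "Cert:\LocalMachine\My\$thumbprint"
```

If the last command returns False, the service is bound to a certificate that no longer exists, which matches the failure described above.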

Additional Reference:
http://exctech2013.blogspot.com/2013/10/the-web-management-service-could-not-be.html