Windows Server Essentials O365 Integration Errors


Here’s a quick post to describe an issue I didn’t see referenced anywhere else except for within forum replies.

Issue
A customer had Windows Server 2012 R2 Essentials configured with Office 365 Integration but noticed they were unable to make any changes to the integration (such as changing the Admin account or adding new users) and the Exchange Online-related status indicators in the Essentials Dashboard were not being displayed properly. The customer stated this issue happened once before but apparently resolved itself. However, in this case, functionality had been broken for several weeks before they decided to reach out to me.

Specifically, when running the O365 Integration wizard you would receive an error stating, “Cannot connect to Microsoft Online services…. Make sure that the computer is connected to the Internet and then try again.”

Resolution
I first looked under the C:\ProgramData\Microsoft\Windows Server\Logs folder within the SharedServiceHost-EmailProviderServiceConfig.log file for any Integration Tool errors.

The log revealed the following error messages:

BecWebServiceAdapter: Connect to BECWS failed due to known exception : System.ServiceModel.EndpointNotFoundException: There was no endpoint listening at https://bws902-relay.microsoftonline.com/ProvisioningWebservice.svc that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. —> System.Net.WebException: Unable to connect to the remote server —> System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused

I was able to trace the error message to this Microsoft forum post where MVP Susan Bradley provided the resolution. In this case, the resolution was to navigate to the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Server\Productivity\O365Integration\Settings registry key and delete or rename the “BecEndPointAddress” registry entry. This entry had a value of the application server referenced in the error message above (bws902-relay.microsoftonline.com). After restarting all the Windows Server Essentials services, O365 Integration was fully restored.

My theory as to why this occurred is the application server the key referenced became inaccessible for some reason. I would think this value should be a load balanced name and not an individual server name.

Unable to Recreate Exchange Virtual Directory


Issue

A customer of mine recently had an issue where their Exchange 2013 OWA Virtual Directory was missing in IIS. When attempting to recreate the vDir we encountered the below error message:

“An error occurred while creating the IIS virtual directory `IIS://ServerName/W3SVC/1/ROOT/OWA’

1

To resolve this error I needed to resort to using a long lost tool from the days of old, the IIS 6 Resource Kit.

Note: This blog post could also be relevant if the OWA (or any other) vDir needed to be recreated and you encountered the same error upon recreation.

Resolution

Back in the days of Exchange 2003, the IIS Resource Kit, or more specifically the Metabase Explorer, could be used when recreating a Virtual Directory. Fortunately, the Metabase Explorer tool still works with IIS 8.

Download Link for the IIS 6 Resource Kit

The error encountered above was a result of the IIS Metabase still holding remnants of a past instance of the OWA Virtual Directory, which was preventing the New-OwaVirtualDirectory Cmdlet from successfully completing. It’s important to understand that an Exchange Virtual Directory is really located in two places; Active Directory and IIS. When running the Get-OwaVirtualDirectory Cmdlet (or similar commands for other Virtual Directories), you’re really querying Active Directory. For example, the OWA Virtual Directories for both the Default Web Site and Exchange Back End website in my lab are located in the following location in AD (via ADSIEDIT):

2

So if a vDir is missing in IIS but present in AD, you’ll likely need to first remove it using the Remove-*VirtualDirectory Cmdlet otherwise it will generate an error stating it already exists. In my customer’s scenario, I had to do this beforehand as the OWA vDir was present in AD but missing in IIS.

This brought us to the state we were in at the beginning of this post; receiving the above error message. The OWA vDir was no longer present in AD nor in the Default Web Site, but when trying to recreate it using New-OwaVirtualDirectory we received the above error message.

Tip: Use Get-*VirtualDirectory with the –ShowMailboxVirtualDirectories parameter to view the Virtual Directories on both web sites. For example:

3

The solution was to install the IIS 6 Resource Kit and use Metabase Explorer to delete the ghosted vDir. When installing the Resource Kit, select Custom Install and then uncheck all features except for Metabase Explorer 1.6 and proceed with the installation. Once it finishes, it may require you add the .NET Framework 3.5 Feature.

When you open the tool on the Exchange Server in question, navigate to the below tree structure and delete the old OWA Virtual Directory by right-clicking it and selecting Delete. When completed, the OWA vDir should no longer be present (as seen below).

4

You should now be able to successfully execute the New-OwaVirtualDirectory Cmdlet. It’s always a bit nostalgic seeing a tool of days gone by still able to save the day. I’d like to thank my co-worker John Dixon for help with this post. When I can’t figure something out in Exchange/IIS (or anything really) he’s who I lean on for help.

Web Management Service will not start and causes Exchange update to fail


Today I had an Exchange update issue that I’d previously never encountered before. Exchange 2013 CU10 update failed saying the Web Management Service could not be started. Attempts to manually start the service failed. Application logs pointed to IIS-IISManager 1007 event saying the following:

“Unable to read the certificate with thumbprint ‘{thumbprint}’. Please make sure the SSL certificate exists and that is correctly configured in the Management Service page.”

The thumbprint it was listing was not found on the server, either using Get-ExchangeCertificate or the MMC certificate snap-in. A web search led me to the below article which resolved the issue. Normally, an Exchange server will have a certificate called “WMSvc-servername” (Friendly Name of WMSvc) and it will be bound in IIS to the Web Management Service, but in this case the certificate was missing. By binding another certificate to the service we were able to get the service to start and continue the Exchange Update. An alternative would be to request a new certificate for the purposes of this service.

https://technet.microsoft.com/en-us/library/cc735088(v=ws.10).aspx

Find the SSL certificate that the Web Management Service is using

To find the SSL certificate that the Web Management Service is using:

  1. Click Start, click Control Panel, and then click Administrative Tools.
  2. Right-click Internet Information Services (IIS) Manager and select Run as administrator.
  3. In the Connections pane, select the server that you want to manage.
  4. In Features View, double-click Management Service.
  5. Under SSL certificate ensure that a certificate is selected.
  6. Note the name of the certificate. By default, the name starts with “WMSvc”.

Additional Reference:
http://exctech2013.blogspot.com/2013/10/the-web-management-service-could-not-be.html

Emails from scanner to Exchange 2013 being sent as separate attachment


Scenario

After switching from hosted email to Exchange 2013 on-premises, a customer noticed that when using scan-to-email functionality the .PDF files it created were not showing up as expected. Specifically, instead of an email being received with the .PDF attachment of the scanned document, they were receiving the entire original message as an attachment (which then contained the .PDF).

When the scanner was configured to send to an external recipient (Gmail in this case), the issue did not occur & the message was formatted as expected. The message was still being relayed through Exchnage, it was just the recipient that made the difference. See the below screenshots for examples of each:

What the customer was seeing (incorrect format)

A

What the customer expected to see (correct format)

B

This may not seem like a big issue but it resulted in users on certain mobile devices not being able to view the attachments properly.

Troubleshooting Steps

There were a couple references on the MS forums to similar issues with older versions of 2013, but this server was updated. My next path was to see if there were any Transport Agents installed that could’ve been causing these messages to be modified. I used many of the steps in my previous blog post “Common Support Issues with Transport Agents” including disabling two 3rd party agents & restarting the Transport Service; the issue remained.

My next step was to disable both of the customer’s two Transport Rules (Get-TransportRule | Disable-TransportRule); one was related to managing attachment size while the other appended a disclaimer to all emails. This worked! By process of elimination I was able to determine it was the disclaimer rule causing the messages to be modified.

Resolution

Looking through the settings of the rule the first thing that caught my eye was the Fallback Option of “Wrap”. Per this article from fellow MVP Pat Richard, Wrap will cause Exchange to attach the original message & then generate a new message with our disclaimer in it (sounds like our issue).

C

However, making this change did not fix the issue, much to my bewilderment. There seemed to be something about the format of the email that Exchange did not like; probably caused by the formatting/encoding the scanner was using.

Ultimately, the customer was fine with simply adding an exception to the Transport Rule stating to not apply the rule to messages coming from the scanner sender email address.

D

 

The Importance of Updated Domain Controllers When Deploying Exchange


Overview
Much is made about a healthy Active Directory environment being a prerequisite for a healthy Exchange deployment. This can be especially challenging when there are separate teams managing AD & Exchange; meaning sometimes things can slip through the cracks.

Issue
A colleague of mine recently ran into an issue when preparing to deploy Exchange 2013 into an existing Exchange Organization. While running Setup /PrepareAD, the process would fail at about 14%, stating the domain controller is not available. It was determined that the DC holding all of the FSMO roles was in the process of a reboot. At first the assumption was that this was coincidental; possibly the work of the AD team. After the server came back up, /PrepareAD was run again & had the exact same result! So it appeared something that the /PrepareAd process was doing was the culprit. The event logs on the DC gave the below output:

EVENTID: 1000
Faulting application name: lsass.exe, version: 6.3.9600.16384, time stamp: 0x5215e25f

Faulting module name: ntdsai.dll, version: 6.3.9600.16421, time stamp: 0x524fcaed

Exception code: 0xc0000005

Fault offset: 0x000000000019e45d

Faulting process id: 0x1ec

Faulting application start time: 0x01d0553575d64eb5

Faulting application path: C:\Windows\system32\lsass.exe

Faulting module path: C:\Windows\system32\ntdsai.dll

Report Id: 53c0474e-c12d-11e4-9406-005056890b81

Faulting package full name:

Faulting package-relative application ID:

EVENTID: 1015
A critical system process, C:\Windows\system32\lsass.exe, failed with status code c0000005.  The machine must now be restarted.

The logs were saying that the Lsass.exe process was crashing, leading to the Domain Controller restarting (see image below).

1

The easiest path of troubleshooting lead towards moving the FSMO roles to another server & seeing if the issue followed it. Setup /PrepareAD was run again & the issue did in fact follow the FSMO roles.

Resolution
It was at this point that I was engaged & I had a feeling this was either a performance issue on the domain controllers or something buggy at play. Before too long I was able to find the below MS KB for an issue that seemed to match our symptoms:

“Lsass.exe process and Windows Server 2012 R2-based domain controller crashes when the server runs under low memory”
http://support.microsoft.com/en-us/kb/3025087

The customer was more than willing to install the hotfix, but we soon realized that we also had to install the prerequisite update package below (which was sizeable):

Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 update: April 2014
http://support.microsoft.com/en-us/kb/2919355

During this time, the domain controller was also updated to .NET 4.5.2. After all of this was done, Setup /PrepareAD completed successfully. My colleague was 90% certain the hotfix was the fix, but also noted that before the patch the DC’s CPU utilization was consistently running at 60%. After the updates, it now sits in the 20-30% range. So regardless, we saw much better performance & stability after updating the Domain Controllers.

Conclusions
While I understand we can’t all be up to date on our patching 100% of the time, there is some health checking we can do to the environments we manage.

For all Windows servers, I strongly recommend getting a performance baseline of the big 3: Disk, Memory, & CPU. I like to say that you can’t truly say what bad performance is defined by if you don’t have a definition of good performance in the first place. Staying up to date with Windows Updates can greatly help with this. Even though a system may have performed to a certain level at one point in time, doesn’t mean any number of other variable couldn’t have changed since then to result in poor performance today; often times, vendor updates can remedy this.

As for Domain Controllers, they’re one of the easiest workloads to test with, since a new DC can be created with relative ease. You can use a test environment (recommended) or simply deploy Windows updates to a select number of domain controllers & then compare the current behavior with your baseline.

In this customer’s case, these performance/stability issues could have resulted in any number of applications to fail that relied on AD. Some failures may have been silent, while others could’ve been show stoppers like this one.

 

Exchange 2010 Outlook Anywhere users receiving prompts when proxied through Exchange 2013


Background

I was working with a customer who had Exchange 2010 & were in the process of migrating to Exchange 2013. As part of their migration process they pointed their Exchange 2010 Outlook Anywhere namespace (let’s call it mail.contoso.com) to Exchange 2013 in DNS. At this point all of their Outlook Anywhere clients should have been connecting to Exchange 2013 & then been proxied to Exchange 2010. While this was somewhat working, they also immediately noticed users were randomly being prompted for credentials, resulting in a negative user experience.

Sometimes the prompts would be when connecting to Public Folders while other times mail or directory connections from Outlook to Exchange.

Resolution

When I was approached with this issue/symptom it sounded familiar. After a search through my OneNote I realized I previously had a discussion with some people I know from Microsoft Support regarding this issue. Turns out this issue was recently addressed via http://support2.microsoft.com/kb/2990117 “Outlook Anywhere users prompted for credentials when they try to connect to Exchange Server 2013”.

This is actually an IIS issue with Server 2008 R2 (the operating system Exchange 2010 was installed on) that’s resolved by a hotfix. After installing the hotfix & rebooting the issue was resolved & their users no longer received the prompts.

 

 

Exchange 2010 SP3 installation fails on SBS 2011


I had an interesting issue with Exchange 2010 SP3 installation on a SBS 2011 server last night. Installation fails on the Hub Transport Server Role with following errors.

sbs 2011 upgrade sp3 error

 

This made me scratching my head. Why is it trying to remove existing certificate that is used by Exchange? It’s also the default SMTP certificate, that’s why setup was not able to remove it.

After investing further, I see this line in the PowerShell script,

Write-ExchangeSetupLog -Info “Removing default Exchange Certificate”;
Get-ExchangeCertificate | where {$_.FriendlyName.ToString() -eq “Microsoft Exchange”} | Remove-ExchangeCertificate

So it’s trying to remove default Exchange certificate that was created during the initial installation, that has friendly name “Microsoft Exchange”.

I’m thinking, there is no way the Godaddy certificate has Friendly Name “Microsoft Exchange”. After looking at the certificate properties, it is indeed the problem. The Friendly Name is showing “Microsoft Exchange”, instead of mail.domain.com.

In order for us to install SP3, we have to use SBS console to import a temporary certificate, so it updates “LeafCertThumbPrint” property in this registry key,

“HKEY_LOCAL_MACHINE\Software\Microsoft\SmallBusinessServer\Networking”

 Note: you can also update the registry manually with one of thumbprint from existing certificate that is already imported.

Exchange 2010 SP3 installs fine after the cert change.  Since we didn’t export the existing GoDaddy certificate before running SP3 setup, it was removed by the setup. In order for Exchange OA and Activesync clients  to continue function,  we have issue a new certificate request with proper Friendly Name, then import the new certificate. You can also reuse the existing certificate on GoDaddy’s website by using “Re-Key” option, but you might end up with a certificate without private key. To repair the missing private key, you can run following command
   certutil –repairstore my <serial number>

 

 

AD Certificate Services not starting due to database in Dirty Shutdown


Background

I had a customer running SBS (Small Business Server) 2011, which runs Exchange 2010, who needed to renew their SSL Certificate as it had recently expired. I have quite a bit of experience with SBS since we have a large Support customer base running it & while it can be a pain to troubleshoot because of so many moving pieces (AD/Exchange/SharePoint/WSUS/SQL/RD Gateway all on one box) there are a few cool features. One of these features is the “Setup your Internet Address” wizard. Because SBS is also its own Certificate Authority the wizard will generate a certificate for you & assign it to Exchange/IIS/RD Gateway. It will also configure all the Exchange virtual directories for you as well as create a certificate install package for you to deploy to non-domain joined systems so your Outlook Anywhere clients will trust your CA.

 

Issue

However, when going to re-run the wizard to renew the certificate I received an error regarding the Active Directory Certificate Service not running. The System event logs had a 7024 event from “Service Control Manager” stating “The Active Directory Certificate Services service terminated with the service-specific error %%939523546”.

So we were unable to request a new certificate & the customer was hoping to avoid purchasing a third-party certificate since they had been working fine (for an extremely small shop) like this for several years.

 

Solution

After researching the error, I found that the error code given pointed to the Certificate Authority database being corrupted. So I navigated to C:\Windows\System32\Certlog & I found an old friend; an ESE (Extensible Storage Engine) database file.  If you didn’t know already, ESE isn’t just used for Exchange.

AD Certificate Services, DHCP (C:\Windows\System32\dhcp\dhcp.mdb), & Active Directory itself (C:\Windows\NTDS\ntds.dit) all use ESE databases.

The caveat however is that instead of ESEUTIL, you should use ESENTUTL to work with them.

(Additional references 1 & 2)

So I ran esentutl /mh <CA Name>.edb to view the header of the database file & found that it was in a Dirty Shutdown. I then tried to run a Recovery against the database by running Esentutl /r edb but this failed.

If this were an Exchange database then this would be where I would try to restore from a backup. Unfortunately this customer did not have a backup of their CA database file (I think a lot of customers would fall into this category) so I had to move onto running a Repair which is the dreaded “/P”.

Microsoft Support offers strict guidance around running a “/p” on Exchange (like performing a Defrag or a Mailbox Move followed by an Integrity Check/Mailbox Repair immediately after having to run a /p; Also, it should be considered a LAST resort) but no such guidance exists for Certificate Services since it is a much MUCH simpler database structure. But a ‘/P” is almost always a destructive action, with associated data loss, so if you have a backup you should always pursue that option first

I ran esentutl /p <CA Name>.edb & after it completed I was then able to start the Active Directory Certificate Services Service. All the proper data (including Issued Certificates & Templates) were still there & after re-running the SBS “Setup your Internet Address” wizard the customer now had a renewed certificate.

 

 

 

Unable to logon to O365 via ADFS – ADFSAppPool stops (aka. I had a bad day)


Environment:
Customer using Exchange Online/Office 365 with no Exchange servers on-prem. Two ADFS 2.0 servers running on Server 2008 R2, enabling them to logon to Exchange Online via SSO (Single Sign On).

Issue:
After rebooting the two ADFS servers post Windows Updates the customer could no longer login to OWA & would receive a “503 Service Unavailable” error message via IIS on the two ADFS servers.

Background:
I have to hang my head in shame with this one as I really should have figured this out sooner. Initial troubleshooting showed that the ADFSAppPool was stopped in IIS. It would start but as soon as you tried accessing it, it would stop again. Nothing at all in the Application or ADFS logs in Event Viewer (more on this poor bit of troubleshooting on my part later).The ADFS service account it was running under looked ok; the App Pool would start & so would the ADFS Service (both running under this account) so it seemed to not be a credential issue (at least I got that part right). I even went as far as to reinstall ADFS & IIS on the non-primary ADFS server in the event it was something in IIS. I was clearly out-classed on this seemingly simple issue.

Resolution:
Because the customer was down & I was scratching my head, I decided to escalate the issue to Microsoft; at which point they resolved the issue in about 5min.

Now before I say the fix I’d just like to say I consider myself a good troubleshooter. I’ve been troubleshooting all manner of Microsoft, Cisco, etc technologies for more than a decade & made a pretty successful career out of it. I even managed to pass both the MCM 2010 & MCSM 2013 lab exams on the 1st attempt; but today was not my day. I spent over 2 hrs on this & I broke the cardinal rule of troubleshooting; I overlooked the simple things. Like many of us do I started digging a hole of deep troubleshooting, expecting this to be an incredibly complex issue; I was looking at SPN’s, SQL Permissions, checking settings in Azure, etc. I should have just looked back up in the sky instead of trying to dig a hole a mile deep but only 3 ft wide, because for some idiotic reason I chose to overlook the System Event logs….

I suppose once I saw nothing in the Application or ADFS logs I just moved on quickly to the next possibility but in a few short minutes the Microsoft Engineer checked the System Logs & saw Event 5021 from IIS stating that the service account did not have Batch Logon Rights (more on the event here). This lead him to look at Group Policy settings & sure enough, there was a GPO allowing only the Domain Admins group to log on as a batch job. (Reference 1 & 2). It seems this setting took effect after the ADFS servers were rebooted post Windows Updates. Not sure how the GPO got there as this solution was working for 2 years beforehand but it certainly was ruining our day today. After the GPO was modified to allow the ADFS service account to log on as a batch job, the issue was resolved after some service restarts.

Moral of the story:
Never overlook the obvious!
 It’s the best advice I can give to anyone, anywhere, & who has to troubleshoot anything. I’d like to say this is the 1st time this has happened to me but it’s not. Overlooking typos, not checking to see if a network cable is plugged in, not checking to see if a service is started… It happens to the best of us. I suppose overlooking the simple solution is just part of the human condition…..or at least whatever condition I have….. 🙂

Common Support Issues with Transport Agents


This is a fairly basic post but it happens enough that I’d like to call out the basics of troubleshooting it. I’ve seen many cases over time where mail flow is either being halted or become sluggish due to a third-party transport agent (I actually saw 3 instances of this happening this past month which prompted this post).

Examples of Transport Agents could be Anti-Virus software, Anti-Spam software, DLP software, agents which add disclaimers to email messages, or email archiving solutions. I won’t call out specific vendors as I don’t think there’s necessarily anything wrong with any particular one. Sometimes an install of a piece of software just becomes corrupted or there’s some unforeseen incompatibility between the third-party software & Exchange; or some other software in the environment. However, sometimes the Agent can indeed have a bug which needs to be addressed with the vendor.

Anyways, here’s the ways in which I’ve seen these issues manifest themselves:

  • Messages Stuck in the Submission queue
  • A delay in SMTP response (when you telnet to the Exchange Server over 25, it takes longer than expected for the server’s SMTP banner to be displayed)
  • Messages are slow to flow through the transport pipeline (general slow delivery)
  • Microsoft Exchange Transport Service will not start or repeatedly crashes

To highlight more recent examples, last week I had a colleague come to me saying he had two Exchange 2010 Hub/CAS boxes, with the same config, yet one of them would have a slower connection when he would telnet to it; the banner would take at least 20 seconds to be displayed. This also resulted in the health checks for the hardware load balancer in place to mark the server as down. Each server had the same Anti-V/Anti-SPAM software installed, yet only one was showing the symptoms. For testing purposes he “disabled” the third-party software using its management interface but the issue persisted.

However, after running a “Get-TransportAgent” on the server, the Transport Agent still showed as being “Enabled”. This demonstrates a point I frequently make with customers, that disabling Anti-Virus software rarely serves as a useful troubleshooting step (even file-based Anti-V). This is because the TransportAgent is typically still enabled. For file-based Anti-Virus, even with the Services disabled there is usually still a network filter driver that is sitting on the TCP/IP stack which could be causing issues (only an uninstall of the 3rd-party product removes it).

Bottom-line, an uninstall is still the best method to remove potentially problematic Anti-V/Anti-SPAM/Anti-Malware software. So in this case the issue was a bad/corrupted install of the product on that server.

Another scenario (also Exchange 2010) was where messages were stuck in the Submission Queue for extended periods of time. The Application Logs were filled with Event 1050 MSExchange Extensibility events which were stating the installed agent was taking an unusual amount of time to process an event; thus causing the delay in transport (Reference 1 2 3).

After running Get-TransportAgent I was actually greeted by an error message saying it was unable to access a file located in the “C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\agents” directory. This is where the files associated with your Transport Agents are stored. So again, the issue was a corrupted install of the product. Reinstalling the software resolved the issue.

So nothing fancy about this one. Just check Event Viewer for Transport events or use process of elimination if you’re experiencing any of the symptoms above. Having worked with Microsoft Support many times in the past, they will almost always ask you to remove third-party components such as Anti-V if they are unable to pinpoint the issue to its source; so save yourself some time & rule it out first.

I know some people work for companies where this is like pulling teeth but it’s always going to be a battle between usability & security. If your management requires you spend 40 hours on the phone working with a vendor or Microsoft before finally being told you’re going no further until removing the third-party component then I give you my best & suggest you get the coffee started. We all know the most important acronym in IT is CYA after all 😉

For great reading on Exchange Transport Agents see MCM/MCSM/MVP Brian Reid’s two posts on the topic

Creating a Simple Exchange Server Transport Agent

Exchange 2013 Transport Agents