Unable to logon to O365 via ADFS – ADFSAppPool stops (aka. I had a bad day)


Environment:
Customer using Exchange Online/Office 365 with no Exchange servers on-prem. Two ADFS 2.0 servers running on Server 2008 R2, enabling them to logon to Exchange Online via SSO (Single Sign On).

Issue:
After rebooting the two ADFS servers post Windows Updates the customer could no longer login to OWA & would receive a “503 Service Unavailable” error message via IIS on the two ADFS servers.

Background:
I have to hang my head in shame with this one as I really should have figured this out sooner. Initial troubleshooting showed that the ADFSAppPool was stopped in IIS. It would start but as soon as you tried accessing it, it would stop again. Nothing at all in the Application or ADFS logs in Event Viewer (more on this poor bit of troubleshooting on my part later).The ADFS service account it was running under looked ok; the App Pool would start & so would the ADFS Service (both running under this account) so it seemed to not be a credential issue (at least I got that part right). I even went as far as to reinstall ADFS & IIS on the non-primary ADFS server in the event it was something in IIS. I was clearly out-classed on this seemingly simple issue.

Resolution:
Because the customer was down & I was scratching my head, I decided to escalate the issue to Microsoft; at which point they resolved the issue in about 5min.

Now before I say the fix I’d just like to say I consider myself a good troubleshooter. I’ve been troubleshooting all manner of Microsoft, Cisco, etc technologies for more than a decade & made a pretty successful career out of it. I even managed to pass both the MCM 2010 & MCSM 2013 lab exams on the 1st attempt; but today was not my day. I spent over 2 hrs on this & I broke the cardinal rule of troubleshooting; I overlooked the simple things. Like many of us do I started digging a hole of deep troubleshooting, expecting this to be an incredibly complex issue; I was looking at SPN’s, SQL Permissions, checking settings in Azure, etc. I should have just looked back up in the sky instead of trying to dig a hole a mile deep but only 3 ft wide, because for some idiotic reason I chose to overlook the System Event logs….

I suppose once I saw nothing in the Application or ADFS logs I just moved on quickly to the next possibility but in a few short minutes the Microsoft Engineer checked the System Logs & saw Event 5021 from IIS stating that the service account did not have Batch Logon Rights (more on the event here). This lead him to look at Group Policy settings & sure enough, there was a GPO allowing only the Domain Admins group to log on as a batch job. (Reference 1 & 2). It seems this setting took effect after the ADFS servers were rebooted post Windows Updates. Not sure how the GPO got there as this solution was working for 2 years beforehand but it certainly was ruining our day today. After the GPO was modified to allow the ADFS service account to log on as a batch job, the issue was resolved after some service restarts.

Moral of the story:
Never overlook the obvious!
 It’s the best advice I can give to anyone, anywhere, & who has to troubleshoot anything. I’d like to say this is the 1st time this has happened to me but it’s not. Overlooking typos, not checking to see if a network cable is plugged in, not checking to see if a service is started… It happens to the best of us. I suppose overlooking the simple solution is just part of the human condition…..or at least whatever condition I have….. 🙂

Configure your target URL for OWA redirect when migrating users to the cloud


 

When you migrate  a user to Office 365  you want OWA users to have a simple redirect to office 365 and not get this error:

image

Also you want to give your users an easy OWA url not http://outlook.com/owa/mysupercooldomain.com

The solution is 2 steps

  1. create a cname record that points to outlook.com ( i.e. OWA.mysupercooldomain.com = outlook.com)
  2. add that record to your organization relationship
    1. set-orginaizationrelationship –targetOwaUrl http://owa.mysupercooldomain.com/owa
  3. Give OWA.mysupercooldomain.com to your users as there new owa page

Note: the domain you create the CNAME in must be one of your federated or accepted domains in office 365 for realm discovery to work.

Problems with Federation Trust After changes to your certificate


Here is the situation and the solution

Situation

  • I Had a federated trust setup in exchange 2010 SP1 (same issue can happen in RTM)
  • I created it using the “UseLegacyProvisioningService” switch and so was using a 3rd party certificate
  • After the trust was established I had some issues with the cert… and while it’s a long story the gist is that the cert was revoked and I received a new one.
  • Well this caused an issue with my federation trust because I didn’t get the cert switched before the revocation (this can also happen if you delete the cert from the cert store or if it expires before you roll to a new one)

Symptoms: I received the following errors when I try to make any changes to the Federation trust or even try to delete it.

    An error occurred accessing Windows Live. Detailed information: "The request failed with HTTP status 403: Forbidden.".
        + CategoryInfo          : InvalidResult: (:) [Set-FederationTrust], LiveDomainServicesException
        + FullyQualifiedErrorId : 84DE3E74,Microsoft.Exchange.Management.SystemConfigurationTasks.SetFederationTrust

    Error:
    Exception has been thrown by the target of an invocation.

     

    An error occurred accessing Windows Live. Detailed information: "The request failed with HTTP status 403: Forbidden.".

    The request failed with HTTP status 403: Forbidden.
    Click here for help… http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.141).aspx?v=14.1.218.11&t=exchgf1&e=ms.exch.err.Ex1FCF67

Reason: the certificate that was used and is expected is no longer valid and so cannot be trusted on the live servers at Microsoft

Solution: Use ADSIEdit to change the cert to the new thumbprint

  1. Add the new cert as the next cert in EMC under Federation Trusts
  2. Open ADSIEDit with Domain admin Credentials
  3. Connect to Configuration naming context
  4. Browse to Domain –> Configuration –> Services –> Microsoft Exchange –> OrgName –> Federation Trusts
  5. image
  6. Rich Click on your Federation Trust in the right hand window and go to properties
  7. Scroll down until you find the key “msExchFedOrgNextPrivCertificate” (this was where my solution varied from EXPTA’s)
  8. Edit the key and select all the contents and copy, then close the key
  9. Edit the Key “msExchFedOrgPrivCertificate” and paste in your copied contents (It may be a good idea to have a copy of this keys contents before overwriting it)
  10. Close all windows
  11. re-open the EMC or EMS run your failed commands again and life is grand!

Thanks to EXPTA and Gene at Microsoft for the assist in figuring this out.