The Windows Azure Management Portal is probably the most commonly used tool for managing and administering Windows Azure. There are a bunch of other tools you can use (I'm going to write about them in the next post), but the Management Portal is of course the most accessible because it is a web-based application. But what happens if the reporting inside the portal is not accurate and provides false information? That's exactly what I discovered some time ago. One of my PaaS services recycled multiple times (at least 5) within quite a short period of time. I was finally able to solve the problem with the cloud service hosted on that PaaS machine, so I wanted to check whether everything was OK with the machine itself. After logging in to the Management Portal I noticed the following on the main page:
The cloud service was running, but the red exclamation mark was a bit scary. So I checked the details of one of my PaaS instances.
The detailed information said that my Web server was recycling and in an unhealthy state. I logged in to the server and didn't find any suspicious entries apart from quite a lot of errors in the WaHostBootstrapper log. The log showed that at least a few agents, including Diagnostic and RemoteAccess, were not able to launch. Since my knowledge of the Azure environment is, frankly speaking, still quite limited, I opened a ticket with Microsoft support. After a few days of emails and being passed between different support teams (from India and the US to France 🙂 ) I finally got a few answers:
1. The errors in the WaHostBootstrapper log are not the source of the problem. However, no one could tell me what they mean or whether I should worry about my cloud services.
2. My Web server has rebooted quite a few times 🙂 According to the log below it rebooted 74 times! I have no idea how that could happen, and of course no one was able to explain it.
<following is the last failure before the role started:>
 [xx/xx/2013 15:14:02.72] [WARN] role.Start() failed with exception: System.UnauthorizedAccessException: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))
at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
at Microsoft.WindowsAzure.GuestAgent.AppAgentRuntime.AppAgentRuntimeImpl.StartRole(String containerId, String roleInstanceId, String configFilename, CertificateBlobType certsBlobType, Byte certificatesBlob)
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.Role.Start(RoleGoalStateAssets goalAssets, ManualResetEvent stopping)
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.RoleStateExecutor.StartRole(RoleGoalStateAssets goalAssets).
<role start success at 15:17:07.42 >
3. It turns out that if a server hosted on Windows Azure reboots at least 5 times within one hour and does not start properly (I'm not sure what the definition of a proper boot is 🙂 ), you will be greeted by this red exclamation mark on the main page of the Management Portal. That is of course a bug and Microsoft is still working on it. No final fix or workaround is available at the moment.
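If you want to check on your own whether an instance has crossed the "5 times within one hour" threshold from point 3, something like the rough Python sketch below could help. It is only my guess at how to count it, not how the portal actually does it: I assume that each `role.Start() failed` warning in the log marks one failed start, that the log lines look like the excerpt above, and that the date is written as month/day/year. The names `failure_times` and `exceeds_threshold` are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, modeled on the excerpt above:
# "[MM/DD/YYYY HH:MM:SS.ff] [WARN] role.Start() failed with exception: ..."
FAILURE_RE = re.compile(
    r"\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d+\] \[WARN\] role\.Start\(\) failed"
)


def failure_times(lines):
    """Return the sorted timestamps of all role.Start() failure lines."""
    times = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            # Month/day/year order is an assumption; swap to %d/%m if needed.
            times.append(datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S"))
    return sorted(times)


def exceeds_threshold(times, limit=5, window=timedelta(hours=1)):
    """True if any span of length `window` contains at least `limit` timestamps."""
    times = sorted(times)
    start = 0
    for end, current in enumerate(times):
        # Drop timestamps that are more than `window` older than the current one.
        while current - times[start] > window:
            start += 1
        if end - start + 1 >= limit:
            return True
    return False
```

To use it, read the lines of a log file in that format and call `exceeds_threshold(failure_times(lines))`. If it returns `True`, the red exclamation mark in the portal may simply be a leftover of that restart storm rather than a sign of a current problem.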
All in all, don't always believe what you see, and always double-check with different tools and approaches.
One interesting thing is that Cerebrata Azure Management Studio also shows that error, so the problem is somewhere deeper in the Azure platform itself.
My ticket with Microsoft support is still open, so I will let you know when a solution is provided.
BTW: another interesting thing is that I did not get any date or details on when I can expect a fix for this problem. So it seems that there is no SLA defined for Azure bugs. The known portal issue was opened on 9/22/2013 and is still under investigation. Nice! 🙂