Monitoring is one of the most important topics you have to define when implementing and managing a Citrix environment. Usually the decision is to use the standard monitoring tool already used in your company – if it is good enough for the enterprise, why wouldn’t it work for your Citrix platform? So you set up the required alerts and monitoring rules and wait… Wait for the first call from an end user stating that Citrix is not working at all and that she or he cannot do their daily work. And you start your work: you try to find what the problem might be. You check your Citrix XenApp or XenDesktop servers and you have no idea what might be causing the problem for your end user. You check the performance charts and alerts in the enterprise monitoring tool you decided to use and you see nothing. What is more, you might only have insight into “your” servers, which certainly doesn’t make finding the root cause any easier. And whenever you ask a colleague from e.g. the Exchange or SAP team, they state that everything is working fine on their end and that it has to be Citrix that is breaking everything. You probably know that story already – working as a Citrix administrator or engineer you have probably gone down that path many, many times. As one of my colleagues said, it is THE STANDARD to blame Citrix first – and it is you who has to play the role of attorney and prove that Citrix is innocent. So you dig into the infrastructure, capture traces, and after long hours you find that there was a problem with Exchange or SAP that your colleagues decided not to mention or for some reason didn’t notice 🙂
The 30th of September has gone down in the history of the Polish Citrix User Group (PLCUG). It was our biggest event ever, with the highest number of participants so far. We had a special guest from Kansas City, Jarian Gibson (CTP), who gave a presentation. Read the whole meeting review here:
Special thanks to Andrew Wood and our colleagues from the UK Citrix User Group. Thanks Andrew!
The Windows Azure Management Portal is probably the most commonly used tool for management and administration of Windows Azure. There are a bunch of other tools that might be used – I’m going to write about them in the next post – however the most accessible is of course the Management Portal, as it is a web-based application. But what happens if the reporting inside the portal is not accurate and provides false information? That’s exactly what I discovered some time ago. One of my PaaS services recycled multiple times (at least 5) within quite a short period of time. Eventually I was able to solve the problem with the cloud service hosted on that PaaS machine, so I wanted to check whether everything was OK with the machine itself. After logging in to the Management Portal I noticed the following on the main page:
The cloud service was running but the red exclamation mark was a bit scary. So I checked the details of one of my PaaS instances.
The detailed information said that my Web server was recycling and was in an unhealthy state. I logged in to the server and didn’t find any suspicious entries apart from quite a lot of errors in the WaHostBootstrapper log. The log showed that at least a few agents, including Diagnostic and RemoteAccess, were not able to launch. Since my knowledge of the Azure environment is, frankly speaking, still quite limited, I opened a ticket with Microsoft support. After a few days of sending emails and being switched between different support teams (from India and the US to France 🙂 ) I finally got a few answers:
1. The errors in the WaHostBootstrapper log are not the source of the problem – but no one could answer my question about what they mean and whether I should fear for my cloud services.
2. My Web server has rebooted quite a few times 🙂 According to the log below it rebooted 74 times! No idea how that could happen, and of course no one was able to explain it.
<following is the last failure before the role started:>
 [xx/xx/2013 15:14:02.72] [WARN] role.Start() failed with exception: System.UnauthorizedAccessException: Access is denied. (Exception from HRESULT: 0x80070005 (E_ACCESSDENIED))
at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
at Microsoft.WindowsAzure.GuestAgent.AppAgentRuntime.AppAgentRuntimeImpl.StartRole(String containerId, String roleInstanceId, String configFilename, CertificateBlobType certsBlobType, Byte certificatesBlob)
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.Role.Start(RoleGoalStateAssets goalAssets, ManualResetEvent stopping)
at Microsoft.WindowsAzure.GuestAgent.ContainerStateMachine.RoleStateExecutor.StartRole(RoleGoalStateAssets goalAssets).
<role start success at 15:17:07.42 >
3. It turned out that if your server hosted on Windows Azure reboots at least 5 times within one hour and does not start properly (not sure what the definition of a proper boot is 🙂 ) you will be welcomed by this red exclamation mark on the main page of the Management Portal. That is of course a bug and Microsoft is still working on it. No final fix or workaround is available for now.
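If you have a log in the shape of the excerpt above, the rule from point 3 can be checked by hand instead of trusting the portal. Below is a minimal Python sketch of that idea, not anything Microsoft provided. It assumes that every failed start is logged as a `[MM/DD/YYYY HH:MM:SS.ff] [WARN] role.Start() failed` line with real dates (the dates in my excerpt are masked) and that "5 times during one hour" means any sliding one-hour window; the function names are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, modelled on the excerpt above:
# [MM/DD/YYYY HH:MM:SS.ff] [WARN] role.Start() failed with exception: ...
FAILURE_RE = re.compile(
    r"^\s*\[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d+)\] \[WARN\] role\.Start\(\) failed"
)


def failed_start_times(lines):
    """Return the sorted timestamps of all failed role starts found in the lines."""
    times = []
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%m/%d/%Y %H:%M:%S.%f"))
    return sorted(times)


def max_failures_in_window(times, window=timedelta(hours=1)):
    """Largest number of failures that fall inside any single sliding window."""
    best = 0
    start = 0
    for end, current in enumerate(times):
        # Drop failures from the left that are older than the window.
        while current - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def would_flag(times, threshold=5, window=timedelta(hours=1)):
    """True if the assumed rule (>= threshold failures per window) is met."""
    return max_failures_in_window(times, window) >= threshold
```

To use it, read the log into a list of lines (for example with `open(path).read().splitlines()`, where `path` is wherever you saved the log), pass the lines to `failed_start_times` and then the result to `would_flag`. The sliding window is a design choice on my side; support never told me how the hour is actually counted.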
All in all, do not always believe what you see, and always check with different tools/approaches.
One interesting thing is that Cerebrata Azure Management Studio also shows that error – so the problem is somewhere deeper in the Azure platform itself.
My ticket with Microsoft support is still open, so I will let you know when a solution is provided.
BTW: another interesting thing is that I did not get any date or details on when I can expect a fix for that problem. So it seems that there is no SLA defined for Azure bugs. The known portal issue was opened on 9/22/2013 and is still under investigation. Nice! 🙂