Home > GestaltIT, Systems, Virtualization > ESXi 5.0–1.5 Hour Boot Time During Upgrade

ESXi 5.0–1.5 Hour Boot Time During Upgrade

I have to say, I am quite shocked that I am on the tail end of waiting 1.5 hours for an ESXi 5.0 upgrade to complete booting. Seriously… 1.5 hours.

I have been waiting for some time to get some ESXi 5.0 awesomeness going on in my environment. vCenter has been sitting on v5 for some time and I have been deploying ESXi 5 in a couple stand-alone situations without any issues. So, now that I have more compute capacity in the data center, it is time to start rolling the remaining hosts to ESXi 5… or so I thought!

I downloaded ESXi 5.0.0 Kernel 469512 a while back and have been using that on my deployments. So far, so good… until today. Update Manager configured with a baseline –> Attach –> Scan –> Remediate –> back to business. Surely, Update Manager processes should take more time than the actual upgrade. About 30 minutes after starting the process, vCenter was showing that the remediation progress was a mere 22% complete and the host was unavailable. I used my RSA (IBM’s version of HP ILO or Dell DRAC) to connect to the console. Sure enough, it was stuck at loading some kernel modules. About 20 minutes later IT WAS STILL THERE!

Restarting the host did not resolve the issue. During the ESXi 5 load screen, pressing Alt + F12 loads the kernel messages. It turns out that iSCSI was having issues loading the datastores in an acceptable amount of time. I was seeing messages similar to:

image

A little research turned me onto the following knowledgebase article in VMware’s KB: ESXi 5.x boot delays when configured for Software iSCSI (KB2007108)

To quote:

This issue occurs because ESXi 5.0 attempts to connect to all configured or known targets from all configured software iSCSI portals. If a connection fails, ESXi 5.0 retries the connection 9 times. This can lead to a lengthy iSCSI discovery process, which increases the amount of time it takes to boot an ESXi 5.0 host.

So, I have 13 iSCSI stores on that specific host and multiple iSCSI VMkernel Ports (5). So, calling the iSCSI lengthy is quite the understatement.

The knowledgebase states that the resolution is applying ESXi 5.0 Express Patch 01. Fine. I can do that. And… there is a work around described in the article that states you can reduce the number of targets and network portals. I guess that is a workaround… after you have already dealt with the issue and the ridiculously long boot.

Finally, to help mitigate the issue going forward, VMware has released a new .ISO to download that includes the patch. However, this is currently available in parallel with the buggy .ISO ON THE SAME PAGE! Seriously. Get this… the only way to determine which one to download is:

image

As a virtualization admin, I know that I am using the Software iSCSI initiator in ESXi. But, why should that even matter at all?! There is a serious flaw in the boot process in version 469512  and that should be taken offline. Just because someone is not using Software iSCSI at the current time does not mean they are not going to in the future. So, if they download the faulty .ISO, they are hosed in the future. Sounds pretty crummy to me!

My Reaction

I am quite shocked that this made it out of the Q/A process at VMware in the first place. My environment is far from complex and I expect that my usage of the ESXi 5.0 hypervisor would be within any standard testing procedure. I try to keep my environment as vanilla as possible and as close to best practices as possible. 1.5 hours for a boot definitely should have been caught before release to the general public.

Additionally, providing the option to download the faulty ISO and the fixed ISO is a complete FAIL! As mentioned on the download page, this is a special circumstance due to the nature of the issue. I would expect that if this issue is as serious as the download page makes it out to be, the faulty ISO should no longer be available. There has to be a better way!

Conclusion

I have since patched the faulty ESXi 5.0 host to the latest/safest version, 504890, and boot times are back to acceptable. I will proceed with the remainder of the upgrades using the new .ISO and have deleted all references to the old version from my environment.

I have never run into an issue like this with a VMware product in my environment and I still have all the confidence in the world that VMware products are excellent. In the scheme of things, this is a bump in the road.

Advertisements
  1. Phil
    November 29, 2011 at 9:03 am

    Hello,

    I can’t find the ESXi 5 iso with patch could you post the link ?

    thx

  2. Shawn
    December 4, 2011 at 6:12 pm

    Careful with 504890 though. I have 2 multipath iscsi environments and I am running into an issue where datastores are not accessible after a reboot. I am downgrading to 469512 and hoping that the long boot time issue does not affect me.

    • December 4, 2011 at 7:44 pm

      Hi Shawn,

      Thanks so much for the comment regarding the iSCSI patch (and revised .ISO) for ESXi 5.0. Datastores not coming back online sure sounds like a problem to me.

      I have multiple VMKernel ports (each assigned to a specific NIC) connected to iSCSI datastores without any issues on the patched 5.0 version. So, lucky for me, it is not too much of an issue.

      Good luck with trying the lower version. I hope it is not a case of choosing between the lesser of two evils. If I had to choose, in your situation (assuming proper troubleshooting), I would live with the long boot times.

      I hope everything works out well for you and, if the problem lies in the new, patched version, I hope the issue is resolved shortly. Don’t hesistate to let me (and the readers of the blog) know what happens!

      Thanks!

      ~Bill

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: