
Inconsistent time in "Waiting for SSH" during build process #265

Open
MyroTk opened this issue Feb 27, 2023 · 2 comments

MyroTk commented Feb 27, 2023

While building the generic-ubuntu2204-libvirt box, I have encountered a wide range of times spent at the "Waiting for SSH..." step, anywhere from 2 to more than 4 hours. For comparison, the generic-ubuntu1604-libvirt build consistently finishes within the default 1-hour SSH timeout.

I was hoping you could provide insight into what this step is doing, and how to bring the time down or make it more consistent.

Thank you in advance.

ladar commented Mar 1, 2023

@MyroTk that involves packer booting the guest VM using the installation ISO, and then connecting via VNC, where it simulates typing the required "boot commands" to kick off an automatic installation. When the installation is complete, the guest reboots, and that's when packer is able to connect using SSH.
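
If you want more visibility into what packer is doing during that phase, you can enable its debug log (a minimal sketch; the template filename is a placeholder for whichever config you're building):

```sh
# PACKER_LOG=1 turns on packer's verbose logging; PACKER_LOG_PATH sends it
# to a file instead of the console. The template name is a placeholder.
PACKER_LOG=1 PACKER_LOG_PATH=packer-debug.log packer build generic-libvirt.json
```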

That first phase is rather fragile, and a lot can go wrong. If the auto-install "breaks", the guest will simply sit idle, waiting for a user to fix the problem. In the case of robotic installs, that means waiting until the timeout is hit, and then aborting.

Finding and fixing issues can be tricky, so I try to keep the auto-install step as simple as possible, and do as much configuration as possible using Bash scripts after the reboot.
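
If the 22.04 install genuinely needs longer than an hour, you can also raise the SSH timeout in the template (a sketch; the filename is an assumption, and older packer templates spell the key ssh_wait_timeout while newer ones use ssh_timeout):

```sh
# Find where the template sets the SSH timeout, then bump the value,
# e.g. to "4h", and rebuild. The filename here is a placeholder.
grep -n 'ssh_.*timeout' generic-libvirt.json
```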

That said, the common issues are: an overloaded host messing up the timing of the boot command entry; IPv6 being enabled when the guest installer doesn't support it properly (see the sketch below); and mismatched virtual hardware, for example a config expecting a SCSI device at /dev/sda while an IDE device shows up at a different path. The same goes for the network hardware. Etc, etc.
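
For the IPv6 case, one possible host-side mitigation is to disable IPv6 for the duration of the build (a sketch that assumes a Linux build host where nothing else needs IPv6):

```sh
# Disable IPv6 on the host so the guest installer never sees an IPv6 route.
# Revert by writing 0 to the same keys, or simply reboot.
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
```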

In terms of why 16.04 is more reliable than 22.04, that's easy. Ubuntu used to release server install ISOs based on the Debian installer, which provided a reliable automated install framework. Alas, 20.04 was the last release to provide those ISOs, so the 20.10+ configs all use the "live" install media, and that requires using the new Subiquity autoinstall framework, which in my experience is a poorly designed piece of software with many, many flaws. In all fairness, it's much newer and still being actively developed, so it may have improved by the time someone reads this.

Three hints. First, you can easily check the CPU usage and virtual disk size (and possibly guest network usage) to see if the install is running. If those numbers don't show signs of life for an extended period, the guest is likely stuck.
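
A rough way to watch for those signs of life from the host (a sketch that assumes a Linux host running the build as a standalone QEMU process; the disk image path is a placeholder):

```sh
# Watch the QEMU process's CPU usage; a stuck installer usually idles near 0%.
top -p "$(pgrep -d, -f qemu-system)"

# Watch the virtual disk grow; the path is a placeholder for your build's
# output directory.
watch -n 30 'du -h output-generic-ubuntu2204/disk.qcow2'
```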

Second, you can try using (or looking at) the packvnc.sh script in the repo. It's a crude script designed to figure out the VNC port and connect to it so you can see what's happening. Just make sure packer says "Waiting for SSH" before you do. If you connect to the console while the boot command is being typed, it could break the process (it depends on the hypervisor and how you connect, and I can't recall if libvirt was one of the platforms with that quirk).

Third, Parallels, Hyper-V, VirtualBox, and VMware all have GUI managers which make connecting to the virtual console of a running VM simple, if you have trouble connecting via a CLI. But that only lets you troubleshoot generic, non-hypervisor-specific install issues... and 50% of the time, problems are specific to the platform. Unfortunately, the libvirt boxes are built using standalone QEMU instances, so you can't use the GUI tools to connect to them, and how you connect varies depending on what you have installed/setup/configured. Although on most of my systems, I believe QEMU exposes a VNC port, and packer tells you that port number in the log output.
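
If you'd rather not use packvnc.sh, a minimal sketch of the same idea (assumes a Linux host with TigerVNC's vncviewer installed; the port is an example):

```sh
# Find which local TCP port the standalone QEMU instance exposed for VNC;
# QEMU's display :0 maps to port 5900, :1 to 5901, and so on.
ss -tlnp | grep qemu

# Connect with an explicit port (the :: form means "raw port" to vncviewer).
# Only do this once packer reports "Waiting for SSH...".
vncviewer 127.0.0.1::5900
```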

Finally, if you do find a problem and figure out how to solve it, please submit a PR. I try to make the configs as robust as possible, but I can only test them against the host OS and hypervisor versions I have access to. And, especially with libvirt, the defaults can vary widely between distros and QEMU versions (packer will also sometimes revise its default config, breaking things).


MyroTk commented Mar 1, 2023

Thank you for the response, I will continue debugging on my end.
