Ah! You were exactly right. All 18 of these VMs are clones of #1 or #2 (and they all share a base qcow2 backing file...). As I was setting them up, I got the first two working and had been "investigating network configurations" (aka screwing with the XML config directly, changing stuff in virt-manager, etc. till it worked right).
Then I cloned everything. Starting them all manually would work, but booting the server wouldn't start them because the bridge didn't come up first. I didn't notice that I'd changed the network setting the way I did, and honestly, till you pointed this out, I didn't even understand the difference between the two formats (i.e. one is managed by libvirt and the other expects the OS to bring the bridge up before libvirt tries to start the VMs).
# diff b a
83c83
< <interface type='bridge'>
---
> <interface type='network'>
85c85
< <source bridge='virbr0'/>
---
> <source network='winnet'/>
The diff above shows the config difference before and after.
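For anyone who has to make the same change across many cloned guests, here is a minimal Python sketch of that edit. It assumes the fix is the switch from type='bridge' to type='network' shown in the diff; the function name bridge_to_network is made up for illustration, and the XML is assumed to come from something like "virsh dumpxml" rather than being read from libvirt directly.

```python
import xml.etree.ElementTree as ET


def bridge_to_network(domain_xml: str, bridge: str, network: str) -> str:
    """Rewrite <interface type='bridge'> entries that point at `bridge`
    so they reference the libvirt-managed `network` instead.

    Hypothetical helper: it only edits an XML string and does not talk
    to libvirt. Interfaces on other bridges are left untouched.
    """
    root = ET.fromstring(domain_xml)
    for iface in root.iter("interface"):
        source = iface.find("source")
        if (
            iface.get("type") == "bridge"
            and source is not None
            and source.get("bridge") == bridge
        ):
            iface.set("type", "network")
            # Replace the bridge attribute with a network attribute.
            del source.attrib["bridge"]
            source.set("network", network)
    return ET.tostring(root, encoding="unicode")
```

The returned XML would still need to be loaded back into libvirt (for example by saving it to a file and running "virsh define" on it), and it is worth diffing the result against the original before doing so.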
All is working now as it should. Thanks very much.
Fred Clift
On Mon, Aug 26, 2024 at 12:44 PM Laine Stump <lstump@xxxxxxxxxx> wrote:
On 8/26/24 12:33 PM, Fred Clift wrote:
> I have 18 VMs that are all supposed to attach to a NAT-bridge.
>
> The bridge definition seems to be ok - it used to work when I didn't
> have any guests defined. It worked when I had only a couple guests
> defined. Now that I have 18 of them, it seems to take more than a
> minute for virbr0 to come up, and the guests all fail to autostart
> because the bridge doesn't exist yet.
Are your guest interfaces defined with "<interface type='network'> ...
<source network='default'/>"? Or are they defined with "<interface
type='bridge'> ... <source bridge='virbr0'/>"?
If it's the former, then libvirt *should* make sure that the network is
started by the time any guest needs it (perhaps that changed when we
switched to having a separate daemon for the network driver and the qemu
driver, and we just never noticed it until now?) If this isn't working
properly, we should definitely fix it.
However, if your guest configs are using the latter config, then there
is no official guarantee that the virtual network will be active (and
thus the virbr0 bridge will be available) when the guest starts.
If you can verify which method your guests are using, that could affect
the direction of investigation.
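One way to check which method a guest uses is to look at each <interface> element in its domain XML. The following is only a rough Python sketch, assuming the XML has already been dumped to a string (e.g. from "virsh dumpxml"); the function name interface_styles is hypothetical.

```python
import xml.etree.ElementTree as ET


def interface_styles(domain_xml: str) -> list:
    """Return a (type, source name) pair for every <interface> element.

    For type='network' the name is taken from <source network='...'/>,
    for type='bridge' from <source bridge='...'/>. For any other
    interface type the name will usually be None, because the source
    attribute there is not named after the type.
    """
    root = ET.fromstring(domain_xml)
    styles = []
    for iface in root.iter("interface"):
        kind = iface.get("type")
        source = iface.find("source")
        name = source.get(kind) if source is not None and kind else None
        styles.append((kind, name))
    return styles
```

A result such as ("network", "default") corresponds to the former config, and ("bridge", "virbr0") to the latter.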
>
> The host system (AlmaLinux 9.4) has dual Xeon(R) CPU E5-2680v4 with
> plenty of cores and ram.
>
> Is there some easy way to delay the start of the guests to wait for
> virbr0 to be up? Or can I specify a boot order? I assume that
> libvirtd autostarts networks before guests but I really have no
> idea... Can I specify an autostart dependency?
>
> As a stopgap measure I turned 'autostart' off on these and made a
> systemd-run bash script that just starts the network, and then starts
> the guests all serially. It has worked 10 or so test boots in a row.
> I'd prefer not to have to use an external script if possible.
>
> Fred Clift