On Thu, Dec 14, 2017 at 08:36:30AM -0500, John Ferlan wrote:
>
>
> On 12/14/2017 08:19 AM, Erik Skultety wrote:
> > On Sat, Dec 09, 2017 at 12:29:14PM -0500, John Ferlan wrote:
> >> If the timing is "just right", there is a possibility that the
> >> udev nodeStateInitialize conflicts with another systemd thread
> >> running an lspci command, leaving both waiting for "something",
> >> but resulting in a hung libvirtd (and hung lspci thread) from
> >> which the only recovery is a reboot, because killing either thread
> >> is impossible and results in a defunct libvirtd process if a
> >> SIGKILL is performed.
> >>
> >> In order to avoid this, let's move the PCI initialization to
> >> where it's actually needed. Ensure we only perform the
> >> initialization once via a driver bool. Likewise, during
> >> cleanup ensure we only call udevPCITranslateDeinit once the
> >> initialization is successful.
> >>
> >> At least a failure for this driver won't hang the rest of the
> >> libvirt event loop. It may not make certain things usable, though.
> >> Still, a libvirtd restart is far easier than a host reboot.
> >
> > Is there a BZ for this, or can you at least share what steps are
> > necessary to have a chance of hitting this issue? I'm asking because it
> > sounds like we should file a BZ against udev as well (possibly kernel),
> > and a thorough investigation of where the deadlock happens is necessary,
> > because I don't see any guarantee that just with a simple logic movement
> > (and adding a trigger condition) we can make a race outside of our scope
> > disappear for good. On the other hand, having to choose between a hung
> > process requiring a host restart and a hung worker thread requiring a
> > service restart, I'd obviously opt for the latter. So I'd say the next
> > steps depend on how frequently and under what circumstances (specific
> > host devices, kernel version, etc.) this happens, because to me it
> > sounds odd how systemd and libpciaccess clash here.
> >
> > Erik
> >
>
> w/r/t: reproducing
>
> Have you ever set up virt-test or avocado-vt? Have a bit of patience to
> retry the same test multiple times only to have it trigger once when the
> moon, stars, and sun align in the perfect order during the 14th hour of
> the 3rd day of the 11th month? ;-)

Nope, and to be honest you did a good job of discouraging me from even
trying that.

> Seriously though, it's not very reproducible - I tried multiple ways
> without libvirtd involved. However, when it does reproduce, things are
> hosed w/r/t a defunct libvirtd that only a reboot resolves. By moving to
> a separate thread, at least I can restart libvirtd and have a somewhat
> crippled environment that doesn't include nodedev.
>
> w/r/t: bug report
>
> Well, writing that report could be a challenge, to say nothing of the
> interminable wait for a resolution to said bug. At this point, I'm not
> sure if it's Fedora related, udev related, systemd related, or libvirtd
> related. I'd have to get back into that state without this patch in
> order to attempt to gather/recall more information that could even be
> useful for said bug report.

Fair point.

> I don't mind holding off on these last 2 patches, but by posting them I
> was kind of hoping there might be someone out there who saw the same
> thing and might be able to give me some ideas related to helping debug
> or resolve.
>

I see. Well, provided the issue is as tough to hit as you describe, I
think that unless we have a solid reproducer (it's a shame I don't know
how to use systemtap probes - they're designed for exactly such
anomalies) and know exactly what's going on, we should defer merging
patches 2,3/3.

> John

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list