Re: [PATCH 2/2] virstring: Prefer strtoll_l() and strtoull_l() whenever possible

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 20, 2023 at 01:25:23PM +0200, Michal Prívozník wrote:
> On 6/19/23 17:02, Daniel P. Berrangé wrote:
> > On Mon, Jun 19, 2023 at 02:45:57PM +0200, Michal Prívozník wrote:
> >> On 6/19/23 14:31, Daniel P. Berrangé wrote:
> >>> On Mon, Jun 19, 2023 at 02:03:48PM +0200, Michal Privoznik wrote:
> >>>> It's weird that in 2023 there's no reliable and portable way to
> >>>> parse strings, but okay.
> >>>>
> >>>> We started with strtol(). Except, it doesn't work the same on
> >>>> Linux and Windows. On Windows it behaves a bit different when it
> >>>> comes to parsing strings with base = 0. So we've switched to
> >>>> g_ascii_strtoll() which promised to solve this. And it did.
> >>>> Except, it introduced another problem. I don't really understand
> >>>> why, but I've seen random test failures and all of them claimed
> >>>> inability to parse a number (specifically <memory/> from the
> >>>> hard coded domain XML from test driver, which IMO is just a
> >>>> coincidence because it's the first number we parse).
> >>>
> >>> What platforms are you seeing the failures on ?  All platforms or just
> >>> on Windows, or just some other specific one?
> >>
> >> I've tried only Linux so far. Windows is out of question.
> > 
> > snip
> > 
> >>> On the Linux case get_C_locale is
> >>>
> >>> static locale_t
> >>> get_C_locale (void)
> >>> {
> >>>   static gsize initialized = FALSE;
> >>>   static locale_t C_locale = NULL;
> >>>
> >>>   if (g_once_init_enter (&initialized))
> >>>     {
> >>>       C_locale = newlocale (LC_ALL_MASK, "C", NULL);
> >>>       g_once_init_leave (&initialized, TRUE);
> >>>     }
> >>>
> >>>   return C_locale;
> >>> }
> >>>
> >>> and only way that could fail is if  newlocale isn't threadsafe, and
> >>> we have other code in libvirt calling newlocale at exact same time
> >>> as the *first* call to get_C_locale.
> >>
> >> Yeah, that's why I don't understand what's going on. Anyway, try running
> >> tests repeatedly (you can use oneliner from the commit message) and
> >> you'll run into the problem.
> > 
> > Yes, I've hit the problem with the error message
> > 
> >   error: XML error: Invalid value '8388608' for element or attribute './memory[1]'
> > 
> > interestingly, I've also hit a few other error messages, some failures
> > without error message (I suspect virsh simply crashed with no output),
> > and most intriguing is that it 'virshtest' simply hangs when I run it
> > directly.
> 
> <snip/>
> 
> > 
> > So the cause of the hang is exceedingly obvious.
> > 
> > Very few glib APIs are safe to uses in between fork & exec, and we are
> > (yet again) being burnt.
> 
> Yep, and switching to plain strtol() fixes this hang. But it does not
> solve ...
> 
> > 
> > What I can't explain, however, is why we sometimes get an error message
> > instead of a hang.
> 
> .. this. I bet it has something to do with fork() + exec() because when
> I set up logging and run those tests I can see which one is failing
> (with that "unable to parse NNN" message), but when I run it manually
> with the exact arguments I don't see any hiccups.

Ok, I've found the root cause. g_mutex_lock() sometimes splatters errno
on contended locks:

  https://gitlab.gnome.org/GNOME/glib/-/issues/3034

most of the time this won't matter, as the APIs have a return value
that identifies failure, and only then does the caller look at errno.
The strtoll/g_ascii_strtoll functions are special though because we
have to directly inspect errno to identify failure. The splattering
of errno breaks this :-(

IOW, as of today, the g_ascii_strtoll functions are all broken and
unusable on Linux when GLib is built to natively use  futex() instead
of pthread_mutex_t.

We can avoid this, however, by making virInitialize invoke
g_ascii_strtoll() on a dummy string. This triggers the glib
one-time initializer, and thereafter we shouldn't hit the locking
bug because g_once_init_enter will go via a fast-path that avoids
the futex syscall.

> > Third, we should move virExec and virFork and related helpers into a
> > standalone file, so that it is very clear which bits of code are running
> > in between fork+exec and thus need careful design to avoid anything
> > except async safe POSIX.
> 
> I've tried to do this, but it's bigger task than I thought. Plenty of
> dependencies across the file.

No problem, not an urgent thing. Just something I think we ought to
do at somepoint

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux