Re: [PATCH 2/2] virstring: Prefer strtoll_l() and strtoull_l() whenever possible

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/19/23 17:02, Daniel P. Berrangé wrote:
> On Mon, Jun 19, 2023 at 02:45:57PM +0200, Michal Prívozník wrote:
>> On 6/19/23 14:31, Daniel P. Berrangé wrote:
>>> On Mon, Jun 19, 2023 at 02:03:48PM +0200, Michal Privoznik wrote:
>>>> It's weird that in 2023 there's no reliable and portable way to
>>>> parse strings, but okay.
>>>>
>>>> We started with strtol(). Except, it doesn't work the same on
>>>> Linux and Windows. On Windows it behaves a bit different when it
>>>> comes to parsing strings with base = 0. So we've switched to
>>>> g_ascii_strtoll() which promised to solve this. And it did.
>>>> Except, it introduced another problem. I don't really understand
>>>> why, but I've seen random test failures and all of them claimed
>>>> inability to parse a number (specifically <memory/> from the
>>>> hard coded domain XML from test driver, which IMO is just a
>>>> coincidence because it's the first number we parse).
>>>
>>> What platforms are you seeing the failures on ?  All platforms or just
>>> on Windows, or just some other specific one?
>>
>> I've tried only Linux so far. Windows is out of question.
> 
> snip
> 
>>> On the Linux case get_C_locale is
>>>
>>> static locale_t
>>> get_C_locale (void)
>>> {
>>>   static gsize initialized = FALSE;
>>>   static locale_t C_locale = NULL;
>>>
>>>   if (g_once_init_enter (&initialized))
>>>     {
>>>       C_locale = newlocale (LC_ALL_MASK, "C", NULL);
>>>       g_once_init_leave (&initialized, TRUE);
>>>     }
>>>
>>>   return C_locale;
>>> }
>>>
>>> and only way that could fail is if  newlocale isn't threadsafe, and
>>> we have other code in libvirt calling newlocale at exact same time
>>> as the *first* call to get_C_locale.
>>
>> Yeah, that's why I don't understand what's going on. Anyway, try running
>> tests repeatedly (you can use oneliner from the commit message) and
>> you'll run into the problem.
> 
> Yes, I've hit the problem with the error message
> 
>   error: XML error: Invalid value '8388608' for element or attribute './memory[1]'
> 
> interestingly, I've also hit a few other error messages, some failures
> without error message (I suspect virsh simply crashed with no output),
> and most intriguing is that it 'virshtest' simply hangs when I run it
> directly.

<snip/>

> 
> So the cause of the hang is exceedingly obvious.
> 
> Very few glib APIs are safe to uses in between fork & exec, and we are
> (yet again) being burnt.

Yep, and switching to plain strtol() fixes this hang. But it does not
solve ...

> 
> 
> What I can't explain, however, is why we sometimes get an error message
> instead of a hang.

.. this. I bet it has something to do with fork() + exec() because when
I set up logging and run those tests I can see which one is failing
(with that "unable to parse NNN" message), but when I run it manually
with the exact arguments I don't see any hiccups.

> 
> If I modify 'virGlobalInit' to call g_ascii_strtoll("0", NULL, 10); then
> neither the hang or error messages occur, but the hang *should* still
> occurr as the mutex locking race fundamentally stil exists. I think it
> just changes the timing enough to avoid it in our case.
> 
> My only thought with the error messages is that somehow the 'locale_t'
> is getting its initialization missed somehow.
> 
>> I've tried to debug the problem over weekend but was really
>> unsuccessful. Firstly, I suspected that glib version of pthread_once()
>> was broken. So I've rewritten get_C_locale() to use pthread_once() but
>> that didn't help. Which tends to point into direction of glibc,
>> supported by the fact that glibc impl of newlocale() does indeed have a
>> global RW lock. But I haven't found anything obviously wrong there either.
> 
> If glib was actually using pthread_once() we wouldn't have a problem.
> Their g_once_init_enter/leave stuff doesn't use pthread_once() and their
> impl is not safe due to its mutex usage.
> 
> 
> Anyway, I understand why your proposed change here is avoiding problems,
> even if I don't understand the full root cause.
> 
> Ultimately I think our virExec() impl is verging on broken by design. We
> have got too much functionality in it to be safe. Especially since we
> switched from gnulib to glib, we're not so concious of what higher level
> constructs we're accidentally relying on - eg no one would guess g_ascii_strtol
> would acquire locks.
> 
> Rather than change our virstring.c impl, I'm inclined to think we need to
> be more aggressive and eliminate as much use of glib as possible from the
> virExec() and virFork() functions. Limit them to pure POSIX only.
> 
> Basically g_new/g_free are the only bits of glib I would entirely
> trust in between fork/exec, without analyznig the code of other glib
> APIs.
> 
> Specifically for this virStrToLong_i problem, I would suggest that we
> just directly call strtol from virCommandMassCloseGetFDsLinux(), and
> leave the main virStrToLong_i impl unchanged.

I've done this locally and it helped with deadlocks, but not with the
number parsing problem.

> 
> Second, we should modify the FreeBSD impl of virCommandMassClose so
> that it works on Linux, when close_range() is available. That would
> avoid calling the problem code in the first place for modern linux.

Yeah, I remember I wanted to do this when the syscall was introduced to
the Linux kernel, but for some reason didn't. BTW: glibc has closefrom()
which is just a wrapper over close_range(N, ~0U, 0); so we might get the
FreeBSD implementation for nothing.

> 
> 
> Third, we should move virExec and virFork and related helpers into a
> standalone file, so that it is very clear which bits of code are running
> in between fork+exec and thus need careful design to avoid anything
> except async safe POSIX.

I've tried to do this, but it's bigger task than I thought. Plenty of
dependencies across the file.

Michal




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux