Re: [PATCH 2/2] virstring: Prefer strtoll_l() and strtoull_l() whenever possible

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/19/23 14:31, Daniel P. Berrangé wrote:
> On Mon, Jun 19, 2023 at 02:03:48PM +0200, Michal Privoznik wrote:
>> It's weird that in 2023 there's no reliable and portable way to
>> parse strings, but okay.
>>
>> We started with strtol(). Except, it doesn't work the same on
>> Linux and Windows. On Windows it behaves a bit different when it
>> comes to parsing strings with base = 0. So we've switched to
>> g_ascii_strtoll() which promised to solve this. And it did.
>> Except, it introduced another problem. I don't really understand
>> why, but I've seen random test failures and all of them claimed
>> inability to parse a number (specifically <memory/> from the
>> hard coded domain XML from test driver, which IMO is just a
>> coincidence because it's the first number we parse).
> 
> What platforms are you seeing the failures on ?  All platforms or just
> on Windows, or just some other specific one?

I've tried only Linux so far. Windows is out of question.

> 
> 
>> The nature of the problem suggests there's a race condition
>> somewhere. I suspected glib but I don't know their code well to
>> prove it.
> 
> The code in question is:
> 
> guint64
> g_ascii_strtoull (const gchar *nptr,
>                   gchar      **endptr,
>                   guint        base)
> {
> #if defined(USE_XLOCALE) && defined(HAVE_STRTOULL_L)
>   return strtoull_l (nptr, endptr, base, get_C_locale ());
> #else
>   gboolean negative;
>   guint64 result;
> 
>   result = g_parse_long_long (nptr, (const gchar **) endptr, base, &negative);
> 
>   /* Return the result of the appropriate sign.  */
>   return negative ? -result : result;
> #endif
> }
> 
> gint64
> g_ascii_strtoll (const gchar *nptr,
>                  gchar      **endptr,
>                  guint        base)
> {
> #if defined(USE_XLOCALE) && defined(HAVE_STRTOLL_L)
>   return strtoll_l (nptr, endptr, base, get_C_locale ());
> #else
>   gboolean negative;
>   guint64 result;
> 
>   result = g_parse_long_long (nptr, (const gchar **) endptr, base, &negative);
> 
>   if (negative && result > (guint64) G_MININT64)
>     {
>       errno = ERANGE;
>       return G_MININT64;
>     }
>   else if (!negative && result > (guint64) G_MAXINT64)
>     {
>       errno = ERANGE;
>       return G_MAXINT64;
>     }
>   else if (negative)
>     return - (gint64) result;
>   else
>     return (gint64) result;
> #endif
> }
> 
> 
> On Linux, we should take the first branch on the #ifdef, which
> is calling strtoll_l just as your patch is switching us to.
> 
> I guess Windows would use the g_parse_long_long fallback. I don't
> see anything in that code which could be non-thread safe. It is
> just a copy of strtol cdoe from glibc.
> 
> On the Linux case get_C_locale is
> 
> static locale_t
> get_C_locale (void)
> {
>   static gsize initialized = FALSE;
>   static locale_t C_locale = NULL;
> 
>   if (g_once_init_enter (&initialized))
>     {
>       C_locale = newlocale (LC_ALL_MASK, "C", NULL);
>       g_once_init_leave (&initialized, TRUE);
>     }
> 
>   return C_locale;
> }
> 
> and only way that could fail is if  newlocale isn't threadsafe, and
> we have other code in libvirt calling newlocale at exact same time
> as the *first* call to get_C_locale.

Yeah, that's why I don't understand what's going on. Anyway, try running
tests repeatedly (you can use oneliner from the commit message) and
you'll run into the problem.

I've tried to debug the problem over weekend but was really
unsuccessful. Firstly, I suspected that glib version of pthread_once()
was broken. So I've rewritten get_C_locale() to use pthread_once() but
that didn't help. Which tends to point into direction of glibc,
supported by the fact that glibc impl of newlocale() does indeed have a
global RW lock. But I haven't found anything obviously wrong there either.

> 
>>
>> Anyway, using strtoll_l() directly and taking glib out of the
>> picture solves the issue.
> 
> I'm pretty surprised if it fixes anything ! I'm a little concerned
> we might have just masked a bug that's still hiding elsewhere.

I understand that. And if you have any idea how to get to the root of
the problem, I'm more than happy to try it.

Michal




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux