----- On Jul 6, 2020, at 9:59 AM, Florian Weimer fweimer@xxxxxxxxxx wrote:

> * Mathieu Desnoyers:
>
>> When available, use the cpu_id field from __rseq_abi on Linux to
>> implement sched_getcpu(). Fall-back on the vgetcpu vDSO if
>> unavailable.
>
> I've pushed this to glibc master, but unfortunately it looks like this
> exposes a kernel bug related to affinity mask changes.
>
> After building and testing glibc, this
>
>   for x in {1..2000} ; do posix/tst-affinity-static & done
>
> produces some “error:” lines for me:
>
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 2, expected 0
> error: Unexpected CPU 138, expected 0
> error: Unexpected CPU 138, expected 0
> error: Unexpected CPU 138, expected 0
> error: Unexpected CPU 138, expected 0
>
> “expected 0” is a result of how the test has been written, it bails out
> on the first failure, which happens with CPU ID 0.
>
> Smaller systems can use a smaller count than 2000 to reproduce this. It
> also happens sporadically when running the glibc test suite itself
> (which is why it took further testing to reveal this issue).
>
> I can reproduce this with the Debian 4.19.118-2+deb10u1 kernel, the
> Fedora 5.6.19-300.fc32 kernel, and the Red Hat Enterprise Linux kernel
> 4.18.0-193.el8 (all x86_64).
>
> As to the cause, I'd guess that the exit path in the sched_setaffinity
> system call fails to update the rseq area, so that userspace can observe
> the outdated CPU ID there.

Hi Florian,

We have a similar test in Linux, see tools/testing/selftests/rseq/basic_test.c.
That test does not trigger this issue, even when executed repeatedly.

I'll investigate further what is happening within the glibc test.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
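
For reference, the kind of check discussed above can be sketched roughly as follows. This is only a minimal illustration of "pin to one CPU, then compare with sched_getcpu()", not the actual code of posix/tst-affinity-static or of basic_test.c. The function names single_cpu_set() and count_cpu_mismatches() are hypothetical and exist only for this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: make `out` contain exactly one CPU, `cpu`. */
void single_cpu_set(int cpu, cpu_set_t *out)
{
    assert(out != NULL);
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(out);
    CPU_SET(cpu, out);
}

/*
 * Hypothetical check: pin the calling thread to each CPU of its initial
 * affinity mask in turn and compare sched_getcpu() with the CPU that was
 * just requested.  Returns the number of mismatches, or -1 if a system
 * call fails.
 */
int count_cpu_mismatches(void)
{
    cpu_set_t initial;
    cpu_set_t one;
    int mismatches = 0;

    if (sched_getaffinity(0, sizeof(initial), &initial) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &initial))
            continue;
        single_cpu_set(cpu, &one);
        if (sched_setaffinity(0, sizeof(one), &one) != 0) {
            mismatches = -1;
            break;
        }
        /* After a successful call the thread should be running on `cpu`. */
        if (sched_getcpu() != cpu)
            mismatches++;
    }

    /* Best effort: restore the mask the thread started with. */
    sched_setaffinity(0, sizeof(initial), &initial);
    return mismatches;
}
```

A nonzero positive return value would correspond to the "Unexpected CPU" lines quoted above. As described there, many instances may need to run concurrently before anything shows up.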