Hi Sudip,
On 02/17/2019 06:59 PM, Thomas Gleixner wrote:
On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
Hi Thomas,
On Sun, Feb 17, 2019 at 11:53 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
Hi Greg,
On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
<snip>
I think we have a real use case that triggers this error, and I was
still in the middle of debugging it. But my initial analysis
showed that the userspace thread was stuck in an indefinite loop.
=> This behaviour depends on the configuration of assert.
See glibc code in nptl/pthread_mutex_lock.c (you will encounter either
an abort due to assert or an indefinite loop):
    /* ESRCH can happen only for non-robust PI mutexes where
       the owner of the lock died.  */
    assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);

    /* Delay the thread indefinitely.  */
    while (1)
      __pause_nocancel ();
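To make the two outcomes explicit, here is a minimal sketch that only models the control flow of the snippet above. It does not call into glibc. The names predict_lock_outcome, lock_outcome and the asserts_enabled flag are hypothetical and exist only for this illustration. The flag stands in for whether the assert is compiled in, which is an assumption about how a given glibc build is configured.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; none of them exist in glibc. */
enum lock_outcome {
    LOCK_OUTCOME_ABORT, /* the assert fires and the process aborts */
    LOCK_OUTCOME_HANG   /* the thread reaches the while (1) pause loop */
};

/*
 * Predict what the quoted snippet does once the futex syscall has
 * failed with `err`.
 *
 * robust:          true if the mutex is a robust PI mutex.
 * asserts_enabled: false models a build where the assert is compiled
 *                  out (an assumption about the build configuration).
 */
enum lock_outcome
predict_lock_outcome(int err, bool robust, bool asserts_enabled)
{
    /* Same condition as the quoted assert. */
    bool assert_holds = (err != ESRCH) || !robust;

    if (asserts_enabled && !assert_holds)
        return LOCK_OUTCOME_ABORT;

    /* Otherwise execution falls through to the indefinite pause loop. */
    return LOCK_OUTCOME_HANG;
}
```

Under this model, only ESRCH on a robust mutex with asserts enabled ends in an abort. Every other combination ends in the indefinite loop, which would match a thread that appears stuck.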
I have a reliable reproducer of the problem and will set up a test
tomorrow and confirm.
There are more patches in that area and you also need a fixed glibc.
I can see 1a1fb985f2e2 ("futex: Handle early deadlock return
correctly") is already there in 4.14-stable.
Is anything else missing, other than this one?
glibc might be a problem, but let's see what can be done.
Those two are the kernel side of affairs I think.
The relevant glibc commits are:
8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
=> Needed for pthread_mutex_lock / pthread_mutex_timedlock (within glibc
release 2.25)
823624bdc47f1f80109c9c52dee7939b9386d708
=> Needed for pthread_mutex_trylock (will be within next glibc release
2.30, but is backported to glibc release branches 2.25 ... 2.29)
Bye
Stefan
Thanks,
tglx