On Mon, Feb 18, 2019 at 8:26 AM Stefan Liebler <stli@xxxxxxxxxxxxx> wrote:
>
> Hi Sudip,
>
>
> On 02/17/2019 06:59 PM, Thomas Gleixner wrote:
> > On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
> >
> >> Hi Thomas,
> >>
> >> On Sun, Feb 17, 2019 at 11:53 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Sun, 17 Feb 2019, Sudip Mukherjee wrote:
> >>>
> >>>> Hi Greg,
> >>>>
> >>>> On Mon, Dec 24, 2018 at 12:52:22PM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> >>>>>
> >>>>
> >> <snip>
> >>>> I think we have a real usecase which is triggering this error and I was
> >>>> still in the middle of debugging that. But my initial analysis was
> >>>> showing that the userspace thread was stuck in the indefinite loop.
> => This behaviour depends on the configuration of assert.
> See glibc code in nptl/pthread_mutex_lock.c (you will encounter either
> an abort due to assert or an indefinite loop):
>     /* ESRCH can happen only for non-robust PI mutexes where
>        the owner of the lock died.  */
>     assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
>
>     /* Delay the thread indefinitely.  */
>     while (1)
>       __pause_nocancel ();
> >>>> I have a reliable reproducer of the problem and will setup a test
> >>>> tomorrow and confirm.
> >>>
> >>> There are more patches in that area and you also need a fixed glibc.
> >>
> >> I can see 1a1fb985f2e2 ("futex: Handle early deadlock return
> >> correctly") is already there in 4.14-stable.
> >> Is anything else missing, other than this one?
> >>
> >> glibc might be a problem, but lets see what can be done.
> >
> > Those two are the kernel side of affairs I think.
> >
> > The relevant glibc commits are:
> >
> > 8f9450a0b7a9e78267e8ae1ab1000ebca08e473e
> => Needed for pthread_mutex_lock / pthread_mutex_timedlock (within glibc
> release 2.25)
> >
> > 823624bdc47f1f80109c9c52dee7939b9386d708
> => Needed for pthread_mutex_trylock (will be within next glibc release
> 2.30, but is backported to glibc release branches 2.25 ... 2.29)

Thanks.
I tried with only the kernel changes and the problem was not resolved.
Then I tried with both the kernel changes and the glibc changes, and the
problem improved significantly. But since we are using an ancient version
of eglibc, I do not expect it to get any better than this.

--
Regards
Sudip