On Fri, Oct 25, 2019 at 12:24 PM Aurélien Aptel <aaptel@xxxxxxxx> wrote:
>
> Ronnie Sahlberg <lsahlber@xxxxxxxxxx> writes:
> > This is a small update to Dave's patch to address Pavel's recommendation
> > that we use a helper function for the trylock/sleep loop.
>
> Disclaimer: I have not read all the emails regarding this patch but it
> is not obvious to me how replacing
>
> lock()
>
> by
>
> while (trylock())
> sleep()
>
> is fixing things, but I'm sure I'm missing something :(
>

Let me try to explain better. The deadlock comes from how rw_semaphores
work in Linux. We had the following sequence:

1. thread1: down_read(), obtained the semaphore
2. thread2: down_write(), blocked due to thread1
3. thread1: down_read() (a second time), blocked due to thread2

Note that it is normally benign for a single thread to call down_read()
twice. However, in this case, another thread called down_write() in
between the two calls. Once one thread is waiting in down_write(), any
subsequent callers of down_read() will block, even though the semaphore
is still held only for reading. That is how the rw_semaphore
implementation in Linux behaves. If this were not the case, callers of
down_read() could stream in continually and starve out callers of
down_write().

With the patch, thread2 no longer blocks in down_write(). It retries
with a trylock and sleeps in between attempts, so it never queues as a
waiting writer. Step 3 therefore cannot block on thread2, which removes
the deadlock.