Re: FAILED: patch "[PATCH] dm snapshot: rework COW throttling to fix deadlock" failed to apply to 5.3-stable tree

On Mon, Oct 28, 2019 at 05:39:28AM -0400, Sasha Levin wrote:
> On Sun, Oct 27, 2019 at 04:37:28PM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> > 
> > The patch below does not apply to the 5.3-stable tree.
> > If someone wants it applied there, or to any other stable or longterm
> > tree, then please email the backport, including the original git commit
> > id to <stable@xxxxxxxxxxxxxxx>.
> > 
> > thanks,
> > 
> > greg k-h
> > 
> > ------------------ original commit in Linus's tree ------------------
> > 
> > From b21555786f18cd77f2311ad89074533109ae3ffa Mon Sep 17 00:00:00 2001
> > From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > Date: Wed, 2 Oct 2019 06:15:53 -0400
> > Subject: [PATCH] dm snapshot: rework COW throttling to fix deadlock
> > 
> > Commit 721b1d98fb517a ("dm snapshot: Fix excessive memory usage and
> > workqueue stalls") introduced a semaphore to limit the maximum number of
> > in-flight kcopyd (COW) jobs.
> > 
> > The implementation of this throttling mechanism is prone to a deadlock:
> > 
> > 1. One or more threads write to the origin device causing COW, which is
> >   performed by kcopyd.
> > 
> > 2. At some point some of these threads might reach the s->cow_count
> >   semaphore limit and block in down(&s->cow_count), holding a read lock
> >   on _origins_lock.
> > 
> > 3. Someone tries to acquire a write lock on _origins_lock, e.g.,
> >   snapshot_ctr(), which blocks because the threads at step (2) already
> >   hold a read lock on it.
> > 
> > 4. A COW operation completes and kcopyd runs dm-snapshot's completion
> >   callback, which ends up calling pending_complete().
> >   pending_complete() tries to resubmit any deferred origin bios. This
> >   requires acquiring a read lock on _origins_lock, which blocks.
> > 
> >   This happens because the read-write semaphore implementation gives
> >   priority to writers, meaning that as soon as a writer tries to enter
> >   the critical section, no readers will be allowed in, until all
> >   writers have completed their work.
> > 
> >   So, pending_complete() waits for the writer at step (3) to acquire
> >   and release the lock. This writer waits for the readers at step (2)
> >   to release the read lock and those readers wait for
> >   pending_complete() (the kcopyd thread) to signal the s->cow_count
> >   semaphore: DEADLOCK.
> > 
> > The above was thoroughly analyzed and documented by Nikos Tsironis as
> > part of his initial proposal for fixing this deadlock, see:
> > https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html
> > 
> > Fix this deadlock by reworking COW throttling so that it waits without
> > holding any locks. Add a variable 'in_progress' that counts how many
> > kcopyd jobs are running. A function wait_for_in_progress() will sleep if
> > 'in_progress' is over the limit. It drops _origins_lock in order to
> > avoid the deadlock.
> > 
> > Reported-by: Guruswamy Basavaiah <guru2018@xxxxxxxxx>
> > Reported-by: Nikos Tsironis <ntsironis@xxxxxxxxxxx>
> > Reviewed-by: Nikos Tsironis <ntsironis@xxxxxxxxxxx>
> > Tested-by: Nikos Tsironis <ntsironis@xxxxxxxxxxx>
> > Fixes: 721b1d98fb51 ("dm snapshot: Fix excessive memory usage and workqueue stalls")
> > Cc: stable@xxxxxxxxxxxxxxx # v5.0+
> > Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
> > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> 
> Grabbing the listed dependency solved it for 5.3-4.19. For 4.14 and
> older I've also grabbed the semaphore->mutex conversion.

Ugh, I missed that it said that there.  I'll do this for 4.19, unless
you have these ready to go for when the tree "opens up" again.

thanks,

greg k-h
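
For readers who want the shape of the fix described in the quoted commit
message, here is a rough user-space analogy in Python. It is only a sketch,
not the kernel code: the class name CowThrottle, the held_lock parameter and
the use of threading.Condition are assumptions made for illustration. The
names in_progress, wait_for_in_progress(), account_start_copy() and
account_end_copy() come from the commit message and its listed dependency,
but their bodies here are guesses at the idea, namely that a thread over the
limit drops the outer lock before it sleeps and then retries.

```python
import threading


class CowThrottle:
    """User-space analogy of 'in_progress' throttling (hypothetical class)."""

    def __init__(self, limit):
        self.limit = limit      # assumed stand-in for the COW job limit
        self.in_progress = 0    # number of copy jobs currently running
        self._cond = threading.Condition()

    def account_start_copy(self):
        # Called when a copy job is submitted.
        with self._cond:
            self.in_progress += 1

    def account_end_copy(self):
        # Called from the completion path; wakes any throttled submitters.
        with self._cond:
            self.in_progress -= 1
            self._cond.notify_all()

    def wait_for_in_progress(self, held_lock=None):
        """Return True if the caller may proceed with held_lock still held.

        If too many jobs are running, release held_lock first, sleep until
        the count is back within the limit, and return False so the caller
        re-takes the lock and retries. The sleep never happens while
        held_lock is held, which is what breaks the deadlock cycle.
        """
        with self._cond:
            if self.in_progress <= self.limit:
                return True
        if held_lock is not None:
            held_lock.release()
        with self._cond:
            while self.in_progress > self.limit:
                self._cond.wait()
        return False
```

A caller would loop: take the outer lock, call wait_for_in_progress(lock),
and start over from the top if it returns False. Because the completion
path (account_end_copy() here) never has to wait behind a sleeping lock
holder, a pending writer on the outer lock can no longer wedge it.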
