Re: FAILED: patch "[PATCH] dm snapshot: rework COW throttling to fix deadlock" failed to apply to 5.3-stable tree

On Sun, Oct 27, 2019 at 04:37:28PM +0100, gregkh@xxxxxxxxxxxxxxxxxxx wrote:

The patch below does not apply to the 5.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@xxxxxxxxxxxxxxx>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

From b21555786f18cd77f2311ad89074533109ae3ffa Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Date: Wed, 2 Oct 2019 06:15:53 -0400
Subject: [PATCH] dm snapshot: rework COW throttling to fix deadlock

Commit 721b1d98fb517a ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.

The implementation of this throttling mechanism is prone to a deadlock:

1. One or more threads write to the origin device causing COW, which is
  performed by kcopyd.

2. At some point some of these threads might reach the s->cow_count
  semaphore limit and block in down(&s->cow_count), holding a read lock
  on _origins_lock.

3. Someone tries to acquire a write lock on _origins_lock, e.g.,
  snapshot_ctr(), which blocks because the threads at step (2) already
  hold a read lock on it.

4. A COW operation completes and kcopyd runs dm-snapshot's completion
  callback, which ends up calling pending_complete().
  pending_complete() tries to resubmit any deferred origin bios. This
  requires acquiring a read lock on _origins_lock, which blocks.

  This happens because the read-write semaphore implementation gives
  priority to writers, meaning that as soon as a writer tries to enter
  the critical section, no readers will be allowed in, until all
  writers have completed their work.

  So, pending_complete() waits for the writer at step (3) to acquire
  and release the lock. This writer waits for the readers at step (2)
  to release the read lock and those readers wait for
  pending_complete() (the kcopyd thread) to signal the s->cow_count
  semaphore: DEADLOCK.

The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html

Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.
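
For readers who want to see the shape of the fix, here is a rough
user-space sketch of the "wait without holding any locks" idea described
above. It is not the kernel code: the CowThrottle class, the plain
threading.Lock standing in for the read side of _origins_lock, and the
start_copy() helper are all assumptions made for illustration. Only the
names in_progress, wait_for_in_progress(), account_start_copy() and
account_end_copy() come from the commit message and its dependency.

```python
import threading


class CowThrottle:
    """Hypothetical user-space analogue of the in_progress throttle."""

    def __init__(self, limit):
        self.limit = limit          # maximum number of in-flight copy jobs
        self.in_progress = 0        # how many copy jobs are running now
        self._cond = threading.Condition()

    def wait_for_in_progress(self, held_lock=None):
        """Return True if the caller may proceed.

        If in_progress is over the limit, release held_lock (if given),
        sleep until some job completes, and return False so the caller
        can re-take its lock and retry. The sleep never happens while
        held_lock is held.
        """
        with self._cond:
            if self.in_progress <= self.limit:
                return True
            if held_lock is not None:
                held_lock.release()     # drop the lock before sleeping
            self._cond.wait()
            return False

    def account_start_copy(self):
        with self._cond:
            self.in_progress += 1

    def account_end_copy(self):
        # Called from the completion path; wakes every throttled waiter.
        with self._cond:
            self.in_progress -= 1
            self._cond.notify_all()


def start_copy(throttle, origins_lock):
    """Assumed caller pattern: retry until admitted, then account the job."""
    while True:
        origins_lock.acquire()
        if throttle.wait_for_in_progress(origins_lock):
            break                       # admitted, origins_lock still held
        # Not admitted: wait_for_in_progress() already released the lock.
    try:
        throttle.account_start_copy()
    finally:
        origins_lock.release()
```

Because the waiter gives up the lock before it sleeps, a pending writer
can make progress and the completion path is never stuck behind it,
which is the cycle from step (4) above. Note that in this sketch the
limit is soft: the check and the increment are separate steps, so a few
callers may be admitted at once.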

Reported-by: Guruswamy Basavaiah <guru2018@xxxxxxxxx>
Reported-by: Nikos Tsironis <ntsironis@xxxxxxxxxxx>
Reviewed-by: Nikos Tsironis <ntsironis@xxxxxxxxxxx>
Tested-by: Nikos Tsironis <ntsironis@xxxxxxxxxxx>
Fixes: 721b1d98fb51 ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@xxxxxxxxxxxxxxx # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>

Grabbing the listed dependency fixed this for the 5.3 through 4.19
trees. For 4.14 and older, I've also grabbed the semaphore->mutex
conversion.

--
Thanks,
Sasha


