Re: Deadlock when swapping a table with a dm-era target

On 12/2/21 5:41 PM, Zdenek Kabelac wrote:
On 01. 12. 21 at 18:07, Nikos Tsironis wrote:
Hello,

Under certain conditions, swapping a table that includes a dm-era
target with a new table causes a deadlock.

This happens when a status (STATUSTYPE_INFO) or message IOCTL is blocked
in the suspended dm-era target.

dm-era executes all metadata operations in a worker thread, which stops
processing requests when the target is suspended, and resumes again when
the target is resumed.

So, running 'dmsetup status' or 'dmsetup message' for a suspended dm-era
device blocks until the device is resumed.

This seems to be a problem on its own.

If we then load a new table to the device, while the aforementioned
dmsetup command is blocked in dm-era, and resume the device, we
deadlock.

The problem is that the 'dmsetup status' and 'dmsetup message' commands
hold a reference to the live table, i.e., they hold an SRCU read lock on
md->io_barrier, while they are blocked.

When the device is resumed, the old table is replaced with the new one
by dm_swap_table(), which ends up calling synchronize_srcu() on
md->io_barrier.

Since the blocked dmsetup command is holding the SRCU read lock, and the
old table is never resumed, 'dmsetup resume' blocks too, and we have a
deadlock.

Steps to reproduce:

1. Create device with dm-era target

   # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"

2. Suspend the device

   # dmsetup suspend eradev

3. Load new table to device, e.g., to resize the device

   # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"


Your sequence is faulty - you must always preload the new table before suspend.

Suspend&Resume should be absolutely minimal in its timing.

Also, nothing should be allocating memory during suspend, which is why suspend has to be issued only after the table line is fully loaded.


Hi Zdenek,

Thanks for the feedback. There doesn't seem to be any documentation
mentioning that loading the new table should happen before suspend, so
thanks a lot for explaining it.

Unfortunately, this isn't what causes the deadlock. The following
sequence, which loads the table before suspend, also results in a
deadlock:

1. Create device with dm-era target

   # dmsetup create eradev --table "0 1048576 era /dev/datavg/erameta /dev/datavg/eradata 8192"

2. Load new table to device, e.g., to resize the device

   # dmsetup load eradev --table "0 2097152 era /dev/datavg/erameta /dev/datavg/eradata 8192"

3. Suspend the device

   # dmsetup suspend eradev

4. Retrieve the status of the device. This blocks for the reasons I
   explained in my previous email.

   # dmsetup status eradev

5. Resume the device. This deadlocks for the reasons I explained in my
   previous email.

   # dmsetup resume eradev

6. The dmesg logs are the same as the ones I included in my previous
   email.
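
For convenience, the steps above could be scripted roughly as follows.
This is only a sketch: the device names and sizes are the ones used in
the steps, while DRY_RUN, WAIT and the function names are illustrative
assumptions of mine. By default it only prints the commands. With
DRY_RUN=0 it needs root and is expected to leave the blocked dmsetup
processes behind, so it should only be run on a test machine.

```shell
#!/bin/sh
# Sketch of the load-before-suspend sequence described above.
# DRY_RUN, WAIT and all function names are hypothetical helpers.
DEV=${DEV:-eradev}
META=${META:-/dev/datavg/erameta}
DATA=${DATA:-/dev/datavg/eradata}
BLOCK=${BLOCK:-8192}
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands, 0: execute them
WAIT=${WAIT:-5}         # seconds to wait before checking for blocked tasks

# Build a dm-era table line for a device of $1 sectors.
era_table() {
    printf '0 %s era %s %s %s\n' "$1" "$META" "$DATA" "$BLOCK"
}

# Run a command in the foreground, or just print it in dry-run mode.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Run a command in the background and remember its PID in BG_PID.
run_bg() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $* &"
        BG_PID=
    else
        "$@" &
        BG_PID=$!
    fi
}

# True if PID $1 exists and is not a zombie.
still_running() {
    [ -n "$1" ] || return 1
    state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    case $state in
        ''|Z*) return 1 ;;
    esac
    return 0
}

reproduce() {
    run dmsetup create "$DEV" --table "$(era_table 1048576)" || return 1
    run dmsetup load "$DEV" --table "$(era_table 2097152)" || return 1
    run dmsetup suspend "$DEV" || return 1

    # 'dmsetup status' is expected to block, so it goes to the background.
    run_bg dmsetup status "$DEV"; status_pid=$BG_PID
    [ "$DRY_RUN" = 1 ] || sleep 1
    # 'dmsetup resume' is expected to block as well.
    run_bg dmsetup resume "$DEV"; resume_pid=$BG_PID
    [ "$DRY_RUN" = 1 ] || sleep "$WAIT"

    if still_running "$status_pid" && still_running "$resume_pid"; then
        echo "status and resume are both still blocked after ${WAIT}s"
    fi
}
```

Calling 'reproduce' after sourcing the script prints the five dmsetup
commands in order. The final check is deliberately crude: it only
reports that both commands have not finished after WAIT seconds, which
is consistent with the deadlock but does not prove it on its own.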

I have explained the reasons for the deadlock in my previous email, but
I would be more than happy to discuss them more.

I would also like your feedback on the solutions I proposed there, so I
can work on a fix.

Thanks,
Nikos.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel



