Re: hungtask in dm code raised by concurrent run refresh and remove command

On 05. 11. 24 at 13:27, wangzhiqiang (Q) wrote:
Hi Team,
Here's a hung task issue that occurs in the dm-snapshot scenario. It can be
reproduced by concurrently running vgchange --refresh and dmsetup -f remove vg-snap.

  vgchange                      dmsetup                         dmsetup
  table_load (load snapshot)
                                table_load (snapshot to error)
                                remove snapshot
  suspend origin/cow/real
  table_load (snapshot already removed)
  take type_lock and issue I/O to cow in snapshot_ctr
                                                                table_load (waits on type_lock)

[root@localhost ~]# ps aux | grep D
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     1818066  0.0  0.0      0     0 ?        D    Nov04   0:03 [kworker/3:2+ksnaphd]
root     2972729  0.5  2.1  87256 73032 pts/1    D<L  20:17   0:00 vgchange --refresh vg
root     2972761  0.0  0.3  23464 10636 pts/1    D    20:17   0:00 dmsetup -f remove vg-snap

The snapshot is removed after vgchange --refresh has suspended origin/cow/real. When vgchange then
loads the snapshot table, the load takes type_lock and, in snapshot_ctr, issues I/O to the cow device.
That I/O is processed by a kworker, but because the cow device is suspended the I/O never completes,
which leads to a hung task in the kernel.

Do we have any way to fix it?

It's like guessing from a crystal ball what you were doing and what state your system is in.

Usually you will get the most information from 'dmsetup info -c'.

If any device there is suspended, it is likely blocking the progress of other commands, which may be waiting for that device to be resumed.
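A minimal sketch of that check, assuming the 'name' and 'attr' report fields and the 4-character Stat layout described in the comments (verify both with 'dmsetup info -c -o help' on your dmsetup version). 'list_suspended' is a hypothetical helper name, and it reads the report from stdin so the parsing can be tried without root.

```shell
#!/bin/sh
# Sketch only: print the names of DM devices that are currently suspended.
# Assumption: stdin is the output of
#   dmsetup info -c --noheadings -o name,attr
# where "attr" is the 4-character Stat string:
#   (L)ive, (I)nactive, (s)uspended, (r)ead-only / read-(w)rite
# so a suspended device has 's' as its 3rd character (e.g. "L-sw").
list_suspended() {
    # $1 is the device name (assumed to contain no whitespace),
    # $NF is the last field on the line, i.e. the attr string.
    awk 'substr($NF, 3, 1) == "s" { print $1 }'
}

# Usage on a live system (as root):
#   dmsetup info -c --noheadings -o name,attr | list_suspended
```

Any name it prints is a candidate for the device that other commands are queued behind.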

In practice, you are doing something that is not supportable in any way: you can't interfere with the DM tables of devices that are being manipulated by an lvm2 command (there is a good reason we use locked sections to ensure exclusive access to those devices).

To recover from such a case, you would need to know at which point the lvm2 command was interfered with, and then reload and resume those devices that are already expected to be present and functional. This might be a non-trivial operation if you did not grab the 'dmsetup table' state before running your interfering command. That command in practice 'replaces' any existing target with the 'error' target, which can even create a combination of devices that has never been tested before and thus cause some unexpected code flow.
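Capturing the 'dmsetup table' state mentioned above, and later reloading and resuming from it, might look roughly like this. It is a sketch under stated assumptions, not a recovery procedure. It must run as root. BACKUP_DIR and the helper names are hypothetical. It only handles devices that still exist: a device that was already removed would have to be recreated (for example with 'dmsetup create'), which this sketch does not do. The caller must also pass devices bottom-up, so that lower devices are resumed before the devices stacked on them.

```shell
#!/bin/sh
# Sketch only: save the live DM tables of selected devices before any
# manual intervention, and later reload + resume them from those files.
# Assumptions: run as root; device names contain no '/'; BACKUP_DIR is a
# hypothetical location chosen for this example.
BACKUP_DIR=${BACKUP_DIR:-/root/dm-table-backup}

save_tables() {
    mkdir -p "$BACKUP_DIR" || return 1
    for dev in "$@"; do
        # 'dmsetup table NAME' prints the live table of one device.
        dmsetup table "$dev" > "$BACKUP_DIR/$dev.table" || return 1
    done
}

restore_tables() {
    # Pass devices bottom-up (e.g. *-real and *-cow before the snapshot
    # and origin devices stacked on top of them).
    for dev in "$@"; do
        # Load the saved table into the inactive slot ...
        dmsetup load "$dev" "$BACKUP_DIR/$dev.table" || return 1
        # ... then resume, which swaps it in and, for a suspended
        # device, lets queued I/O proceed.
        dmsetup resume "$dev" || return 1
    done
}

# Hypothetical usage (names assume LVM's usual -real/-cow layering):
#   save_tables vg-lv-real vg-snap-cow vg-lv vg-snap
#   ... later, if something went wrong:
#   restore_tables vg-lv-real vg-snap-cow vg-lv vg-snap
```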

It's also good to know which kernel version you are working with. Many DM kernel bugs have been fixed over time, so please make sure you are testing on a 6.11 kernel.
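One way to script that version check, as a sketch. 'kernel_at_least' is a hypothetical helper, and it relies on GNU 'sort -V' (version sort), which is an assumption about the userspace in use.

```shell
#!/bin/sh
# Sketch only: succeed if a kernel release string is at least the given
# version. Assumption: 'sort -V' (GNU coreutils version sort) is available.
# Note: suffixes are stripped, so a 6.11-rc kernel counts as 6.11 here.
kernel_at_least() {
    want=$1
    have=${2:-$(uname -r)}
    have=${have%%-*}   # strip "-rc1", "-generic", distro suffixes, ...
    # The smaller of the two versions sorts first; if that is $want,
    # then $have >= $want.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Usage:
#   kernel_at_least 6.11 || echo "kernel $(uname -r) is older than 6.11" >&2
```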

Regards

Zdenek




