Hi folks,

Recently I received a report that the whole system hangs and becomes
unresponsive after a while under I/O load. The special configuration is
that a dm thin-pool volume is used as the swap partition of the system.

From the crash dump, I found one suspicious task, which looks like the
following,

PID: 462    TASK: ffff93033d74a680    CPU: 7    COMMAND: "kworker/u256:1"
 #0 [ffffb24b4d9c3710] __schedule at ffffffff9e29dc3d
 #1 [ffffb24b4d9c37a0] schedule at ffffffff9e29e0bf
 #2 [ffffb24b4d9c37b0] schedule_timeout at ffffffff9e2a179d
 #3 [ffffb24b4d9c3828] wait_for_completion at ffffffff9e29eaaa
 #4 [ffffb24b4d9c3878] __flush_work at ffffffff9dabb277
 #5 [ffffb24b4d9c38f0] drain_all_pages at ffffffff9dc74e05
 #6 [ffffb24b4d9c3920] __alloc_pages_slowpath at ffffffff9dc77279
 #7 [ffffb24b4d9c3a20] __alloc_pages_nodemask at ffffffff9dc77e41
 #8 [ffffb24b4d9c3a80] new_slab at ffffffff9dc99c1a
 #9 [ffffb24b4d9c3ae8] ___slab_alloc at ffffffff9dc9c6d9
#10 [ffffb24b4d9c3b40] exit_shadow_spine at ffffffffc08ef8cf [dm_persistent_data]
#11 [ffffb24b4d9c3b50] insert at ffffffffc08edfcc [dm_persistent_data]
#12 [ffffb24b4d9c3c30] sm_ll_mutate at ffffffffc08ea20e [dm_persistent_data]
#13 [ffffb24b4d9c3cd8] dm_kcopyd_zero at ffffffffc03f7a39 [dm_mod]
#14 [ffffb24b4d9c3ce8] schedule_zero at ffffffffc093d181 [dm_thin_pool]
#15 [ffffb24b4d9c3d40] process_cell at ffffffffc093d78c [dm_thin_pool]
#16 [ffffb24b4d9c3dc8] do_worker at ffffffffc093dce6 [dm_thin_pool]
#17 [ffffb24b4d9c3e98] process_one_work at ffffffff9daba4d4
#18 [ffffb24b4d9c3ed8] worker_thread at ffffffff9daba6ed
#19 [ffffb24b4d9c3f10] kthread at ffffffff9dac0a2d
#20 [ffffb24b4d9c3f50] ret_from_fork at ffffffff9e400202

This task is writing to a thin-pool volume which is mounted as the swap
partition of the system. This is very suspicious because, as far as I can
see in the dm-thin code, every memory allocation inside dm-thin uses an
explicit GFP_NOIO/GFP_NOFS or an implicit memalloc_noio_save(), in order
to avoid deadlock in the recursive memory-reclaim code path. (A minimal
sketch of this pattern is appended at the end of this mail.)

I have done a lot of testing and confirmed that this issue can be
reproduced on the latest upstream Linux v5.11-rc5+ kernel. I create two
thin-pool volumes: one is mounted as swap, the other is written with
heavy I/O pressure. When anonymous page swapping happens on the first
thin-pool volume while I/O is hitting the second one, after around 3
minutes the whole system hangs, with no response and no kernel messages
for more than an hour, until I reset the machine.

My questions are,
- Can a thin-pool volume be used as a swap device?
- Is the behavior described above a bug, or an already known issue which
  should be avoided?

Thanks in advance.

Coly Li
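
P.S. For reference, below is a minimal sketch of the two patterns I mean
above for keeping an allocation out of the I/O reclaim path: passing
GFP_NOIO explicitly, or wrapping a region in memalloc_noio_save()/
memalloc_noio_restore() so allocations in callees are implicitly treated
as GFP_NOIO. This is not code taken from dm-thin; the function names are
made up by me for illustration only.

#include <linux/gfp.h>
#include <linux/sched/mm.h>
#include <linux/slab.h>

/* Explicit flag: reclaim may not issue any I/O for this allocation. */
static void *alloc_for_io_path(size_t len)
{
	return kmalloc(len, GFP_NOIO);
}

/*
 * Scoped variant: every allocation between save and restore is
 * implicitly treated as GFP_NOIO, even deep inside callees which
 * pass GFP_KERNEL.
 */
static void *alloc_in_noio_scope(size_t len)
{
	unsigned int flags;
	void *p;

	flags = memalloc_noio_save();
	p = kmalloc(len, GFP_KERNEL);
	memalloc_noio_restore(flags);

	return p;
}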