> Hi Pavel,
>
> I am working on adding this sort of a workflow into a new daxctl command
> (daxctl-reconfigure-device) - this will allow changing the 'mode' of a
> dax device to kmem, online the resulting memory, and with your patches,
> also attempt to offline the memory, and change back to device-dax.
>
> In running with these patches, and testing the offlining part, I ran
> into the lockdep warning below.
>
> This is with just these three patches on top of -rc7.
>
>
> [ +0.004886] ======================================================
> [ +0.001576] WARNING: possible circular locking dependency detected
> [ +0.001506] 5.1.0-rc7+ #13 Tainted: G O
> [ +0.000929] ------------------------------------------------------
> [ +0.000708] daxctl/22950 is trying to acquire lock:
> [ +0.000548] 00000000f4d397f7 (kn->count#424){++++}, at: kernfs_remove_by_name_ns+0x40/0x80
> [ +0.000922]
>              but task is already holding lock:
> [ +0.000657] 000000002aa52a9f (mem_sysfs_mutex){+.+.}, at: unregister_memory_section+0x22/0xa0

I have studied this issue and now have a clear understanding of why it
happens. I am not yet sure how to fix it, so suggestions are welcome :)

Here is the problem:

When we offline pages we have the following call stack:

# echo offline > /sys/devices/system/memory/memory8/state
ksys_write
 vfs_write
  __vfs_write
   kernfs_fop_write
    kernfs_get_active
     lock_acquire                     kn->count#122 (lock for "memory8/state" kn)
    sysfs_kf_write
     dev_attr_store
      state_store
       device_offline
        memory_subsys_offline
         memory_block_action
          offline_pages
           __offline_pages
            percpu_down_write
             down_write
              lock_acquire            mem_hotplug_lock.rw_sem

When we unbind dax0.0 we have the following stack:

# echo dax0.0 > /sys/bus/dax/drivers/kmem/unbind
drv_attr_store
 unbind_store
  device_driver_detach
   device_release_driver_internal
    dev_dax_kmem_remove
     remove_memory                    device_hotplug_lock
      try_remove_memory               mem_hotplug_lock.rw_sem
       arch_remove_memory
        __remove_pages
         __remove_section
          unregister_memory_section
           remove_memory_section      mem_sysfs_mutex
            unregister_memory
             device_unregister
              device_del
               device_remove_attrs
                sysfs_remove_groups
                 sysfs_remove_group
                  remove_files
                   kernfs_remove_by_name
                    kernfs_remove_by_name_ns
                     __kernfs_remove  kn->count#122

So, lockdep found the ordering issue with the above two stacks:

1. kn->count#122           -> mem_hotplug_lock.rw_sem
2. mem_hotplug_lock.rw_sem -> kn->count#122
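
To make the inversion concrete, here is a minimal userspace sketch of the
same ABBA pattern. This is not kernel code: the mutexes kn_count and
mem_hotplug and the two thread functions are only stand-ins for the kernfs
active reference (kn->count) and mem_hotplug_lock in the stacks above.

/* abba.c - simplified model of the two inverted lock orders.
 * Build with: gcc -pthread abba.c -o abba
 * With unlucky scheduling the two threads deadlock; in the kernel, lockdep
 * reports the inversion even when the deadlock never actually triggers. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t kn_count    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mem_hotplug = PTHREAD_MUTEX_INITIALIZER;

/* Path 1: "echo offline > .../memory8/state"
 * The kernfs write handler pins the kn active ref, then offlining takes
 * mem_hotplug_lock: kn->count -> mem_hotplug_lock */
static void *offline_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&kn_count);      /* kernfs_get_active() */
	pthread_mutex_lock(&mem_hotplug);   /* __offline_pages() */
	printf("offline path took both locks\n");
	pthread_mutex_unlock(&mem_hotplug);
	pthread_mutex_unlock(&kn_count);
	return NULL;
}

/* Path 2: "echo dax0.0 > .../kmem/unbind"
 * remove_memory() takes mem_hotplug_lock, then removing the sysfs files
 * waits for the kn active ref: mem_hotplug_lock -> kn->count */
static void *unbind_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mem_hotplug);   /* try_remove_memory() */
	pthread_mutex_lock(&kn_count);      /* __kernfs_remove() */
	printf("unbind path took both locks\n");
	pthread_mutex_unlock(&kn_count);
	pthread_mutex_unlock(&mem_hotplug);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, offline_path, NULL);
	pthread_create(&t2, NULL, unbind_path, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}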