On 9/7/24 20:14, Richard W.M. Jones wrote:
> On Sat, Sep 07, 2024 at 07:02:30PM +0800, Ming Lei wrote:
>> BTW, the issue can be reproduced 100% by:
>>
>> echo "deadlock" > /sys/block/$ROOT_DISK/queue/scheduler

This probably should be:

echo "mq-deadline" > /sys/block/$ROOT_DISK/queue/scheduler

and make sure that:
1) mq-deadline is compiled as a module
2) mq-deadline is not already used by a device (so not loaded already)
3) The mq-deadline module file is stored on the target device of the
   scheduler change
4) The mq-deadline module file is not already cached in the page cache.

For (4), you may want to run "echo 3 > /proc/sys/vm/drop_caches" before
trying to switch the scheduler.

>
> That doesn't reproduce it for me (reliably). Although I'm not
> surprised as this bug has been _very_ tricky to reproduce! Sometimes
> I think I have a definite reproducer, only for it to go away when some
> tiny detail changes.
>
>>> This seems like the neatest (or shortest) fix so far, but doesn't it
>>> "mix up layers" by checking elv_iosched_store?
>>
>> It is just one exception for 'scheduler' sysfs attribute wrt. freezing
>> queue for storing, and the check can be done via the attribute
>> name("scheduler") too.
>
> Fair enough.
>
> Rich.
>

-- 
Damien Le Moal
Western Digital Research
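
For anyone trying the reproducer, conditions (1) and (2) above can be checked before writing to the scheduler attribute. The following is only a rough sketch: the helper names active_scheduler and module_loaded are made up for illustration, and taking the file path as an argument is an assumption made so that the helpers can be tried against sample files. It does not verify conditions (3) and (4).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the original thread.

# Print the currently active scheduler given the path to a queue/scheduler
# file. The kernel marks the active entry with square brackets,
# e.g. "none [mq-deadline] kyber" prints "mq-deadline".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Succeed if the named module is listed in the given modules file.
# Hyphens in module names appear as underscores in /proc/modules,
# so "mq-deadline" is looked up as "mq_deadline".
module_loaded() {
    name=$(printf '%s' "$1" | tr '-' '_')
    grep -q "^${name} " "$2"
}
```

On a live system these would be called as `active_scheduler /sys/block/$ROOT_DISK/queue/scheduler` and `module_loaded mq-deadline /proc/modules`. For the reproducer to have a chance, the second call should fail (module not loaded yet) before the echo is attempted.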