On Sun, Sep 8, 2024 at 8:03 AM Damien Le Moal <dlemoal@xxxxxxxxxx> wrote:
>
> On 9/7/24 20:14, Richard W.M. Jones wrote:
> > On Sat, Sep 07, 2024 at 07:02:30PM +0800, Ming Lei wrote:
> >> BTW, the issue can be reproduced 100% by:
> >>
> >> echo "deadlock" > /sys/block/$ROOT_DISK/queue/scheduler
>
> This probably should be:
>
> echo "mq-deadline" > /sys/block/$ROOT_DISK/queue/scheduler
>
> and make sure that:
> 1) mq-deadline is compiled as a module
> 2) mq-deadline is not already used by a device (so not loaded already)
> 3) The mq-deadline module file is stored on the target device of the
>    scheduler change
> 4) The mq-deadline module file is not already cached in the page cache.
>
> For (4), you may want to do a "echo 3 > /proc/sys/vm/drop_caches" before trying
> to switch the scheduler.
>
> >
> > That doesn't reproduce it for me (reliably). Although I'm not
> > surprised as this bug has been _very_ tricky to reproduce! Sometimes
> > I think I have a definite reproducer, only for it to go away when some
> > tiny detail changes.
> >
> >>> This seems like the neatest (or shortest) fix so far, but doesn't it
> >>> "mix up layers" by checking elv_iosched_store?
> >>
> >> It is just one exception for 'scheduler' sysfs attribute wrt. freezing
> >> queue for storing, and the check can be done via the attribute
> >> name("scheduler") too.
> >
> > Fair enough.
> >
> > Rich.
> >
>
> --
> Damien Le Moal
> Western Digital Research
>
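
For anyone trying the reproducer, the four conditions Damien lists could be checked with a small script along these lines. This is only a rough sketch, not something posted in the thread: the helper names (active_scheduler, module_loaded) and the ROOT_DISK environment variable convention are made up for illustration, and it assumes modinfo(8) and findmnt(8) are installed. Condition (3) is only printed for manual comparison. It needs root, and it is meant to trigger the issue under discussion, so only run it on a disposable test machine.

```shell
#!/bin/sh
# Hypothetical helper script (not from the thread): checks the four
# preconditions from the mail, then switches the scheduler.
# Usage (as root): ROOT_DISK=sda sh repro.sh

SCHED=mq-deadline
MODNAME=mq_deadline   # /proc/modules lists module names with underscores

# Print the active scheduler from a sysfs "scheduler" line on stdin.
# The active one is shown in square brackets, e.g. "[none] mq-deadline".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Succeed if module $1 is listed in the /proc/modules-style file $2.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Only touch the system when ROOT_DISK is set explicitly.
if [ -n "${ROOT_DISK:-}" ]; then
    queue=/sys/block/$ROOT_DISK/queue/scheduler
    echo "active scheduler: $(active_scheduler < "$queue")"

    # (1) Must be a loadable module: expect an absolute file path here.
    #     For a built-in scheduler modinfo may fail or print no path.
    modfile=$(modinfo -n "$SCHED") || exit 1
    case "$modfile" in
        /*) echo "module file: $modfile" ;;
        *)  echo "$SCHED does not look like a loadable module" >&2; exit 1 ;;
    esac

    # (2) Must not be loaded already.
    if module_loaded "$MODNAME"; then
        echo "$MODNAME is already loaded" >&2
        exit 1
    fi

    # (3) Print the device backing the module file; compare it with
    #     $ROOT_DISK by hand (partitions, LVM etc. make this non-trivial).
    findmnt -n -o SOURCE -T "$modfile"

    # (4) Drop the page cache so the module file has to be read from disk.
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # The actual switch from the mail.
    echo "$SCHED" > "$queue"
fi
```

Without ROOT_DISK set the script only defines the helpers and does nothing else, which keeps an accidental run harmless.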