Hi AceLan,
On 8/22/23 16:13, AceLan Kao wrote:
Hello,
The issue is reproducible with IMSM metadata too; around 20% of reboots
hang. I will try to raise the priority in the bug, because it is a valid
high: the base functionality of the system is affected.
Since it is reproducible on your side, is it possible to turn the
reproduction steps into a test case, given the importance?
I haven't tried to reproduce it locally yet because the customer was able
to bisect the regression, and it pointed them to the same patch, so I
connected the two reports and asked the author to take a look first. At
first glance, I wanted to get the community's view to see whether it is
something obvious.
As far as I know, the customer creates 3 IMSM RAID arrays, one of which is
the system volume, and then reboots; it sporadically fails (around 20% of
the time). That is all.
I guess that if all arrays have the MD_DELETED flag set, then reboot might
hang. I am not sure whether the change below helps or not (we may need to
flush the wq as well before the list_del); just FYI.
@@ -9566,8 +9566,10 @@ static int md_notify_reboot(struct notifier_block *this,
 	spin_lock(&all_mddevs_lock);
 	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
-		if (!mddev_get(mddev))
+		if (!mddev_get(mddev)) {
+			list_del(&mddev->all_mddevs);
 			continue;
+		}
My suggestion is to delete the list node in this scenario; did you try the
above?
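To be concrete, here is an untested sketch of the flush-then-delete idea.
It assumes the deferred deletion work (mddev->del_work) is queued on
md_misc_wq, as mddev_put() does today; the flush has to happen before
taking the spinlock since flush_workqueue() may sleep.

static int md_notify_reboot(struct notifier_block *this,
			    unsigned long code, void *x)
{
	struct mddev *mddev, *n;

	/*
	 * Untested: let any pending deferred deletion finish before
	 * we walk all_mddevs, so that entries which are still marked
	 * MD_DELETED afterwards can simply be dropped from the list.
	 */
	flush_workqueue(md_misc_wq);

	spin_lock(&all_mddevs_lock);
	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
		if (!mddev_get(mddev)) {
			/* still being torn down, drop the stale node */
			list_del(&mddev->all_mddevs);
			continue;
		}
		/* ... rest as before ... */
	}
	spin_unlock(&all_mddevs_lock);

	return NOTIFY_DONE;
}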
I am still not able to reproduce this, probably due to differences in
timing. Maybe we only need something like:
diff --git i/drivers/md/md.c w/drivers/md/md.c
index 5c3c19b8d509..ebb529b0faf8 100644
--- i/drivers/md/md.c
+++ w/drivers/md/md.c
@@ -9619,8 +9619,10 @@ static int md_notify_reboot(struct notifier_block *this,
 	spin_lock(&all_mddevs_lock);
 	list_for_each_entry_safe(mddev, n, &all_mddevs, all_mddevs) {
-		if (!mddev_get(mddev))
+		if (!mddev_get(mddev)) {
+			need_delay = 1;
 			continue;
+		}
 		spin_unlock(&all_mddevs_lock);
 		if (mddev_trylock(mddev)) {
 			if (mddev->pers)
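For context, need_delay only gates the fixed delay at the very end of
md_notify_reboot(); quoting the tail of the function from memory (it may
differ slightly in your tree):

		need_delay = 1;
		mddev_put(mddev);
		spin_lock(&all_mddevs_lock);
	}
	spin_unlock(&all_mddevs_lock);

	/*
	 * Some devices are known to be volatile wrt too-early system
	 * reboots, so a short delay here gives any in-flight teardown
	 * a chance to finish before the reboot proceeds.
	 */
	if (need_delay)
		mdelay(1000 * 1);

	return NOTIFY_DONE;
}

The hope is that arrays we skip because mddev_get() fails would then still
buy the deferred deletion that one second.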
Thanks,
Song
I will try to reproduce the issue in the Intel lab to check this.
Thanks,
Mariusz
Hi Guoqing,
Here is the command with which I trigger the issue; I have to run it
around 10 times to make sure the issue reproduces:

echo "repair" | sudo tee /sys/class/block/md12?/md/sync_action && \
sudo grub-reboot "Advanced options for Ubuntu>Ubuntu, with Linux 6.5.0-rc7706a74159504-dirty" && \
head -c 1G < /dev/urandom > myfile1 && sleep 180 && \
head -c 1G < /dev/urandom > myfile2 && sleep 1 && \
cat /proc/mdstat && sleep 1 && rm myfile1 && \
sudo reboot
Is the issue still reproducible if you remove the below from the command?

echo "repair" | sudo tee /sys/class/block/md12?/md/sync_action

I just want to know whether the resync thread is related to the issue or
not.
And the patch that adds need_delay doesn't work. My assumption is that
mddev_get() always returns NULL here, so setting need_delay wouldn't help.
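For reference, mddev_get() in drivers/md/md.h (quoting from memory, so
please double-check against your tree) bails out as soon as MD_DELETED is
set, which is why every half-torn-down array lands in that branch:

static inline struct mddev *mddev_get(struct mddev *mddev)
{
	lockdep_assert_held(&all_mddevs_lock);

	/*
	 * An array whose deletion has already been scheduled can no
	 * longer be pinned; callers see NULL and skip it.
	 */
	if (test_bit(MD_DELETED, &mddev->flags))
		return NULL;
	atomic_inc(&mddev->active);
	return mddev;
}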
Thanks,
Guoqing