Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command

Hi,

On 2023/05/05 23:47, Jove wrote:
Hi Kuai.

Jove, as I understand this, if mdadm makes progress without a blocked
io, and the reshape continues, it seems you can use this array without
problem.

I've had to do some sleuthing to figure out who was doing that array
access; I was already running a minimal FedoraCore image. I've
discovered that the culprit is the systemd-udevd daemon. I do not know
why it accesses the array, but if I stop it and rename that executable
(it gets started automatically when the array is assembled), then the
reshape continues.

Thanks for confirming this. However, I have no idea why systemd-udevd is
accessing the array.

In the meantime, I'll try to fix this deadlock. I hope you don't mind a
Reported-by tag.

Thanks,
Kuai

Now it is just a matter of time until the reshape is finished and I
can discover just how much data I still have :)

Thank you all for your help, I will send a last mail when I know more.

Best regards,

       Johan



On Fri, May 5, 2023 at 10:02 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

On 2023/05/05 14:58, Wol wrote:
On 05/05/2023 02:34, Yu Kuai wrote:
I have had one case in which mdadm didn't hang and in which the
reshape continued. Sadly, I was using sparse overlay files and the
filesystem could not handle the full 4x 4TB. I had to terminate the
reshape.

This sounds like a dead end for now; normal io beyond the reshape
position must wait:

raid5_make_request
   make_stripe_request
    ahead_of_reshape
     wait_woken

Not sure if I've got the wrong end of the stick, but if I've understood
correctly, that shouldn't be the case.

Reshape takes place in a window. All io *beyond* the window is allowed
to proceed normally - that part of the array has not been reshaped so
the old parameters are used.

All io *in front* of the window is allowed to proceed normally - that
part of the array has been reshaped so the new parameters are used.

io *IN* the window is paused until the window has passed. This
interruption should be short and sweet.
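
The window rule described above can be sketched as a small model. This is
only an illustration of the description, not the md driver's code: the
names classify_io, window_start and window_end are hypothetical, and it
assumes a reshape that advances from low sectors to high sectors, so that
everything below the window has already been reshaped.

```python
from enum import Enum


class Action(Enum):
    USE_NEW_LAYOUT = "new"   # region already reshaped
    USE_OLD_LAYOUT = "old"   # region not reshaped yet
    WAIT = "wait"            # region is inside the moving window


def classify_io(sector, window_start, window_end):
    """Decide how an io at `sector` is handled (hypothetical model).

    Assumes the reshape moves upward and that the window covers
    window_start <= sector < window_end.
    """
    if window_start > window_end:
        raise ValueError("window_start must not exceed window_end")
    if sector < window_start:
        return Action.USE_NEW_LAYOUT
    if sector >= window_end:
        return Action.USE_OLD_LAYOUT
    return Action.WAIT


if __name__ == "__main__":
    for s in (10, 150, 500):
        print(s, classify_io(s, window_start=100, window_end=200).name)
```

In this model only io that lands inside the window waits, and it is
released as soon as the window moves past it.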

Yes, that's correct, and in this case reshape_safe should be the same as
reshape_progress. I guess io is stuck because
stripe_ahead_of_reshape() returns true.

So this deadlock happens when io is blocked because of the reshape, and
mddev_suspend() is waiting for this io to be done; in the meantime, the
reshape can't start until mddev_suspend() returns.
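
The circular wait described above can be written down as a tiny wait-for
graph. This is a sketch for illustration only, not kernel code: the
helper find_wait_cycle and the labels in WAITS_ON are made up here, and
each party is assumed to wait on exactly one other party.

```python
def find_wait_cycle(waits_on, start):
    """Follow 'X waits on Y' edges from `start`.

    Returns the list of parties forming a cycle, or None if the chain
    ends at a party that is not waiting on anything.
    """
    seen = []
    node = start
    while node in waits_on:
        if node in seen:
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_on[node]
    return None


# Edges taken from the description of the deadlock in this thread.
WAITS_ON = {
    "blocked io": "reshape",              # io waits for the reshape to pass
    "mddev_suspend()": "blocked io",      # suspend waits for io to finish
    "reshape": "mddev_suspend()",         # reshape waits for suspend to return
}

if __name__ == "__main__":
    print(find_wait_cycle(WAITS_ON, "blocked io"))
```

Removing any one edge, for example by making sure no io is blocked when
the array is assembled, breaks the cycle in this model.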

Jove, as I understand this, if mdadm makes progress without a blocked
io, and the reshape continues, it seems you can use this array without
problem.

Thanks,
Kuai

Cheers,
Wol
