On Thu, 11 Jan 2024 20:05:05 +0800
Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>
> Echo 'idle' to "sync_action" is supposed to stop sync_thread while new
> sync_thread can still start. However, currently this behaviour is not
> correct, echo 'idle' will actually try to stop sync_thread and then
> start a new sync_thread. And mdadm relies on this wrong behaviour in
> some places.
>
> In kernel, if resync is not done yet, then recovery/reshape/check/repair
> can't not start in the first place, and if resync is done, echo 'resync'
> behaves the same as echo 'idle' for now.

Hi Kuai,

From the last part, I understand that for resync/reshape the frozen thread
is unblocked, not restarted. I'm missing an explanation of that here. So far,
I understand it as:

"Setting "resync" or "reshape" allows a frozen sync_thread to continue
instead of restarting it. Setting "resync" when resync is done has the same
effect as "idle", so it is safe."

Please describe setting "reshape" as well. I can see that you use it in one
place. I think that with reshape we need to be more careful, but you are the
expert here; maybe it is the same as "resync"?

>
> Hence replace echo 'idle' with echo 'resync/reshape' when trying to
> continue frozed sync_thread. There should be no functional changes and
> prevent regressions after fixing that echo 'idle' will start new
> sync_thread in kernel.

Ok, so this is a kind of preparation for the kernel fix. Got it.

>
> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> ---

I think I understand the purpose of the change. You are trying to avoid
restarting the thread when it is not needed, and to remove mdadm's reliance
on the incorrect "idle" behaviour. Unfortunately, the changes you need to
make depend heavily on the kernel implementation. They need to be described
well, because git blame is volatile.
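To check that we mean the same thing, this is how I read the intended
userspace sequence, as a minimal C sketch. write_sync_action() and
continue_frozen_sync_thread() are hypothetical helpers, not existing mdadm
functions, and the path argument stands in for an array's sync_action
attribute (for example /sys/block/md0/md/sync_action). Whether "reshape"
continues a frozen reshape the same way "resync" continues a frozen resync is
exactly the open question above, so it is marked as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an mdadm function): write one keyword to an
 * md "sync_action" attribute. Returns 0 on success, -1 on failure.
 */
static int write_sync_action(const char *path, const char *value)
{
	FILE *f;
	size_t len;
	int ret = 0;

	assert(path != NULL && value != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	len = strlen(value);
	if (fwrite(value, 1, len, f) != len)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/*
 * Hypothetical: let a sync_thread that was stopped with "frozen" carry on.
 * Per the patch description, "idle" would stop it and start a new one,
 * so "resync" or "reshape" is written instead.
 * ASSUMPTION: "reshape" continues a frozen reshape the same way "resync"
 * continues a frozen resync; this is still to be confirmed.
 */
static int continue_frozen_sync_thread(const char *path, int is_reshape)
{
	return write_sync_action(path, is_reshape ? "reshape" : "resync");
}
```

Writing "idle" at this point would, per the patch description, stop the
thread and start a new one, which is the behaviour the kernel fix is meant
to remove.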
I would like to propose a separate enum, so that we do not rely on the
kernel's state naming. Some proposals:

	/* As far as I understand, write "resync" for both cases. */
	SYNC_ACTION_RESYNC_START
	SYNC_ACTION_RESYNC_CONTINUE

	/* As far as I understand, write "reshape" for both cases. */
	SYNC_ACTION_RESHAPE_START
	SYNC_ACTION_RESHAPE_CONTINUE

	/* Highlight the known bug in a comment and use "resync"? */
	SYNC_ACTION_IDLE

	/* If needed? */
	SYNC_ACTION_ABORT

It needs to be handled by a dedicated function, with comments describing what
is written to the kernel and why. In userspace, I need more reader-friendly
code: I want to know exactly what we requested from the kernel. In some cases
we expect the thread to be restarted; in others, just a frozen one to be
continued. I would like to know the purpose of the request in each particular
case, even if the same action is currently used behind it.

Let me know what you think.

Thanks,
Mariusz
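P.S. To illustrate the proposal, here is a minimal sketch of the enum plus
one mapping function that documents what is written to the kernel and why.
The enum name sync_action and the function sync_action_to_str() are
hypothetical. SYNC_ACTION_ABORT is left out because its keyword is still
undecided, and the "reshape" mapping carries the same open question as above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum: records the caller's intent, not the kernel keyword. */
enum sync_action {
	SYNC_ACTION_RESYNC_START,
	SYNC_ACTION_RESYNC_CONTINUE,
	SYNC_ACTION_RESHAPE_START,
	SYNC_ACTION_RESHAPE_CONTINUE,
	SYNC_ACTION_IDLE,
};

/*
 * Map an intent to the keyword written to sync_action. Several intents
 * currently share a keyword; keeping them separate records why the
 * request was made.
 */
static const char *sync_action_to_str(enum sync_action action)
{
	switch (action) {
	case SYNC_ACTION_RESYNC_START:
	case SYNC_ACTION_RESYNC_CONTINUE:
		/* "resync" starts a resync or continues a frozen one. */
		return "resync";
	case SYNC_ACTION_RESHAPE_START:
	case SYNC_ACTION_RESHAPE_CONTINUE:
		/* ASSUMPTION: behaves like "resync"; to be confirmed. */
		return "reshape";
	case SYNC_ACTION_IDLE:
		/*
		 * Known bug: the kernel currently restarts sync_thread on
		 * "idle". Open question: write "resync" here instead?
		 */
		return "idle";
	}
	assert(0 && "unknown sync_action value");
	return NULL;
}
```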