On Fri, 28 Feb 2025 15:56:03 +0000 Junxiao Bi <junxiao.bi@xxxxxxxxxx> wrote: > > On Feb 28, 2025, at 1:38 AM, Blazej Kucman > > <blazej.kucman@xxxxxxxxxxxxxxx> wrote: > > > > On Thu, 27 Feb 2025 14:48:17 +0800 > > Xiao Ni <xni@xxxxxxxxxx> wrote: > > > >>> On Mon, Feb 24, 2025 at 9:16 PM Blazej Kucman > >>> <blazej.kucman@xxxxxxxxxxxxxxx> wrote: > >>> > >>> On Tue, 18 Feb 2025 10:48:31 -0800 > >>> Junxiao Bi <junxiao.bi@xxxxxxxxxx> wrote: > >>> > >>>> When manager thread detects new array, it will invoke > >>>> manage_new(). For imsm array, it will further invoke > >>>> imsm_open_new(). Since commit bbab0940fa75("imsm: write bad block > >>>> log on metadata sync"), it preallocates bad block log when > >>>> opening the array, that requires increasing the mpb buffer size. > >>>> For that, imsm_open_new() invokes function > >>>> imsm_update_metadata_locally(), which first uses > >>>> imsm_prepare_update() to allocate a larger mpb buffer and store > >>>> it at "mpb->next_buf", and then invoke imsm_process_update() to > >>>> copy the content from current mpb buffer "mpb->buf" to > >>>> "mpb->next_buf", and then free the current mpb buffer and set the > >>>> new buffer as current. > >>>> > >>>> There is a small race window, when monitor thread is syncing > >>>> metadata, it gets current buffer pointer in > >>>> imsm_sync_metadata()->write_super_imsm(), but before flushing the > >>>> buffer to disk, manager thread does above switching buffer which > >>>> frees current buffer, then monitor thread will run into > >>>> use-after-free issue and could cause on-disk metadata corruption. > >>>> If system keeps running, further metadata update could fix the > >>>> corruption, because after switching buffer, the new buffer will > >>>> contain good metadata, but if panic/power cycle happens while > >>>> disk metadata is corrupted, the system will run into bootup > >>>> failure if array is used as root, otherwise the array can not be > >>>> assembled after boot if not used as root. > >>>> > >>>> This issue will not happen for imsm array with only one member > >>>> array, because the memory array has not be opened yet, monitor > >>>> thread will not do any metadata updates. > >>>> This can happen for imsm array with at lease two member array, in > >>>> the following two scenarios: > >>>> 1. Restarting mdmon process with at least two member array > >>>> This will happen during system boot up or user restart mdmon > >>>> after mdadm upgrade > >>>> 2. Adding new member array to exist imsm array with at least one > >>>> member array. > >>>> > >>>> To fix this, delay the switching buffer operation to monitor > >>>> thread. > >>>> > >>>> Fixes: bbab0940fa75("imsm: write bad block log on metadata sync") > >>>> Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx> > >>>> --- > >>>> v2 <- v1: > >>>> - address code style in manage_new() > >>>> - make lines of git log not over 75 characters > >>>> > >>>> managemon.c | 10 ++++++++-- > >>>> super-intel.c | 14 +++++++++++--- > >>>> 2 files changed, 19 insertions(+), 5 deletions(-) > >>>> > >>>> diff --git a/managemon.c b/managemon.c > >>>> index d79813282457..74b64bfc9613 100644 > >>>> --- a/managemon.c > >>>> +++ b/managemon.c > >>>> @@ -721,11 +721,12 @@ static void manage_new(struct mdstat_ent > >>>> *mdstat, > >>>> * the monitor. > >>>> */ > >>>> > >>>> + struct metadata_update *update = NULL; > >>>> struct active_array *new = NULL; > >>>> struct mdinfo *mdi = NULL, *di; > >>>> - int i, inst; > >>>> - int failed = 0; > >>>> char buf[SYSFS_MAX_BUF_SIZE]; > >>>> + int failed = 0; > >>>> + int i, inst; > >>>> > >>>> /* check if array is ready to be monitored */ > >>>> if (!mdstat->active || !mdstat->level) > >>>> @@ -824,9 +825,14 @@ static void manage_new(struct mdstat_ent > >>>> *mdstat, /* if everything checks out tell the metadata handler we > >>>> want to > >>>> * manage this instance > >>>> */ > >>>> + container->update_tail = &update; > >>>> if (!aa_ready(new) || container->ss->open_new(container, > >>>> new, inst) < 0) { > >>>> + container->update_tail = NULL; > >>>> goto error; > >>>> } else { > >>>> + if (update) > >>>> + queue_metadata_update(update); > >>>> + container->update_tail = NULL; > >>>> replace_array(container, victim, new); > >>>> if (failed) { > >>>> new->check_degraded = 1; > >>>> diff --git a/super-intel.c b/super-intel.c > >>>> index cab841980830..4988eef191da 100644 > >>>> --- a/super-intel.c > >>>> +++ b/super-intel.c > >>>> @@ -8467,12 +8467,15 @@ static int imsm_count_failed(struct > >>>> intel_super *super, struct imsm_dev *dev, return failed; > >>>> } > >>>> > >>>> +static int imsm_prepare_update(struct supertype *st, > >>>> + struct metadata_update *update); > >>>> static int imsm_open_new(struct supertype *c, struct > >>>> active_array *a, int inst) > >>>> { > >>>> struct intel_super *super = c->sb; > >>>> struct imsm_super *mpb = super->anchor; > >>>> - struct imsm_update_prealloc_bb_mem u; > >>>> + struct imsm_update_prealloc_bb_mem *u; > >>>> + struct metadata_update mu; > >>>> > >>>> if (inst >= mpb->num_raid_devs) { > >>>> pr_err("subarry index %d, out of range\n", inst); > >>>> @@ -8482,8 +8485,13 @@ static int imsm_open_new(struct supertype > >>>> *c, struct active_array *a, dprintf("imsm: open_new %d\n", inst); > >>>> a->info.container_member = inst; > >>>> > >>>> - u.type = update_prealloc_badblocks_mem; > >>>> - imsm_update_metadata_locally(c, &u, sizeof(u)); > >>>> + u = xmalloc(sizeof(*u)); > >>>> + u->type = update_prealloc_badblocks_mem; > >>>> + mu.len = sizeof(*u); > >>>> + mu.buf = (char *)u; > >>>> + imsm_prepare_update(c, &mu); > >>>> + if (c->update_tail) > >>>> + append_metadata_update(c, u, sizeof(*u)); > >>>> > >>>> return 0; > >>>> } > >>> > >>> Hi Junxiao, > >>> > >>> LGTM, Approved > >>> > >>> Thanks, > >>> Blazej > >>> > >> > >> Hi Blazej > >> > >> Have you updated the PR > >> https://github.com/md-raid-utilities/mdadm/pull/152 with this V2 > >> version? > >> > >> Regards > >> Xiao > >> > > Hi, > > > > Yes, the test result is the same as V1 but there is one note from > > checkpatch > > > > WARNING:BAD_FIXES_TAG: Please use correct Fixes: style 'Fixes: <12 > > chars of sha1> ("<title line>")' - ie: 'Fixes: ca847037fafb > > ("[V2]mdmon: imsm: fix metadata corruption when managing new > > array")' #42: Fixes: bbab0940fa75("imsm: write bad block log on > > metadata sync") > If you don’t mind, just add a space between commit id and subject > should fix it, the checkpatch seemed too strict on this. Hi, I corrected it in PR. Regards, Blazej > > Thanks, > Junxiao > > > > Regards, > > Blazej > > > >> > >