Hi Andrew, Thanks for merging it to mm. I see the “From” is “ Junxiao Bi via Ocfs2-devel <ocfs2-devel@xxxxxxxxxxxxxx>”, can you help fix that? Thanks, Junxiao > 在 2022年6月26日,下午1:21,Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> 写道: > > > The patch titled > Subject: Revert "ocfs2: mount shared volume without ha stack" > has been added to the -mm mm-hotfixes-unstable branch. Its filename is > revert-ocfs2-mount-shared-volume-without-ha-stack.patch > > This patch will shortly appear at > https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/revert-ocfs2-mount-shared-volume-without-ha-stack.patch > > This patch will later appear in the mm-hotfixes-unstable branch at > git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm > > Before you just go and hit "reply", please: > a) Consider who else should be cc'ed > b) Prefer to cc a suitable mailing list as well > c) Ideally: find the original patch on the mailing list and do a > reply-to-all to that, adding suitable additional cc's > > *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** > > The -mm tree is included into linux-next via the mm-everything > branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm > and is updated there every 2-3 working days > > ------------------------------------------------------ > From: Junxiao Bi via Ocfs2-devel <ocfs2-devel@xxxxxxxxxxxxxx> > Subject: Revert "ocfs2: mount shared volume without ha stack" > Date: Fri, 3 Jun 2022 15:28:01 -0700 > > This reverts commit 912f655d78c5d4ad05eac287f23a435924df7144. > > This commit introduced a regression that can cause mount hung. The > changes in __ocfs2_find_empty_slot causes that any node with none-zero > node number can grab the slot that was already taken by node 0, so node 1 > will access the same journal with node 0, when it try to grab journal > cluster lock, it will hung because it was already acquired by node 0. > It's very easy to reproduce this, in one cluster, mount node 0 first, then > node 1, you will see the following call trace from node 1. > > [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds. > [13148.739691] Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2 > [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [13148.745846] task:mount.ocfs2 state:D stack: 0 pid:53045 ppid: 53044 flags:0x00004000 > [13148.749354] Call Trace: > [13148.750718] <TASK> > [13148.752019] ? usleep_range+0x90/0x89 > [13148.753882] __schedule+0x210/0x567 > [13148.755684] schedule+0x44/0xa8 > [13148.757270] schedule_timeout+0x106/0x13c > [13148.759273] ? __prepare_to_swait+0x53/0x78 > [13148.761218] __wait_for_common+0xae/0x163 > [13148.763144] __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2] > [13148.765780] ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2] > [13148.768312] ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2] > [13148.770968] ocfs2_journal_init+0x91/0x340 [ocfs2] > [13148.773202] ocfs2_check_volume+0x39/0x461 [ocfs2] > [13148.775401] ? iput+0x69/0xba > [13148.777047] ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2] > [13148.779646] ocfs2_fill_super+0x54b/0x853 [ocfs2] > [13148.781756] mount_bdev+0x190/0x1b7 > [13148.783443] ? ocfs2_remount+0x440/0x440 [ocfs2] > [13148.785634] legacy_get_tree+0x27/0x48 > [13148.787466] vfs_get_tree+0x25/0xd0 > [13148.789270] do_new_mount+0x18c/0x2d9 > [13148.791046] __x64_sys_mount+0x10e/0x142 > [13148.792911] do_syscall_64+0x3b/0x89 > [13148.794667] entry_SYSCALL_64_after_hwframe+0x170/0x0 > [13148.797051] RIP: 0033:0x7f2309f6e26e > [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 > [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e > [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0 > [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820 > [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420 > [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000 > [13148.816564] </TASK> > > To fix it, we can just fix __ocfs2_find_empty_slot. But original commit > introduced the feature to mount ocfs2 locally even it is cluster based, > that is a very dangerous, it can easily cause serious data corruption, > there is no way to stop other nodes mounting the fs and corrupting it. > Setup ha or other cluster-aware stack is just the cost that we have to > take for avoiding corruption, otherwise we have to do it in kernel. > > Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@xxxxxxxxxx > Fixes: 912f655d78c5("ocfs2: mount shared volume without ha stack") > Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx> > Acked-by: Joseph Qi <joseph.qi@xxxxxxxxxxxxxxxxx> > Cc: Mark Fasheh <mark@xxxxxxxxxx> > Cc: Joel Becker <jlbec@xxxxxxxxxxxx> > Cc: Changwei Ge <gechangwei@xxxxxxx> > Cc: Gang He <ghe@xxxxxxxx> > Cc: Jun Piao <piaojun@xxxxxxxxxx> > Cc: <heming.zhao@xxxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> > Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > --- > > fs/ocfs2/ocfs2.h | 4 --- > fs/ocfs2/slot_map.c | 46 +++++++++++++++++------------------------- > fs/ocfs2/super.c | 21 ------------------- > 3 files changed, 20 insertions(+), 51 deletions(-) > > --- a/fs/ocfs2/ocfs2.h~revert-ocfs2-mount-shared-volume-without-ha-stack > +++ a/fs/ocfs2/ocfs2.h > @@ -277,7 +277,6 @@ enum ocfs2_mount_options > OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT = 1 << 15, /* Journal Async Commit */ > OCFS2_MOUNT_ERRORS_CONT = 1 << 16, /* Return EIO to the calling process on error */ > OCFS2_MOUNT_ERRORS_ROFS = 1 << 17, /* Change filesystem to read-only on error */ > - OCFS2_MOUNT_NOCLUSTER = 1 << 18, /* No cluster aware filesystem mount */ > }; > > #define OCFS2_OSB_SOFT_RO 0x0001 > @@ -673,8 +672,7 @@ static inline int ocfs2_cluster_o2cb_glo > > static inline int ocfs2_mount_local(struct ocfs2_super *osb) > { > - return ((osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT) > - || (osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER)); > + return (osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT); > } > > static inline int ocfs2_uses_extended_slot_map(struct ocfs2_super *osb) > --- a/fs/ocfs2/slot_map.c~revert-ocfs2-mount-shared-volume-without-ha-stack > +++ a/fs/ocfs2/slot_map.c > @@ -252,16 +252,14 @@ static int __ocfs2_find_empty_slot(struc > int i, ret = -ENOSPC; > > if ((preferred >= 0) && (preferred < si->si_num_slots)) { > - if (!si->si_slots[preferred].sl_valid || > - !si->si_slots[preferred].sl_node_num) { > + if (!si->si_slots[preferred].sl_valid) { > ret = preferred; > goto out; > } > } > > for(i = 0; i < si->si_num_slots; i++) { > - if (!si->si_slots[i].sl_valid || > - !si->si_slots[i].sl_node_num) { > + if (!si->si_slots[i].sl_valid) { > ret = i; > break; > } > @@ -456,30 +454,24 @@ int ocfs2_find_slot(struct ocfs2_super * > spin_lock(&osb->osb_lock); > ocfs2_update_slot_info(si); > > - if (ocfs2_mount_local(osb)) > - /* use slot 0 directly in local mode */ > - slot = 0; > - else { > - /* search for ourselves first and take the slot if it already > - * exists. Perhaps we need to mark this in a variable for our > - * own journal recovery? Possibly not, though we certainly > - * need to warn to the user */ > - slot = __ocfs2_node_num_to_slot(si, osb->node_num); > + /* search for ourselves first and take the slot if it already > + * exists. Perhaps we need to mark this in a variable for our > + * own journal recovery? Possibly not, though we certainly > + * need to warn to the user */ > + slot = __ocfs2_node_num_to_slot(si, osb->node_num); > + if (slot < 0) { > + /* if no slot yet, then just take 1st available > + * one. */ > + slot = __ocfs2_find_empty_slot(si, osb->preferred_slot); > if (slot < 0) { > - /* if no slot yet, then just take 1st available > - * one. */ > - slot = __ocfs2_find_empty_slot(si, osb->preferred_slot); > - if (slot < 0) { > - spin_unlock(&osb->osb_lock); > - mlog(ML_ERROR, "no free slots available!\n"); > - status = -EINVAL; > - goto bail; > - } > - } else > - printk(KERN_INFO "ocfs2: Slot %d on device (%s) was " > - "already allocated to this node!\n", > - slot, osb->dev_str); > - } > + spin_unlock(&osb->osb_lock); > + mlog(ML_ERROR, "no free slots available!\n"); > + status = -EINVAL; > + goto bail; > + } > + } else > + printk(KERN_INFO "ocfs2: Slot %d on device (%s) was already " > + "allocated to this node!\n", slot, osb->dev_str); > > ocfs2_set_slot(si, slot, osb->node_num); > osb->slot_num = slot; > --- a/fs/ocfs2/super.c~revert-ocfs2-mount-shared-volume-without-ha-stack > +++ a/fs/ocfs2/super.c > @@ -172,7 +172,6 @@ enum { > Opt_dir_resv_level, > Opt_journal_async_commit, > Opt_err_cont, > - Opt_nocluster, > Opt_err, > }; > > @@ -206,7 +205,6 @@ static const match_table_t tokens = { > {Opt_dir_resv_level, "dir_resv_level=%u"}, > {Opt_journal_async_commit, "journal_async_commit"}, > {Opt_err_cont, "errors=continue"}, > - {Opt_nocluster, "nocluster"}, > {Opt_err, NULL} > }; > > @@ -618,13 +616,6 @@ static int ocfs2_remount(struct super_bl > goto out; > } > > - tmp = OCFS2_MOUNT_NOCLUSTER; > - if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) { > - ret = -EINVAL; > - mlog(ML_ERROR, "Cannot change nocluster option on remount\n"); > - goto out; > - } > - > tmp = OCFS2_MOUNT_HB_LOCAL | OCFS2_MOUNT_HB_GLOBAL | > OCFS2_MOUNT_HB_NONE; > if ((osb->s_mount_opt & tmp) != (parsed_options.mount_opt & tmp)) { > @@ -865,7 +856,6 @@ static int ocfs2_verify_userspace_stack( > } > > if (ocfs2_userspace_stack(osb) && > - !(osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) && > strncmp(osb->osb_cluster_stack, mopt->cluster_stack, > OCFS2_STACK_LABEL_LEN)) { > mlog(ML_ERROR, > @@ -1137,11 +1127,6 @@ static int ocfs2_fill_super(struct super > osb->s_mount_opt & OCFS2_MOUNT_DATA_WRITEBACK ? "writeback" : > "ordered"); > > - if ((osb->s_mount_opt & OCFS2_MOUNT_NOCLUSTER) && > - !(osb->s_feature_incompat & OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT)) > - printk(KERN_NOTICE "ocfs2: The shared device (%s) is mounted " > - "without cluster aware mode.\n", osb->dev_str); > - > atomic_set(&osb->vol_state, VOLUME_MOUNTED); > wake_up(&osb->osb_mount_event); > > @@ -1452,9 +1437,6 @@ static int ocfs2_parse_options(struct su > case Opt_journal_async_commit: > mopt->mount_opt |= OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT; > break; > - case Opt_nocluster: > - mopt->mount_opt |= OCFS2_MOUNT_NOCLUSTER; > - break; > default: > mlog(ML_ERROR, > "Unrecognized mount option \"%s\" " > @@ -1566,9 +1548,6 @@ static int ocfs2_show_options(struct seq > if (opts & OCFS2_MOUNT_JOURNAL_ASYNC_COMMIT) > seq_printf(s, ",journal_async_commit"); > > - if (opts & OCFS2_MOUNT_NOCLUSTER) > - seq_printf(s, ",nocluster"); > - > return 0; > } > > _ > > Patches currently in -mm which might be from ocfs2-devel@xxxxxxxxxxxxxx are > > revert-ocfs2-mount-shared-volume-without-ha-stack.patch >