Re: [PATCH v3] ceph: blocklist the kclient when receiving corrupted snap trace

On Wed, Dec 7, 2022 at 2:31 PM Xiubo Li <xiubli@xxxxxxxxxx> wrote:
>
>
> On 07/12/2022 21:19, Xiubo Li wrote:
> >
> > On 07/12/2022 18:59, Ilya Dryomov wrote:
> >> On Tue, Dec 6, 2022 at 1:59 PM <xiubli@xxxxxxxxxx> wrote:
> >>> From: Xiubo Li <xiubli@xxxxxxxxxx>
> >>>
> >>> When receiving a corrupted snap trace we don't know what exactly
> >>> has happened on the MDS side, and we shouldn't continue writing to
> >>> the OSDs, which may corrupt the snapshot contents.
> >>>
> >>> Just try to blocklist this client, and if that fails we need to
> >>> crash the client instead of leaving it writeable to the OSDs.
> >>>
> >>> Cc: stable@xxxxxxxxxxxxxxx
> >>> URL: https://tracker.ceph.com/issues/57686
> >>> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> >>> ---
> >>>
> >>> Thanks to Aaron for the feedback.
> >>>
> >>> V3:
> >>> - Fixed ERROR: spaces required around that ':' (ctx:VxW)
> >>>
> >>> V2:
> >>> - Switched to WARN() to taint the Linux kernel.
> >>>
> >>>   fs/ceph/mds_client.c |  3 ++-
> >>>   fs/ceph/mds_client.h |  1 +
> >>>   fs/ceph/snap.c       | 25 +++++++++++++++++++++++++
> >>>   3 files changed, 28 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> >>> index cbbaf334b6b8..59094944af28 100644
> >>> --- a/fs/ceph/mds_client.c
> >>> +++ b/fs/ceph/mds_client.c
> >>> @@ -5648,7 +5648,8 @@ static void mds_peer_reset(struct ceph_connection *con)
> >>>          struct ceph_mds_client *mdsc = s->s_mdsc;
> >>>
> >>>          pr_warn("mds%d closed our session\n", s->s_mds);
> >>> -       send_mds_reconnect(mdsc, s);
> >>> +       if (!mdsc->no_reconnect)
> >>> +               send_mds_reconnect(mdsc, s);
> >>>   }
> >>>
> >>>   static void mds_dispatch(struct ceph_connection *con, struct ceph_msg *msg)
> >>> diff --git a/fs/ceph/mds_client.h b/fs/ceph/mds_client.h
> >>> index 728b7d72bf76..8e8f0447c0ad 100644
> >>> --- a/fs/ceph/mds_client.h
> >>> +++ b/fs/ceph/mds_client.h
> >>> @@ -413,6 +413,7 @@ struct ceph_mds_client {
> >>>          atomic_t                num_sessions;
> >>>          int                     max_sessions;  /* len of sessions array */
> >>>          int                     stopping;      /* true if shutting down */
> >>> +       int                     no_reconnect;  /* true if snap trace is corrupted */
> >>>
> >>>          atomic64_t              quotarealms_count; /* # realms with quota */
> >>>          /*
> >>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> >>> index c1c452afa84d..023852b7c527 100644
> >>> --- a/fs/ceph/snap.c
> >>> +++ b/fs/ceph/snap.c
> >>> @@ -767,8 +767,10 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> >>>          struct ceph_snap_realm *realm;
> >>>          struct ceph_snap_realm *first_realm = NULL;
> >>>          struct ceph_snap_realm *realm_to_rebuild = NULL;
> >>> +       struct ceph_client *client = mdsc->fsc->client;
> >>>          int rebuild_snapcs;
> >>>          int err = -ENOMEM;
> >>> +       int ret;
> >>>          LIST_HEAD(dirty_realms);
> >>>
> >>>          lockdep_assert_held_write(&mdsc->snap_rwsem);
> >>> @@ -885,6 +887,29 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> >>>          if (first_realm)
> >>>                  ceph_put_snap_realm(mdsc, first_realm);
> >>>          pr_err("%s error %d\n", __func__, err);
> >>> +
> >>> +       /*
> >>> +        * When receiving a corrupted snap trace we don't know what
> >>> +        * exactly has happened on the MDS side, and we shouldn't keep
> >>> +        * writing to the OSDs, which may corrupt the snapshot contents.
> >>> +        *
> >>> +        * Just try to blocklist this kclient, and if that fails we
> >>> +        * need to crash the kclient instead of leaving it writeable.
> >> Hi Xiubo,
> >>
> >> I'm not sure I understand this "let's blocklist ourselves" concept.
> >> If the kernel client shouldn't continue writing to OSDs in this case,
> >> why not just stop issuing writes -- perhaps initiating some equivalent
> >> of a read-only remount like many local filesystems would do on I/O
> >> errors (e.g. errors=remount-ro mode)?
> >
> > The following patch seems to work. Let me do more testing to make
> > sure there are no further crashes.
> >
> > diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> > index c1c452afa84d..cd487f8a4cb5 100644
> > --- a/fs/ceph/snap.c
> > +++ b/fs/ceph/snap.c
> > @@ -767,6 +767,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> >         struct ceph_snap_realm *realm;
> >         struct ceph_snap_realm *first_realm = NULL;
> >         struct ceph_snap_realm *realm_to_rebuild = NULL;
> > +       struct super_block *sb = mdsc->fsc->sb;
> >         int rebuild_snapcs;
> >         int err = -ENOMEM;
> >         LIST_HEAD(dirty_realms);
> > @@ -885,6 +886,9 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
> >         if (first_realm)
> >                 ceph_put_snap_realm(mdsc, first_realm);
> >         pr_err("%s error %d\n", __func__, err);
> > +       pr_err("Remounting filesystem read-only\n");
> > +       sb->s_flags |= SB_RDONLY;
> > +
> >         return err;
> >  }
> >
> >
> The read-only approach was also my first thought, but I just wasn't
> very sure whether it would be the best one.
>
> By evicting the kclient we can prevent buffered data from being
> written to the OSDs, but the read-only approach seemingly won't?

The read-only setting is more for the VFS and the user.  Technically,
the kernel client could just stop issuing writes (i.e. OSD requests
containing a write op) and not set SB_RDONLY.  That should cover any
buffered data as well.
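
Something like this untested sketch is what I have in mind -- note
that the stopping_writes field and the helper are hypothetical, made
up for illustration, not existing code:

    /* new field in struct ceph_mds_client, set once a corrupted
     * snap trace has been detected */
    bool                    stopping_writes;

    static inline bool ceph_writes_stopped(struct ceph_mds_client *mdsc)
    {
            return READ_ONCE(mdsc->stopping_writes);
    }

    /* in the ceph_update_snap_trace() error path */
    WRITE_ONCE(mdsc->stopping_writes, true);

    /* in each path that builds an OSD request containing a write op,
     * fail fast instead of handing anything to the OSDs */
    if (ceph_writes_stopped(mdsc))
            return -EIO;

That way a write from the busted client never leaves the client at
all, blocklisted or not.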

By employing self-blocklisting, you are shifting the responsibility
of rejecting OSD requests to the OSDs.  I'm saying that not issuing
OSD requests from a potentially busted client in the first place is
probably a better idea.  At the very least you wouldn't need to BUG
on ceph_monc_blocklist_add() errors.
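
For reference, the tail of the v3 error path (trimmed from the quote
above) presumably boils down to something like the following -- this
is a reconstruction for discussion, not the literal patch:

    ret = ceph_monc_blocklist_add(&client->monc,
                                  &client->msgr.inst.addr);
    if (ret) {
            /* we failed to fence ourselves via the monitors, so the
             * fallback is to crash/taint the kernel client */
            WARN(1, "%s: failed to blocklist the kclient: %d\n",
                 __func__, ret);
    }

That WARN/BUG fallback is exactly what goes away if the client simply
stops issuing writes on its own.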

Thanks,

                Ilya


