On Thu, Apr 30, 2020 at 3:03 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Wed, 2020-04-29 at 18:08 +0200, Ilya Dryomov wrote:
> > On Wed, Apr 29, 2020 at 5:42 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > On Wed, 2020-04-29 at 11:46 +0200, Eduard Shishkin wrote:
> > > > On 4/28/20 2:23 PM, Jeff Layton wrote:
> > > > > On Mon, 2020-04-27 at 23:46 +0200, edward6@xxxxxxxxxxxxx wrote:
> > > > > > From: Eduard Shishkin <edward6@xxxxxxxxxxxxx>
> > > > > >
> > > > > > In handle_session() the variable @features is always in
> > > > > > little-endian byte order, because the feature bits are packed
> > > > > > bytewise from left to right in encode_supported_features().
> > > > > >
> > > > > > However, test_bit(), which is called to check feature
> > > > > > availability, assumes host byte order in that variable. This
> > > > > > leads to problems on big-endian architectures; specifically,
> > > > > > it is impossible to mount a ceph volume on s390.
> > > > > >
> > > > > > This patch adds a conversion from little-endian to host byte
> > > > > > order, fixing the problem.
> > > > > >
> > > > > > Signed-off-by: Eduard Shishkin <edward6@xxxxxxxxxxxxx>
> > > > > > ---
> > > > > >  fs/ceph/mds_client.c | 4 ++--
> > > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > > > index 486f91f..190598d 100644
> > > > > > --- a/fs/ceph/mds_client.c
> > > > > > +++ b/fs/ceph/mds_client.c
> > > > > > @@ -3252,7 +3252,7 @@ static void handle_session(struct ceph_mds_session *session,
> > > > > >          struct ceph_mds_session_head *h;
> > > > > >          u32 op;
> > > > > >          u64 seq;
> > > > > > -        unsigned long features = 0;
> > > > > > +        __le64 features = 0;
> > > > > >          int wake = 0;
> > > > > >          bool blacklisted = false;
> > > > > >
> > > > > > @@ -3301,7 +3301,7 @@ static void handle_session(struct ceph_mds_session *session,
> > > > > >                  if (session->s_state == CEPH_MDS_SESSION_RECONNECTING)
> > > > > >                          pr_info("mds%d reconnect success\n", session->s_mds);
> > > > > >                  session->s_state = CEPH_MDS_SESSION_OPEN;
> > > > > > -                session->s_features = features;
> > > > > > +                session->s_features = le64_to_cpu(features);
> > > > > >                  renewed_caps(mdsc, session, 0);
> > > > > >                  wake = 1;
> > > > > >                  if (mdsc->stopping)
> > > > >
> > > > > (cc'ing Zheng since he did the original patches here)
> > > > >
> > > > > Thanks Eduard. The problem is real, but I think we can just do
> > > > > the conversion during the decode.
> > > > >
> > > > > The feature mask words sent by the MDS are 64 bits, so if it's
> > > > > smaller we can assume that it's malformed. So, I don't think we
> > > > > need to handle the case where it's smaller than 8 bytes.
> > > > >
> > > > > How about this patch instead?
> > > >
> > > > Hi Jeff,
> > > >
> > > > This also works. Please apply.
> > > >
> > > > Thanks,
> > > > Eduard.
> > > >
> > > Thanks. Merged into the ceph-client/testing branch; it should make v5.8.
> >
> > I think this is stable material. I'll tag it and get it queued up
> > for 5.7-rc.
> >
> > Thanks,
> >
> Yeah, that sounds reasonable.
>
> If you're going to send up another PR, then we might want to add these
> bugfixes currently in the testing branch to it as well:
>
> 445645c8be5f fs/ceph:fix special error code in ceph_try_get_caps()
> 591681748b56 fs/ceph:fix double unlock in handle_cap_export()
> 0e84a1ebe161 ceph: ceph_kick_flushing_caps needs the s_mutex
>
> I'm not sure that any of them need to go to stable though.
> We might also want this one though:
>
> 7b3facb61440 ceph: reset i_requested_max_size if file write is not wanted
>
> ...but it'll probably need to be reworked due to merge conflicts if we
> move it ahead of some of the cap handling cleanup patches (or we could
> just pull those in too).
>
> Zheng, do you have an opinion here? Should 7b3facb61440 go to stable?

(dropping IBM folks)

"ceph: ceph_kick_flushing_caps needs the s_mutex" doesn't apply either,
because the cap_dirty list was moved.

I don't see the urgency on these TBH.  The endianness fix and the two
error handling fixups from Wu are now queued up.

Thanks,

                Ilya
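
Jeff's alternative patch is not quoted in this thread. A minimal sketch of
the decode-time conversion he describes (turn the little-endian feature mask
into host byte order while the session message is parsed, so handle_session()
can keep a plain u64 and test bits on it directly) might look like the
following. The helper name decode_session_features(), its parameters, and the
handling of a short mask are illustrative assumptions, not the code that was
actually merged:

#include <linux/types.h>
#include <asm/unaligned.h>

/*
 * Illustrative sketch only, not the merged fix: return the feature
 * mask, which arrives little-endian on the wire, in host byte order
 * so that callers can use ordinary bit tests on it.
 */
static u64 decode_session_features(const void *p, u32 len)
{
        /*
         * Per the discussion above, the MDS sends 64-bit feature
         * words, so a shorter blob is malformed; this sketch simply
         * reports no features in that case.
         */
        if (len < sizeof(u64))
                return 0;

        /* Do the byte-order conversion once, at decode time. */
        return get_unaligned_le64(p);
}

With something along these lines, session->s_features could be assigned the
returned value directly, with no later le64_to_cpu() needed in
handle_session().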