On Tue, 2022-03-01 at 21:57 +0800, Xiubo Li wrote:
> On 3/1/22 9:20 PM, Jeff Layton wrote:
> > On Tue, 2022-03-01 at 19:30 +0800, xiubli@xxxxxxxxxx wrote:
> > > From: Xiubo Li <xiubli@xxxxxxxxxx>
> > >
> > > ------------[ cut here ]------------
> > > kernel BUG at fs/ceph/dir.c:537!
> > > invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
> > > CPU: 16 PID: 21641 Comm: ls Tainted: G E 5.17.0-rc2+ #92
> > > Hardware name: Red Hat RHEV Hypervisor, BIOS 1.11.0-2.el7 04/01/2014
> > >
> > > The corresponding code in ceph_readdir() is:
> > >
> > >     BUG_ON(rde->offset < ctx->pos);
> > >
> > > Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> > > ---
> > >  fs/ceph/dir.c        | 13 +++++++------
> > >  fs/ceph/inode.c      |  5 +++--
> > >  fs/ceph/mds_client.c |  2 +-
> > >  3 files changed, 11 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> > > index a449f4a07c07..6be0c1f793c2 100644
> > > --- a/fs/ceph/dir.c
> > > +++ b/fs/ceph/dir.c
> > > @@ -534,6 +534,13 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
> > >                                        .ctext_len = rde->altname_len };
> > >              u32 olen = oname.len;
> > >
> > > +            err = ceph_fname_to_usr(&fname, &tname, &oname, NULL);
> > > +            if (err) {
> > > +                pr_err("%s unable to decode %.*s, got %d\n", __func__,
> > > +                       rde->name_len, rde->name, err);
> > > +                goto out;
> > > +            }
> > > +
> > >              BUG_ON(rde->offset < ctx->pos);
> > >              BUG_ON(!rde->inode.in);
> > >
> > > @@ -542,12 +549,6 @@ static int ceph_readdir(struct file *file, struct dir_context *ctx)
> > >                   i, rinfo->dir_nr, ctx->pos,
> > >                   rde->name_len, rde->name, &rde->inode.in);
> > >
> > > -            err = ceph_fname_to_usr(&fname, &tname, &oname, NULL);
> > > -            if (err) {
> > > -                dout("Unable to decode %.*s. Skipping it.\n", rde->name_len, rde->name);
> > > -                continue;
> > > -            }
> > > -
> > >              if (!dir_emit(ctx, oname.name, oname.len,
> > >                    ceph_present_ino(inode->i_sb, le64_to_cpu(rde->inode.in->ino)),
> > >                    le32_to_cpu(rde->inode.in->mode) >> 12)) {
> > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > > index 8b0832271fdf..2bc2f02b84e8 100644
> > > --- a/fs/ceph/inode.c
> > > +++ b/fs/ceph/inode.c
> > > @@ -1898,8 +1898,9 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
> > >
> > >              err = ceph_fname_to_usr(&fname, &tname, &oname, &is_nokey);
> > >              if (err) {
> > > -                dout("Unable to decode %.*s. Skipping it.", rde->name_len, rde->name);
> > > -                continue;
> > > +                pr_err("%s unable to decode %.*s, got %d\n", __func__,
> > > +                       rde->name_len, rde->name, err);
> > > +                goto out;
> > >              }
> > >
> >
> > Is this really an improvement?
>
> Yeah, if we just continue without setting the rde->offset it will crash
> in "BUG_ON(rde->offset < ctx->pos);" in ceph_readdir().
>

Ok.

> > Suppose I have one dentry with a corrupt
> > name. Do I want to fail a readdir request which might allow me to get at
> > other dentries in that directory that aren't corrupt?
>
> It's a little hard to handle the code in ceph_readdir():
>
>     /* search start position */
>     if (rinfo->dir_nr > 0) {
>         int step, nr = rinfo->dir_nr;
>         while (nr > 0) {
>             step = nr >> 1;
>             if (rinfo->dir_entries[i + step].offset < ctx->pos) {
>                 i += step + 1;
>                 nr -= step + 1;
>             } else {
>                 nr = step;
>             }
>         }
>     }
>
> In this case how to set the rde->offset?
>

Yeah, that is the nasty part. I would probably just pretend that the
corrupt dentry doesn't exist. Offsets are set by ceph_make_fpos, AFAICT,
and I don't think skipping one should affect the positions of the others.

OTOH, this is just a nice-to-have thing. If it's too nasty to deal with,
we can just return an error for now and aim to do better error handling
here later.
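
To make the "pretend it doesn't exist" idea concrete, here is a rough
userspace sketch, not kernel code. It assumes every cached entry already
carries its own offset before name decoding is attempted, which is the
open question in this thread. The names demo_entry, demo_search_start and
demo_emit_from are made up for illustration; only the search loop mirrors
the ceph_readdir() snippet quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached readdir entry (not the kernel struct). */
struct demo_entry {
    uint64_t offset;  /* assumed to be assigned before any name decoding */
    bool decodable;   /* false models a name that failed to decode */
};

/* Lower-bound search: index of the first entry with offset >= pos, or nr. */
static size_t demo_search_start(const struct demo_entry *entries, size_t nr,
                                uint64_t pos)
{
    size_t i = 0;

    while (nr > 0) {
        size_t step = nr >> 1;

        if (entries[i + step].offset < pos) {
            i += step + 1;
            nr -= step + 1;
        } else {
            nr = step;
        }
    }
    return i;
}

/*
 * Walk entries starting at *pos and record the offsets of decodable ones
 * in out[] (at most cap of them). Undecodable entries are stepped over,
 * but *pos still advances past them, so the others keep their positions.
 * Returns the number of offsets written to out[].
 */
static size_t demo_emit_from(const struct demo_entry *entries, size_t nr,
                             uint64_t *pos, uint64_t *out, size_t cap)
{
    size_t emitted = 0;
    size_t i;

    for (i = demo_search_start(entries, nr, *pos); i < nr; i++) {
        /* Userspace analogue of BUG_ON(rde->offset < ctx->pos). */
        assert(entries[i].offset >= *pos);

        if (!entries[i].decodable) {
            *pos = entries[i].offset + 1;  /* skip, but keep moving forward */
            continue;
        }
        if (emitted == cap)
            break;  /* caller's buffer is full; resume from *pos later */
        out[emitted++] = entries[i].offset;
        *pos = entries[i].offset + 1;
    }
    return emitted;
}
```

In this model a resumed call starts from *pos and never revisits the
skipped entry. Whether the real code can guarantee that rde->offset is
valid for an entry whose name failed to decode is exactly what would
need checking.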
>
> > Maybe we should try to emit some placeholder there?
> >
> > >
> > >              dname.name = oname.name;
> > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > index 914a6e68bb56..94b4c6508044 100644
> > > --- a/fs/ceph/mds_client.c
> > > +++ b/fs/ceph/mds_client.c
> > > @@ -3474,7 +3474,7 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
> > >      if (err == 0) {
> > >          if (result == 0 && (req->r_op == CEPH_MDS_OP_READDIR ||
> > >                              req->r_op == CEPH_MDS_OP_LSSNAP))
> > > -            ceph_readdir_prepopulate(req, req->r_session);
> > > +            err = ceph_readdir_prepopulate(req, req->r_session);
> > >      }
> > >      current->journal_info = NULL;
> > >      mutex_unlock(&req->r_fill_mutex);
>

-- 
Jeff Layton <jlayton@xxxxxxxxxx>