On Tue, Jan 11, 2022 at 11:55 PM Venky Shankar <vshankar@xxxxxxxxxx> wrote: > > +Greg > > On Mon, Jan 10, 2022 at 12:08 AM 胡玮文 <sehuww@xxxxxxxxxxxxxxxx> wrote: > > > > Hi ceph developers, > > > > Today we got one of our OSD hosts hang on OOM. Some OSDs were flapping and eventually went down and out. The recovery caused one OSD to go full, which is used in both cephfs metadata and data pools. > > > > The strange thing is: > > * Many of our users report unexpected “Permission denied” error when creating new files > > That's weird. I would expect the operation to block in the worst case. I think some configurations or code paths end up returning odd errors instead of ENOSPC or EQUOTA, which is probably what the cluster was returning. > > > * dmesg has some strange error (see examples below). During that time, no special logs on both active MDSes. > > * The above two strange things happens BEFORE the OSD got full. > > > > Jan 09 01:27:13 gpu027 kernel: libceph: osd9 up > > Jan 09 01:27:13 gpu027 kernel: libceph: osd10 up > > Jan 09 01:28:55 gpu027 kernel: libceph: osd9 down > > Jan 09 01:28:55 gpu027 kernel: libceph: osd10 down > > Jan 09 01:32:35 gpu027 kernel: libceph: osd6 weight 0x0 (out) > > Jan 09 01:32:35 gpu027 kernel: libceph: osd16 weight 0x0 (out) > > Jan 09 01:34:18 gpu027 kernel: libceph: osd1 weight 0x0 (out) > > Jan 09 01:39:20 gpu027 kernel: libceph: osd9 weight 0x0 (out) > > Jan 09 01:39:20 gpu027 kernel: libceph: osd10 weight 0x0 (out) All of these are just reports about changes in the OSDMap. It looks like something pretty bad happened with your cluster (an OSD filling up isn't good, but I didn't think on its own it would cause them to get marked down+out), but this isn't a direct impact on the kernel client or filesystem... > > Jan 09 01:53:07 gpu027 kernel: ceph: mdsc_handle_reply got 30408991 on session mds1 not mds0 > > Jan 09 01:53:14 gpu027 kernel: ceph: mdsc_handle_reply got 30409829 on session mds1 not mds0 > > Jan 09 01:53:15 gpu027 kernel: ceph: mdsc_handle_reply got 30409925 on session mds1 not mds0 > > Jan 09 01:53:28 gpu027 kernel: ceph: mdsc_handle_reply got 30411416 on session mds1 not mds0 > > Jan 09 02:05:07 gpu027 kernel: ceph: mdsc_handle_reply got 30417742 on session mds0 not mds1 > > Jan 09 02:48:52 gpu027 kernel: ceph: mdsc_handle_reply got 30449177 on session mds1 not mds0 > > Jan 09 02:49:17 gpu027 kernel: ceph: mdsc_handle_reply got 30452750 on session mds1 not mds0 > > > > After reading the code, the replies are unexpected and just dropped. Any ideas about how this could happen? And is there anything I need to worry about? (The cluster is now recovered and looks good) > > The MDS should ask the client to "forward" the operation to another > MDS if it is not the auth for an inode. Yeah. Been a while since I was in this code but I think this will generally happen if the MDS cluster is migrating metadata and the client for some reason hasn't kept up on the changes — the newly-responsible MDS might have gotten the request forwarded directly, and replied to the client, but the client doesn't expect it to be responsible for that op. *IIRC* it all gets cleaned up when the client resubmits the operation so this isn't an issue. -Greg > > It would be interesting to see what "mds1" was doing around the > "01:53:07" timestamp. Could you gather that from the mds log? > > > > > The clients are Ubuntu 20.04 with kernel 5.11.0-43-generic. Ceph version is 16.2.7. No active MDS restarts during that time. Standby-replay MDSes did restart, which should be fixed by my PR https://github.com/ceph/ceph/pull/44501 . But I don’t know if it is related to the issue here. > > > > Regards, > > Weiwen Hu > > > > -- > Cheers, > Venky >