On 1/13/22 1:19 AM, Gregory Farnum wrote:
On Tue, Jan 11, 2022 at 11:55 PM Venky Shankar <vshankar@xxxxxxxxxx> wrote:
+Greg
On Mon, Jan 10, 2022 at 12:08 AM 胡玮文 <sehuww@xxxxxxxxxxxxxxxx> wrote:
Hi ceph developers,
Today we got one of our OSD hosts hang on OOM. Some OSDs were flapping and eventually went down and out. The recovery caused one OSD to go full, which is used in both cephfs metadata and data pools.
The strange thing is:
* Many of our users report unexpected “Permission denied” error when creating new files
That's weird. I would expect the operation to block in the worst case.
I think some configurations or code paths end up returning odd errors
instead of ENOSPC or EQUOTA, which is probably what the cluster was
returning.
* dmesg has some strange error (see examples below). During that time, no special logs on both active MDSes.
* The above two strange things happens BEFORE the OSD got full.
Jan 09 01:27:13 gpu027 kernel: libceph: osd9 up
Jan 09 01:27:13 gpu027 kernel: libceph: osd10 up
Jan 09 01:28:55 gpu027 kernel: libceph: osd9 down
Jan 09 01:28:55 gpu027 kernel: libceph: osd10 down
Jan 09 01:32:35 gpu027 kernel: libceph: osd6 weight 0x0 (out)
Jan 09 01:32:35 gpu027 kernel: libceph: osd16 weight 0x0 (out)
Jan 09 01:34:18 gpu027 kernel: libceph: osd1 weight 0x0 (out)
Jan 09 01:39:20 gpu027 kernel: libceph: osd9 weight 0x0 (out)
Jan 09 01:39:20 gpu027 kernel: libceph: osd10 weight 0x0 (out)
All of these are just reports about changes in the OSDMap. It looks
like something pretty bad happened with your cluster (an OSD filling
up isn't good, but I didn't think on its own it would cause them to
get marked down+out), but this isn't a direct impact on the kernel
client or filesystem...
Jan 09 01:53:07 gpu027 kernel: ceph: mdsc_handle_reply got 30408991 on session mds1 not mds0
Jan 09 01:53:14 gpu027 kernel: ceph: mdsc_handle_reply got 30409829 on session mds1 not mds0
Jan 09 01:53:15 gpu027 kernel: ceph: mdsc_handle_reply got 30409925 on session mds1 not mds0
Jan 09 01:53:28 gpu027 kernel: ceph: mdsc_handle_reply got 30411416 on session mds1 not mds0
Jan 09 02:05:07 gpu027 kernel: ceph: mdsc_handle_reply got 30417742 on session mds0 not mds1
Jan 09 02:48:52 gpu027 kernel: ceph: mdsc_handle_reply got 30449177 on session mds1 not mds0
Jan 09 02:49:17 gpu027 kernel: ceph: mdsc_handle_reply got 30452750 on session mds1 not mds0
After reading the code, the replies are unexpected and just dropped. Any ideas about how this could happen? And is there anything I need to worry about? (The cluster is now recovered and looks good)
The MDS should ask the client to "forward" the operation to another
MDS if it is not the auth for an inode.
Yeah. Been a while since I was in this code but I think this will
generally happen if the MDS cluster is migrating metadata and the
client for some reason hasn't kept up on the changes — the
newly-responsible MDS might have gotten the request forwarded
directly, and replied to the client, but the client doesn't expect it
to be responsible for that op.
*IIRC* it all gets cleaned up when the client resubmits the operation
so this isn't an issue.
-Greg
Actually it's buggy in MDS, and there has one PR is fixing this.
https://github.com/ceph/ceph/pull/44229
The 'try_get_auth_inode()' is possibly have already send the forward
request to clients and the clients has changed the request->sesssion,
but in 'dispatch_client_request()' it will respond it with '-EINVAL'.
The above errors should from the second respond.
Regards
-- Xiubo
It would be interesting to see what "mds1" was doing around the
"01:53:07" timestamp. Could you gather that from the mds log?
The clients are Ubuntu 20.04 with kernel 5.11.0-43-generic. Ceph version is 16.2.7. No active MDS restarts during that time. Standby-replay MDSes did restart, which should be fixed by my PR https://github.com/ceph/ceph/pull/44501 . But I don’t know if it is related to the issue here.
Regards,
Weiwen Hu
--
Cheers,
Venky