Hi Weiwen,

sorry, I sent the output of "ceph fs status" after I unmounted the clients. I just wanted to see if this gets rid of the ms_handle_reset messages. The tests were done with the clients mounted. Nevertheless, here is the output of a new test in sequence:

[root@rit-tceph bench]# ceph fs status
fs - 2 clients
==
RANK  STATE      MDS      ACTIVITY     DNS    INOS
 0    active  tceph-02  Reqs:    0 /s  98.0k    15
   POOL      TYPE     USED  AVAIL
fs-meta1   metadata  2053M   784G
fs-meta2     data        0   784G
fs-data      data        0  1569G
STANDBY MDS
  tceph-01
  tceph-03
MDS version: ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)

[root@rit-tceph bench]# ls -la /mnt/adm/cephfs/
total 0
drwxrwsr-x. 3 root ansible  1 Jul 28 10:01 .
drwxr-xr-x. 3 root root    20 Jul 28 09:45 ..
drwxr-sr-x. 3 root ansible  1 Jul 28 10:01 data

[root@rit-tceph bench]# ceph tell mds.0 dump tree '~mds0/stray0'
2022-08-07T15:55:24.199+0200 7f96897fa700  0 client.455712 ms_handle_reset on v2:10.41.24.14:6812/3943985176
2022-08-07T15:55:24.264+0200 7f968a7fc700  0 client.438997 ms_handle_reset on v2:10.41.24.14:6812/3943985176
root inode is not in cache

[root@rit-tceph bench]# ceph tell mds.0 dump tree '~mdsdir/stray0'
2022-08-07T15:55:31.075+0200 7f5c4e7fc700  0 client.439009 ms_handle_reset on v2:10.41.24.14:6812/3943985176
2022-08-07T15:55:31.115+0200 7f5c4f7fe700  0 client.439015 ms_handle_reset on v2:10.41.24.14:6812/3943985176
root inode is not in cache

[root@rit-tceph bench]# mount | grep ceph
10.41.24.13,10.41.24.14,10.41.24.15:/ on /mnt/adm/cephfs type ceph (rw,relatime,name=fs-admin,secret=<hidden>,acl)
10.41.24.13,10.41.24.14,10.41.24.15:/data on /mnt/cephfs type ceph (rw,relatime,name=fs,secret=<hidden>,acl)

The FS root is mounted on /mnt/adm/cephfs. Apart from that, I would assume that the root inode is always in cache. If it is not, it should be pulled in when missing, for example here: https://github.com/ceph/ceph/blob/main/src/mds/MDSRank.cc#L3123 . Looking at that check,

    CInode *in = mdcache->cache_traverse(filepath(root.c_str()));
    if (!in) {
      ss << "root inode is not in cache";
      return;
    }

it seems it would be better to pull the inode into the cache (on mds.0) instead of giving up.

I'm trying to dump the MDS cache in case that might help. Unfortunately, it's just hanging:

[root@ceph-mds:tceph-02 /]# ceph daemon mds.tceph-02 dump cache

The "ceph" command is at 100% CPU, but I can't see any output. There is also no disk activity. I remember this command returning much faster on a mimic cluster with a really large cache. Is the command different in octopus?

As a workaround I will try to force the root inode into the cache from a client and then dump the stray dirs; a rough sketch of what I have in mind is appended below the quoted thread.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: 胡 玮文 <huww98@xxxxxxxxxxx>
Sent: 07 August 2022 15:52:28
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: ceph mds dump tree - root inode is not in cache

I see you have 0 clients. Can you try just mounting a client and doing an "ls" in your cephfs root directory?
> On 7 Aug 2022, at 20:38, Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Weiwen,
>
> I get the following results:
>
> # ceph fs status
> fs - 0 clients
> ==
> RANK  STATE      MDS      ACTIVITY     DNS    INOS
>  0    active  tceph-03  Reqs:    0 /s   997k   962k
>    POOL      TYPE     USED  AVAIL
> fs-meta1   metadata  6650M   780G
> fs-meta2     data        0   780G
> fs-data      data        0  1561G
> STANDBY MDS
>   tceph-01
>   tceph-02
> MDS version: ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
>
> # ceph tell mds.0 dump tree '~mds0/stray0'
> 2022-08-07T14:28:00.735+0200 7fb6827fc700  0 client.434599 ms_handle_reset on v2:10.41.24.15:6812/2903519715
> 2022-08-07T14:28:00.776+0200 7fb6837fe700  0 client.434605 ms_handle_reset on v2:10.41.24.15:6812/2903519715
> root inode is not in cache
>
> # ceph tell mds.0 dump tree '~mdsdir/stray0'
> 2022-08-07T14:30:07.370+0200 7f364d7fa700  0 client.434623 ms_handle_reset on v2:10.41.24.15:6812/2903519715
> 2022-08-07T14:30:07.411+0200 7f364e7fc700  0 client.434629 ms_handle_reset on v2:10.41.24.15:6812/2903519715
> root inode is not in cache
>
> Whatever I try, it says the same: "root inode is not in cache". Are the ms_handle_reset messages possibly hinting at a problem with my installation? The MDS is the only daemon type for which these appear when I use ceph tell commands. Possibly not; I also get these messages every time.
>
> This is a test cluster, I can do all sorts of experiments with it. Please let me know if I can try something and pull extra information out.
>
> With the default settings, this is all that's in today's log after trying a couple of times; the SIGHUP comes from logrotate:
>
> 2022-08-07T04:02:06.693+0200 7f7856b1c700 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
> 2022-08-07T14:27:01.298+0200 7f785731d700  1 mds.tceph-03 asok_command: dump tree {prefix=dump tree,root=~mdsdir/stray0} (starting...)
> 2022-08-07T14:27:07.581+0200 7f785731d700  1 mds.tceph-03 asok_command: dump tree {prefix=dump tree,root=~mds0/stray0} (starting...)
> 2022-08-07T14:27:48.976+0200 7f785731d700  1 mds.tceph-03 asok_command: dump tree {prefix=dump tree,root=~mds0/stray0} (starting...)
> 2022-08-07T14:28:00.776+0200 7f785731d700  1 mds.tceph-03 asok_command: dump tree {prefix=dump tree,root=~mds0/stray0} (starting...)
> 2022-08-07T14:30:07.410+0200 7f785731d700  1 mds.tceph-03 asok_command: dump tree {prefix=dump tree,root=~mdsdir/stray0} (starting...)
> 2022-08-07T14:31:15.839+0200 7f785731d700  1 mds.tceph-03 asok_command: dump tree {prefix=dump tree,root=~mdsdir/stray0} (starting...)
> 2022-08-07T14:31:19.900+0200 7f785731d700  1 mds.tceph-03 asok_command: dump tree {prefix=dump tree,root=~mds0/stray0} (starting...)
>
> Please let me know if/how I can provide more info.
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: 胡 玮文 <huww98@xxxxxxxxxxx>
> Sent: 05 August 2022 03:43:05
> To: Frank Schilder
> Cc: ceph-users@xxxxxxx
> Subject: [Warning Possible spam] Re: ceph mds dump tree - root inode is not in cache
>
> Hi Frank,
>
> I have not experienced this before. Maybe mds.tceph-03 is in standby state? Could you show the output of "ceph fs status"?
>
> You can also try "ceph tell mds.0 …" and let ceph find the correct daemon for you.
>
> You may also try dumping "~mds0/stray0".
>
> Weiwen Hu
>
>> On 4 Aug 2022, at 23:22, Frank Schilder <frans@xxxxxx> wrote:
>>
>> Hi all,
>>
>> I'm stuck with a very annoying problem with a ceph octopus test cluster (latest stable version). I need to investigate the contents of the MDS stray buckets, and something like this should work:
>>
>> [root@ceph-adm:tceph-03 ~]# ceph daemon mds.tceph-03 dump tree '~mdsdir' 3
>> [root@ceph-adm:tceph-03 ~]# ceph tell mds.tceph-03 dump tree '~mdsdir/stray0'
>> 2022-08-04T16:57:54.010+0200 7f3475ffb700  0 client.371437 ms_handle_reset on v2:10.41.24.15:6812/2903519715
>> 2022-08-04T16:57:54.052+0200 7f3476ffd700  0 client.371443 ms_handle_reset on v2:10.41.24.15:6812/2903519715
>> root inode is not in cache
>>
>> However, I either get nothing or an error message. Whatever I try, I cannot figure out how to pull the root inode into the MDS cache - if this is even the problem here. I also don't understand why the annoying ms_handle_reset messages are there. I found the second command in a script:
>>
>> Code line: https://gist.github.com/huww98/91cbff0782ad4f6673dcffccce731c05#file-cephfs-reintegrate-conda-stray-py-L11
>>
>> that came up in this conversation: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/4TDASTSWF4UIURKUN2P7PGZZ3V5SCCEE/
>>
>> The only place I can find "root inode is not in cache" is https://tracker.ceph.com/issues/53597#note-14, where it says that the above commands should return the tree. I have ca. 1 million stray entries and they must be somewhere. mds.tceph-03 is the only active MDS.
>>
>> Can someone help me out here?
>>
>> Thanks and best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
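
P.S. For completeness, here is the rough, untested sketch of the workaround mentioned above: force the root inode into the MDS cache by stat'ing/listing the mounted FS root (Weiwen's suggestion), then dump all ten stray dirs and count the entries. The mount point, the rank, and in particular the assumption that "dump tree" prints a JSON array on stdout (with the ms_handle_reset lines going to stderr) are mine, so the parsing may need adjusting.

#!/usr/bin/env python3
# Rough, untested sketch: touch the mounted FS root so the MDS has a reason
# to load the root inode, then dump every stray directory and count entries.
# Assumptions: the admin mount point below, mds rank 0, and that "dump tree"
# prints a JSON array on stdout while the ms_handle_reset lines go to stderr.

import json
import os
import subprocess

MOUNTPOINT = "/mnt/adm/cephfs"   # where the FS root is mounted (see mount output above)
NUM_STRAY = 10                   # the MDS keeps stray0 .. stray9

def dump_tree(path, rank="0"):
    """Run 'ceph tell mds.<rank> dump tree <path>' and try to parse JSON."""
    proc = subprocess.run(
        ["ceph", "tell", f"mds.{rank}", "dump", "tree", path],
        capture_output=True, text=True,
    )
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        print(f"{path}: no JSON output: {(proc.stdout or proc.stderr).strip()}")
        return []

def main():
    # Give the MDS a reason to pull the root inode into its cache.
    os.stat(MOUNTPOINT)
    os.listdir(MOUNTPOINT)

    total = 0
    for i in range(NUM_STRAY):
        stray = f"~mdsdir/stray{i}"
        entries = dump_tree(stray)
        print(f"{stray}: {len(entries)} entries in dump")
        total += len(entries)
    print(f"total: {total}")

if __name__ == "__main__":
    main()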