Dear Michael,

> > Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD mapping?

I meant with crush rule replicated_host_nvme here. Sorry, I forgot to mention that.

> Yes, the OSD was still out when the previous health report was created.

Hmm, this is odd. If this is correct, then it reported a slow op even though it was out of the cluster:

> from https://pastebin.com/3G3ij9ui:
> [WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 8133 sec, daemons [osd.0,osd.41] have slow ops.

Not sure what to make of that. It looks almost like you have a ghost osd.41.

I think (some of) the slow ops you are seeing are directed to the health_metrics pool and can be ignored. If they are too annoying, you could try to find out who runs the client with ID client.7524484 and disable it. It might be an MGR module.

Looking at the data you provided and also at some older threads of yours (https://www.mail-archive.com/ceph-users@xxxxxxx/msg05842.html), I am starting to suspect that we are looking at the fall-out of a past admin operation. One possibility is that an upmap exists for PG 1.0 that conflicts with the crush rule replicated_host_nvme and, hence, prevents the assignment of OSDs to PG 1.0. For example, the upmap specifies HDDs, but the crush rule requires NVMEs. The result is an empty set.

I couldn't really find a simple command to list upmaps. The only non-destructive way seems to be to extract the osdmap and create a clean-up command file. The clean-up file should contain a command for every PG with an upmap. To check this, you can execute (see also https://docs.ceph.com/en/latest/man/8/osdmaptool/)

  # ceph osd getmap > osd.map
  # osdmaptool osd.map --upmap-cleanup cleanup.cmd

If you do this, could you please post the contents of cleanup.cmd as usual?

Also, with the OSD map of your cluster, you can simulate certain admin operations and check the resulting PG mappings for pools and other things without having to touch the cluster; see https://docs.ceph.com/en/latest/man/8/osdmaptool/.
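
As a possible read-only cross-check of the upmap idea, here is a minimal sketch. It assumes that the plain-text map dump (from `ceph osd dump` or `osdmaptool osd.map --print`) shows upmap entries on lines starting with `pg_upmap` or `pg_upmap_items`; please verify this against your Ceph version. The helper name `list_upmaps` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: read a plain-text OSD map dump on stdin and keep
# only the lines that describe upmap entries.
# Assumption: such lines start with "pg_upmap " or "pg_upmap_items ".
list_upmaps() {
    # "|| true" keeps the exit status at 0 when no upmap line is found
    grep -E '^pg_upmap(_items)? ' || true
}

# Possible usage, all read-only:
#   ceph osd dump | list_upmaps               # against the live cluster
#   osdmaptool osd.map --print | list_upmaps  # against the extracted map
#   osdmaptool osd.map --test-map-pg 1.0      # show how PG 1.0 maps to OSDs
```

If the filter prints nothing for PG 1.0, a conflicting upmap becomes a less likely explanation; a line for PG 1.0 would be worth posting together with cleanup.cmd.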

To dig a little bit deeper, could you please post as usual the output of:

- ceph pg 1.0 query
- ceph pg 7.39d query

It would also be helpful if you could post the decoded crush map. You can get the map as a text file as follows:

  # ceph osd getcrushmap -o crush-orig.bin
  # crushtool -d crush-orig.bin -o crush.txt

and then post the contents of the file crush.txt.

Has the slow MDS request completed by now?

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

Contents of previous messages removed.