Re: multiple OSD crash, unfound objects

On 10/20/20 1:18 PM, Frank Schilder wrote:
Dear Michael,

Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD mapping?

I meant here with crush rule replicated_host_nvme. Sorry, forgot.
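For reference, such a test pool could be created roughly as follows (the pool name `testpool` is just a placeholder; the crush rule name is taken from this thread):

```shell
# Create a pool with pg_num=pgp_num=1 pinned to the replicated_host_nvme rule
ceph osd pool create testpool 1 1 replicated replicated_host_nvme

# Check whether its single PG gets an up/acting OSD set
# (look up the pool id with 'ceph osd pool ls detail', then use <poolid>.0)
ceph pg map <poolid>.0

# Remove the test pool again when done
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it
```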

Seems to have worked fine:

https://pastebin.com/PFgDE4J1

Yes, the OSD was still out when the previous health report was created.

Hmm, this is odd. If this is correct, then it did report a slow op even though it was out of the cluster:

from https://pastebin.com/3G3ij9ui:
[WRN] SLOW_OPS: 2 slow ops, oldest one blocked for 8133 sec, daemons [osd.0,osd.41] have slow ops.

Not sure what to make of that. It looks almost like you have a ghost osd.41.


I think (some of) the slow ops you are seeing are directed to the health_metrics pool and can be ignored. If it is too annoying, you could try to find out who runs the client with ID client.7524484 and disable it. It might be an MGR module.

I'm also pretty certain that the slow ops are related to the health metrics pool, which is why I've been ignoring them.

What I'm not sure about is whether re-creating the device_health_metrics pool will cause any problems in the ceph cluster.

Looking at the data you provided and also some older threads of yours (https://www.mail-archive.com/ceph-users@xxxxxxx/msg05842.html), I am starting to consider that we are looking at the fall-out of a past admin operation. One possibility is that an upmap for PG 1.0 exists that conflicts with the crush rule replicated_host_nvme and, hence, prevents the assignment of OSDs to PG 1.0. For example, the upmap specifies HDDs, but the crush rule requires NVMe devices; the result is an empty set.
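If you want to check for such a conflicting upmap directly, the explicit upmap entries are visible in the osdmap dump; a sketch (PG id 1.0 taken from this thread):

```shell
# List any explicit upmap entries stored in the osdmap
ceph osd dump | grep -E 'pg_upmap'

# If a conflicting entry for PG 1.0 turns up, it can be removed with:
ceph osd rm-pg-upmap-items 1.0
```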

So far I've been unable to locate the client with the ID 7524484. It's not showing up in the manager dashboard -> Filesystems page, nor in the output of 'ceph tell mds.ceph1 client ls'.

I'm digging through the compressed logs for the past week to see if I can find the culprit.
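One way to track down which host a client ID belongs to is via the in-flight/slow op dumps of the daemons reporting the slow ops; each op entry includes the originating client id and its IP address. A sketch, assuming osd.0 and osd.41 are the daemons from the SLOW_OPS warning above:

```shell
# Dump ops currently in flight; each entry shows client.<id> and its address
ceph tell osd.0 dump_ops_in_flight
ceph tell osd.41 dump_ops_in_flight

# Recently completed slow ops, in case the op has already finished
ceph tell osd.0 dump_historic_slow_ops
```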

I couldn't really find a simple command to list upmaps. The only non-destructive way seems to be to extract the osdmap and generate a clean-up command file, which should contain a command for every PG with an upmap. To check this, you can execute (see also https://docs.ceph.com/en/latest/man/8/osdmaptool/)

   # ceph osd getmap > osd.map
   # osdmaptool osd.map --upmap-cleanup cleanup.cmd

If you do this, could you please post as usual the contents of cleanup.cmd?

It was empty:

[root@ceph1 ~]# ceph osd getmap > osd.map
got osdmap epoch 52833

[root@ceph1 ~]# osdmaptool osd.map --upmap-cleanup cleanup.cmd
osdmaptool: osdmap file 'osd.map'
writing upmap command output to: cleanup.cmd
checking for upmap cleanups

[root@ceph1 ~]# wc cleanup.cmd
0 0 0 cleanup.cmd

Also, with the OSD map of your cluster, you can simulate certain admin operations and check resulting PG mappings for pools and other things without having to touch the cluster; see https://docs.ceph.com/en/latest/man/8/osdmaptool/.
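For example, the PG-to-OSD mappings of a single pool can be computed offline from the extracted map (pool id 1 here stands for device_health_metrics; verify the id with 'ceph osd pool ls detail'):

```shell
# Show the computed PG->OSD mapping for every PG in pool 1,
# without touching the live cluster
osdmaptool osd.map --test-map-pgs-dump --pool 1
```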


To dig a little bit deeper, could you please post as usual the output of:

- ceph pg 1.0 query
- ceph pg 7.39d query

Oddly, it claims that it doesn't have pgid 1.0.

https://pastebin.com/pHh33Dq7

It would also be helpful if you could post the decoded crush map. You can get the map as a txt-file as follows:

   # ceph osd getcrushmap -o crush-orig.bin
   # crushtool -d crush-orig.bin -o crush.txt

and post the contents of file crush.txt.

https://pastebin.com/EtEGpWy3

Did the slow MDS request complete by now?

Nope.

--Mike
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
