Hi Peter,
Not sure if you've got to the bottom of your problem, but I seem to
have run into something similar. It may be worth reading on, as there
could be a hidden issue lurking.
Yesterday our cluster went into HEALTH_WARN state and I noticed that
one of my pgs was listed as 'activating' and marked as 'inactive' and
'unclean'.
We also have a mixed OSD system - 768 HDDs and 16 NVMes - with three
crush rules for object placement: the default replicated_rule (I never
deleted it) plus two new ones, replicated_rule_hdd and
replicated_rule_nvme.
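For reference, the rules can be listed and inspected with the usual
commands (the rule names above are obviously the ones from my
cluster), something like:

    ceph osd crush rule ls
    ceph osd crush rule dump replicated_rule_hdd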
Running a query on the pg (in my case pg 15.792) did not yield
anything out of place, except that it reported the pg's state as
'activating' (which isn't even listed among the documented pg states),
which made me slightly alarmed.
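The query was along the lines of the following (the pg id is from my
cluster, and the grep is just to pull out the state field):

    ceph pg 15.792 query | grep '"state"'

with the rest of the query output looking perfectly normal.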
The bits of information that alerted me to the issue were:

1. Running 'ceph pg dump' and finding the 'activating' pg, which
   showed the following (see the one-liner after this list for a quick
   way to find such pgs):

   15.792  activating  [4,724,242]   # for this pool 15 pg the osds are 4, 724 and 242

2. Running "ceph osd tree | grep 'osd.4 '" and getting the following:

   4  nvme  osd.4

3. Checking what pool 15 is by running 'ceph osd pool ls detail':

   pool 15 'default.rgw.data' replicated size 3 min_size 2 crush_rule 1
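In case it helps anyone checking for the same thing, a quick way to
list every pg that isn't active+clean (the header line will also slip
through the grep) is something like:

    ceph pg dump pgs_brief | grep -v 'active+clean'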
These three bits of information made me realise what was going on:
- OSDs 4, 724 and 242 are all nvmes
- Pool 15 should obey crush_rule 1 (replicated_rule_hdd)
- Pool 15 has pgs that use nvmes!
So it turns out that in my case pool 15 has pgs mapped onto the nvme
osds!
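To double-check those first two points on your own cluster, something
along these lines should work (the osd ids here are obviously from my
cluster):

    ceph osd crush rule dump | grep -E 'rule_id|rule_name'
    ceph osd crush get-device-class osd.4 osd.724 osd.242

The first shows which rule name each rule_id (the number reported as
crush_rule in 'ceph osd pool ls detail') belongs to, and the second
confirms the device class of the osds in the pg's acting set.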
To test a fix (one that is easy to revert if I want to mimic the
problem again), I executed the following command:
'ceph osd pg-upmap-items 15.792 4 22 724 67 242 76'
It remapped the osds used by the 'activating' pg, my cluster status
went back to HEALTH_OK and the pg returned to normal, so the cluster
at least appears healthy again.
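If anyone wants to inspect or undo that workaround later (to reproduce
the problem, for example), the mapping shows up in the osdmap and can,
as far as I know, be removed again with:

    ceph osd dump | grep pg_upmap_items
    ceph osd rm-pg-upmap-items 15.792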
Luckily for me we've not put the cluster into production yet, so I'll
just blow away the pool and recreate it.
What I've not yet
figured out is how this happened.
The steps I (think I) took were:
- Ran ceph-ansible, and the 'default.rgw.data' pool was created
  automatically.
- I think I then increased the pg count.
- Created a new rule: ceph osd crush rule create-replicated
  replicated_rule_hdd default host hdd
- Moved the pool to the new rule: ceph osd pool set default.rgw.data
  crush_rule replicated_rule_hdd
I don't know what the expected behaviour of the set command is, so I'm
planning to recreate the problem on a test cluster (roughly with the
sequence sketched below) to see which part of the process caused it.
Perhaps I should have migrated to the new rule first, before
increasing the pgs.
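The sequence I intend to replay (pool name matches mine, and the pg
numbers are just placeholders) is roughly:

    ceph osd pool set default.rgw.data pg_num 4096
    ceph osd pool set default.rgw.data pgp_num 4096
    ceph osd crush rule create-replicated replicated_rule_hdd default host hdd
    ceph osd pool set default.rgw.data crush_rule replicated_rule_hdd

and then the same thing with the crush_rule change done before the
pg_num increase, to see whether the ordering is what matters.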
Regards,
Tom