Hi Greg!
Thank you for your quick reply.
We have now deleted the PG on OSD.130 as you suggested and then
started the OSD:
ceph-s-06 # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130/ \
    --pgid 5.9b --op remove --force
marking collection for removal
setting '_remove' omap key
finish_remove_pgs 5.9b_head removing 5.9b
Remove successful
ceph-s-06 # systemctl start ceph-osd@130.service
The cluster recovered again until it got to PG 5.9b. Then OSD.130
crashed again, so nothing has changed.
So we wanted to try it the other way around and export the PG from the
primary (healthy) OSD (OSD.19), but that fails:
root@ceph-s-03:/tmp5.9b# ceph-objectstore-tool --op export --pgid 5.9b \
    --data-path /var/lib/ceph/osd/ceph-19 --file /tmp5.9b/5.9b.export
OSD has the store locked
But we don't want to stop OSD.19 on this server, because this pool has
size=3 and min_size=2, so stopping it would make PG 5.9b inaccessible.
I'm a bit confused. Are you saying that
1) the ceph-objectstore-tool command you pasted there successfully
   removed pg 5.9b from osd.130 (as it appears), AND
2) pg 5.9b was active with one of the other nodes as primary, so all
   data remained available, AND
3) when pg 5.9b got backfilled onto osd.130, osd.130 crashed again?
   (But the other OSDs kept the PG fully available, without crashing?)
That sequence of events is *deeply* confusing and I
really don't understand how it might happen.
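(If you still have it, the up/acting set at each step would help make
sense of this, e.g. from something like

  ceph pg map 5.9b
  ceph pg 5.9b query

which shows which OSDs were serving the PG and which one was primary
while osd.130 was down.)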
Sadly I don't think you can grab a PG for export without
stopping the OSD in question.
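If you do end up taking that route (either during a window where you
can tolerate pg 5.9b going inactive, or once it has a healthy third
copy again), the sequence would look roughly like the following. This
is only a sketch, so double-check it before running anything:

  ceph osd set noout
  systemctl stop ceph-osd@19.service
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-19 \
      --pgid 5.9b --op export --file /tmp5.9b/5.9b.export
  systemctl start ceph-osd@19.service
  ceph osd unset noout

noout just keeps the cluster from marking osd.19 out and rebalancing
while it is briefly down; the paths and pgid are taken from your own
command above.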
When we query the PG, we can see a lot of entries in "snap_trimq".
Can this be cleaned up somehow, even if the PG is undersized and
degraded?
I *think* the PG will keep trimming snapshots even if
undersized+degraded (though I don't remember for sure), but
snapshot trimming is often heavily throttled and I'm not
aware of any way to specifically push one PG to the front.
If you're interested in speeding snap trimming up, you can search the
archives or check the docs for the appropriate config options.
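For reference, the options people usually look at are
osd_snap_trim_sleep, osd_pg_max_concurrent_snap_trims and
osd_snap_trim_priority, but please check the docs for your release for
the exact names, defaults and sane values before changing anything. On
recent releases that would look something like

  ceph config set osd osd_snap_trim_sleep 0
  ceph config set osd osd_pg_max_concurrent_snap_trims 4

(on older releases the equivalent is injectargs via "ceph tell osd.*").
Treat those values as illustrative examples, not recommendations.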
-Greg