Quoting Stefan Kooman (stefan@xxxxxx):

> We can now also trigger SLOW_OPS on a bunch of OSDs when we do a
> "rbd du -p $POOL", something that has never been an issue. The
> images in the rbd pools have the following features enabled:
> layering, exclusive-lock, object-map, fast-diff, deep-flatten.
>
> Has anything changed in 13.2.8 that affects these kinds of
> operations?

Besides the upgrade to 13.2.8 we also configured a different CRUSH
rule for our cephfs metadata pool (a sketch of the change is at the
end of this mail). The pool currently contains 512 PGs and 437M
objects. We moved all those objects (by far the most objects of any
pool in the cluster) to a subset of the OSD nodes with NVMe.

This turned out not to be a good decision. Instead of faster
responses we hit the issues mentioned in this thread, i.e. SLOW_OPS
during "rbd du", deep-scrubs and OSD worker thread timeouts. We
likely introduced PG lock contention on those OSDs, and as the same
OSDs also serve other pools, the whole cluster suffered from it.
After reverting the rule the cluster behaves much more predictably
again.

Lessons learned:

- Spread the workload over as many OSDs as possible.
- RocksDB compaction / housekeeping can take *a lot* of CPU (up to
  13 CPU cores per OSD!). After moving PGs like this it takes hours
  (and a lot of load / slow ops) before the cluster normalizes
  again.

Sorry for the FUD,

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                    +31 318 648 688 / info@xxxxxx
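
P.S. For anyone curious, the rule change boiled down to something
like the following; the rule name ("nvme-only") and pool name
("cephfs_metadata") are illustrative here, not necessarily what we
used:

    # replicated rule that only selects OSDs with device class "nvme"
    ceph osd crush rule create-replicated nvme-only default host nvme

    # repoint the metadata pool at the new rule; this is what kicked
    # off the mass data movement
    ceph osd pool set cephfs_metadata crush_rule nvme-only

Reverting was simply setting crush_rule back to the original rule,
and then waiting for all the data to move back again.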