Hi,

Like I said in an earlier mail to this list, we re-balanced ~60% of the CephFS metadata pool to NVMe-backed devices: roughly 422 M objects (1.2 billion replicated). We have 512 PGs allocated to them.

While rebalancing we suffered from quite a few SLOW_OPS. Memory, CPU and device IOPS capacity were not a limiting factor as far as we can see (plenty of headroom, nowhere near max capacity).

We saw quite a few slow ops with the following events:

            "time": "2019-12-19 09:41:02.712010",
            "event": "reached_pg"
        },
        {
            "time": "2019-12-19 09:41:02.712014",
            "event": "waiting for rw locks"
        },
        {
            "time": "2019-12-19 09:41:02.881939",
            "event": "reached_pg"

... and this repeated hundreds of times, taking ~30 seconds to complete.

Does this indicate PG lock contention? If so, would we need to provide more PGs to the metadata pool to avoid this? The metadata pool is only ~166 MiB in size, but with loads of OMAPs. Most advice on PG planning is concerned with the _amount_ of data, but the metadata pool (and this might also be true for RGW index pools) seems to be a special case.

Thanks for your insights,

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                    +31 318 648 688 / info@xxxxxx
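
P.S. In case it helps anyone reproduce this: the events above can be pulled from the OSD admin socket, roughly as sketched below. The osd id is a placeholder, and the exact JSON layout of the historic ops dump may differ slightly between releases.

    # dump recently completed ops from the OSD's admin socket
    ceph daemon osd.<id> dump_historic_ops > historic_ops.json

    # count how often ops stalled on "waiting for rw locks"
    jq '[.ops[].type_data.events[] | select(.event == "waiting for rw locks")] | length' historic_ops.json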
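
And if PG lock contention really is the culprit and more PGs for the metadata pool is the answer, I assume the change itself would just be along these lines (pool name and target count are placeholders, the number is purely for illustration):

    # raise the placement group count on the metadata pool
    ceph osd pool set <metadata-pool> pg_num 1024
    # on Nautilus pgp_num is adjusted gradually by the mons as far as I
    # understand; on older releases it is set explicitly:
    ceph osd pool set <metadata-pool> pgp_num 1024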