On 07.10.20 21:00, Wido den Hollander wrote:
On 07/10/2020 16:00, Dan van der Ster wrote:
On Wed, Oct 7, 2020 at 3:29 PM Wido den Hollander <wido@xxxxxxxx> wrote:
On 07/10/2020 14:08, Dan van der Ster wrote:
Hi all,
This morning some OSDs in our S3 cluster started going OOM. After
restarting them, I noticed that osd_pglog is using >1.5 GB per OSD.
(This is on an OSD with osd_memory_target = 2 GB, hosting 112 PGs; all
PGs are active+clean.)
[...]
Hi all,
As Wido said, our case may be a bit different.
This is still on 14.2.8. Trouble started with lots of small objects.
There were 2 Veeam buckets with more than 400M objects each,
on a pool with EC 8+3. Since each object is stored as 8 + 3 = 11 shards,
this meant about 10 billion object shards. DB space on SSD was tiny
(the OSDs were originally built for FileStore; there was space for 25 GB,
of which, as we know now, only about 3 GB was really usable).
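The "only 3GB really usable" figure is usually explained by RocksDB's level sizing: a level is only placed on the DB device if it fits there together with all lower levels, so everything beyond the largest fitting prefix L1..Ln spills over to the slow device. A minimal sketch, assuming RocksDB's defaults of max_bytes_for_level_base = 256 MiB and max_bytes_for_level_multiplier = 10, and ignoring the WAL, L0 and any reserved headroom (the function name is hypothetical):

```python
# Sketch: how much of a BlueStore DB partition RocksDB can use before
# levels spill over to the slow (HDD) device.
# Assumptions (not from the post): max_bytes_for_level_base = 256 MiB,
# max_bytes_for_level_multiplier = 10; WAL, L0 and headroom ignored.

MIB = 1024 ** 2
GIB = 1024 ** 3
GB = 10 ** 9


def usable_db_bytes(db_bytes, level_base=256 * MIB, multiplier=10, max_levels=7):
    """Return the summed target size of the largest prefix L1..Ln that fits in db_bytes."""
    usable = 0
    level_size = level_base
    for _ in range(max_levels):
        if usable + level_size > db_bytes:
            break  # this level (and every larger one) would land on the slow device
        usable += level_size
        level_size *= multiplier
    return usable


if __name__ == "__main__":
    for size_gb in (25, 31):
        usable = usable_db_bytes(size_gb * GB)
        print(f"{size_gb} GB DB partition: ~{usable / GIB:.2f} GiB usable")
```

Under these assumptions a 25 GB partition only holds L1 and L2 (about 2.75 GiB, i.e. the "3GB"), while a 31 GB partition also fits L3 (about 27.75 GiB).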
Then OSD memory started to grow, mostly in buffer_anon. Decreasing
osd_max_pg_log_entries helped (even with buffer_anon!). We added RAM,
only to have more OOMs a few days later. Then we realized that DB slow
bytes (BlueFS data spilled over from the SSD to the HDD) had started to
grow without bound.
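Both symptoms can be watched per OSD via the admin socket: `ceph daemon osd.N dump_mempools` reports per-mempool usage (including buffer_anon and osd_pglog), and `ceph daemon osd.N perf dump bluefs` reports BlueFS counters such as slow_used_bytes. A minimal sketch that extracts those numbers from the JSON text; it assumes the Nautilus-era layout (mempool → by_pool → pool → bytes, and bluefs → slow_used_bytes), and the helper name and sample values are hypothetical:

```python
import json


def osd_memory_summary(mempools_json, bluefs_json):
    """Extract buffer_anon / osd_pglog bytes and BlueFS slow_used_bytes.

    mempools_json: text output of `ceph daemon osd.N dump_mempools`
    bluefs_json:   text output of `ceph daemon osd.N perf dump bluefs`
    The key layout is assumed to match Nautilus; adjust for other releases.
    """
    pools = json.loads(mempools_json)["mempool"]["by_pool"]
    bluefs = json.loads(bluefs_json)["bluefs"]
    return {
        "buffer_anon_bytes": pools["buffer_anon"]["bytes"],
        "osd_pglog_bytes": pools["osd_pglog"]["bytes"],
        "slow_used_bytes": bluefs.get("slow_used_bytes", 0),
    }


if __name__ == "__main__":
    # Hypothetical, trimmed-down sample output; not measurements from this cluster.
    sample_mempools = json.dumps({"mempool": {"by_pool": {
        "buffer_anon": {"items": 1000, "bytes": 900_000_000},
        "osd_pglog": {"items": 500, "bytes": 200_000_000},
    }}})
    sample_bluefs = json.dumps({"bluefs": {"slow_used_bytes": 5_000_000_000}})
    print(osd_memory_summary(sample_mempools, sample_bluefs))
```

Sampling these values periodically would show whether buffer_anon and slow bytes keep growing or level off.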
We managed to delete the objects (which took several weeks), and there
were no OOMs during that time. But afterwards, buffer_anon started
growing again.
At one point I noticed free memory improving while a customer was
writing heavily. So we started writing constantly (small objects to
dummy buckets). This helps with buffer_anon and also with the DB growth.
It seems that at least 14.2.8 does not trim buffer_anon periodically,
but only when writing:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/EVPELHOL4KLRJ4CKOOD2JECBMUKE4EKB/
One possible explanation (just an idea) for the large amount of
buffer_anon: DB slow bytes got spread over lots and lots of small
allocations on the HDD.
We rebuilt all OSDs with bigger DBs (31 GB), and we limit the amount
of slow bytes with manual compactions.
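Choosing which OSDs to compact can be driven by the same slow_used_bytes counter. A minimal sketch, assuming a hypothetical per-OSD threshold; it only builds `ceph tell osd.N compact` command lines (worst offenders first) and does not execute them:

```python
GIB = 1024 ** 3


def compaction_commands(slow_used, threshold_bytes=1 * GIB):
    """Return `ceph tell` command lines for OSDs whose slow_used_bytes exceed the threshold.

    slow_used: mapping of OSD id -> BlueFS slow_used_bytes (e.g. from `perf dump bluefs`).
    threshold_bytes: hypothetical cut-off; choose one that fits the cluster.
    """
    over = sorted(
        (osd for osd, used in slow_used.items() if used > threshold_bytes),
        key=lambda osd: slow_used[osd],
        reverse=True,  # largest spillover first
    )
    return [f"ceph tell osd.{osd} compact" for osd in over]


if __name__ == "__main__":
    # Hypothetical values for three OSDs.
    for cmd in compaction_commands({0: 3 * GIB, 1: 0, 2: 2 * GIB}):
        print(cmd)
```

Running the resulting commands one at a time rather than all at once would keep the extra compaction I/O confined to a single OSD.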
With the huge number of small objects gone, the cluster was still
unhealthy. Then we realized that the RGW garbage collector (GC) was not
keeping up with the load. A possible reason for the GC backlog:
customers using features like versioning more heavily than before.
There are high refcounts in GC, and there were times with lots of HEAD
requests from some customers.
GC load is mostly read load. Combined with only low write activity, this
may be problematic.
We tuned the GC to be more aggressive, and the backlog is now going
down, slowly (again, this takes weeks).
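One way to track whether the GC backlog is actually shrinking is to count pending entries at intervals. A minimal sketch, assuming `radosgw-admin gc list --include-all` prints a JSON array whose entries carry a `tag` and a list of `objs`; the helper name and the sample data are hypothetical:

```python
import json


def gc_backlog(gc_list_json):
    """Count pending GC entries (tags) and the RADOS objects they reference.

    gc_list_json: text output of `radosgw-admin gc list --include-all`
    (assumed to be a JSON array of entries with an "objs" list each).
    """
    entries = json.loads(gc_list_json)
    n_objs = sum(len(entry.get("objs", [])) for entry in entries)
    return len(entries), n_objs


if __name__ == "__main__":
    # Hypothetical sample with two GC entries referencing three objects in total.
    sample = json.dumps([
        {"tag": "t1", "objs": [{"oid": "a"}, {"oid": "b"}]},
        {"tag": "t2", "objs": [{"oid": "c"}]},
    ])
    print(gc_backlog(sample))  # -> (2, 3)
```

With a very large backlog the listing itself can be slow and big, so sampling it only occasionally is probably wise.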
Cheers
Harry
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx