Re: osd_pglog memory hoarding - another case

Dan van der Ster <dan@xxxxxxxxxxxxxx> · Tue, 17 Nov 2020 11:19:32 +0100

Hi Xie,

On Tue, Nov 17, 2020 at 11:14 AM <xie.xingguo@xxxxxxxxxx> wrote:
>
> Hi Dan,
>
>
> > Given that it adds a case where the pg_log is not trimmed, I wonder if
> > there could be an unforeseen condition where `last_update_ondisk`
> > isn't being updated correctly, and therefore the osd stops trimming
> > the pg_log altogether.
>
> >
>
> > Xie or Samuel: does that sound possible?
>
>
> "b670715eb4 osd/PeeringState: do not trim pg log past last_update_ondisk"
>
>
> sounds like the culprit to me, if the cluster's PGs never go active and recover under min_size.

Thanks for the reply. In our case the cluster was HEALTH_OK -- all PGs
active and running for two weeks after upgrading to v14.2.11 (from
12.2.12). It took two weeks for us to notice that the pg logs were
growing without bound.
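
For anyone who wants to watch for the same slow growth, here is a minimal sketch of how the pg log footprint could be tracked per OSD. It assumes the JSON printed by `ceph daemon osd.<id> dump_mempools` has the layout `mempool.by_pool.osd_pglog` with `items` and `bytes` fields; please check that against your own release. The function names, the 1.5x threshold, and the sample values are hypothetical, not taken from our cluster.

```python
import json


def pglog_usage(dump_mempools_json: str) -> dict:
    """Extract the osd_pglog mempool counters from dump_mempools JSON text."""
    data = json.loads(dump_mempools_json)
    # Assumed layout (verify on your release):
    # {"mempool": {"by_pool": {"osd_pglog": {"items": ..., "bytes": ...}}}}
    pool = data["mempool"]["by_pool"]["osd_pglog"]
    return {"items": int(pool["items"]), "bytes": int(pool["bytes"])}


def is_growing(samples_bytes, min_ratio=1.5):
    """Return True if the newest sample is at least min_ratio times the oldest.

    min_ratio is an arbitrary illustrative threshold, not a Ceph setting.
    """
    if len(samples_bytes) < 2 or samples_bytes[0] <= 0:
        return False
    return samples_bytes[-1] >= min_ratio * samples_bytes[0]


# Made-up sample input, not real cluster output.
sample = json.dumps(
    {"mempool": {"by_pool": {"osd_pglog": {"items": 1000, "bytes": 500000}}}}
)
usage = pglog_usage(sample)
```

Sampling this once a day per OSD and feeding the `bytes` values to `is_growing` would probably have flagged our case well before two weeks had passed.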

-- dan

>
>
>
> Original message
> From: Dan van der Ster
> To: Kalle Happonen;
> Cc: Ceph Users; 谢型果10072465; Samuel Just;
> Date: 2020-11-17 16:56
> Subject: Re: osd_pglog memory hoarding - another case
> Hi Kalle,
>
> Strangely and luckily, in our case the memory explosion didn't reoccur
> after that incident. So I can mostly only offer moral support.
>
> But if this bug indeed appeared between 14.2.8 and 14.2.13, then I
> think this is suspicious:
>
>    b670715eb4 osd/PeeringState: do not trim pg log past last_update_ondisk
>
>    https://github.com/ceph/ceph/commit/b670715eb4
>
> Given that it adds a case where the pg_log is not trimmed, I wonder if
> there could be an unforeseen condition where `last_update_ondisk`
> isn't being updated correctly, and therefore the osd stops trimming
> the pg_log altogether.
>
> Xie or Samuel: does that sound possible?
>
> Cheers, Dan
>
> On Tue, Nov 17, 2020 at 9:35 AM Kalle Happonen <kalle.happonen@xxxxxx> wrote:
> >
> > Hello all,
> > wrt: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7IMIWCKIHXNULEBHVUIXQQGYUDJAO2SF/
> >
> > Yesterday we hit a problem with osd_pglog memory, similar to the thread above.
> >
> > We have a 56-node object storage (S3+SWIFT) cluster with 25 OSD disks per node. We run 8+3 EC for the data pool (metadata is on a replicated NVMe pool).
> >
> > The cluster has been running fine, and (as relevant to the post) the memory usage has been stable at 100 GB / node. We've had the default pg_log length of 3000 entries. The user traffic doesn't seem to have been exceptional lately.
> >
> > Last Thursday we updated the OSDs from 14.2.8 -> 14.2.13. On Friday the memory usage on the OSD nodes started to grow. On each node it grew steadily by about 30 GB/day, until the servers started OOM-killing OSD processes.
> >
> > After a lot of debugging we found that the pg_logs were huge. Each OSD process's pg_log had grown to ~22 GB, which we naturally didn't have memory for, and this left the cluster in an unstable state. This is significantly more than the 1.5 GB in the post above. We do have ~20k PGs, which may directly affect the size.
> >
> > We've reduced the pg_log to 500, started offline trimming it where we can, and otherwise just waited. The pg_log size dropped to ~1.2 GB on at least some nodes, but we're still recovering, and still have a lot of OSDs down and out.
> >
> > We're unsure if version 14.2.13 triggered this, or if the osd restarts triggered this (or something unrelated we don't see).
> >
> > This mail is mostly to figure out if there are good guesses why the pg_log size per OSD process exploded? Any technical (and moral) support is appreciated. Also, currently we're not sure if 14.2.13 triggered this, so this is also to put a data point out there for other debuggers.
> >
> > Cheers,
> > Kalle Happonen
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
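
As a rough sanity check on the figures in Kalle's report, the per-PG share of that ~22 GB can be estimated from the numbers he gives. This is only a back-of-the-envelope sketch: it assumes all ~20k PGs belong to the 8+3 EC data pool, that PG shards are spread evenly across OSDs, and that 1 GB means 10^9 bytes. None of those assumptions is confirmed in the thread.

```python
# Figures quoted in the report above.
nodes = 56
osds_per_node = 25
pgs = 20_000              # "~20k pgs"; assumed to all be in the EC data pool
ec_k, ec_m = 8, 3         # 8+3 erasure coding
pglog_bytes_per_osd = 22e9  # "~22GB" per OSD process, taking 1 GB = 10^9 bytes

total_osds = nodes * osds_per_node

# Each EC PG places one shard on k+m different OSDs.
pg_shards = pgs * (ec_k + ec_m)

# Assumes an even spread of shards across OSDs.
shards_per_osd = pg_shards / total_osds

# Average pg_log memory attributable to a single PG shard on one OSD.
mb_per_shard = pglog_bytes_per_osd / shards_per_osd / 1e6

print(f"OSDs: {total_osds}")
print(f"PG shards per OSD: {shards_per_osd:.0f}")
print(f"pg_log memory per PG shard: {mb_per_shard:.0f} MB")
```

Under those assumptions each OSD holds on the order of 157 PG shards, so the reported total works out to roughly 140 MB of pg_log per PG shard.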