On Tue, Nov 17, 2020 at 11:45 AM Kalle Happonen <kalle.happonen@xxxxxx> wrote:
>
> Hi Dan & co.,
> Thanks for the support (moral and technical).
>
> That sounds like a good guess, but it seems there is nothing alarming here. In all our pools, some pgs are a bit over 3100, but not at any exceptional values.
>
> cat pgdumpfull.txt | jq '.pg_map.pg_stats[] |
> select(.ondisk_log_size > 3100)' | egrep "pgid|ondisk_log_size"
>     "pgid": "37.2b9",
>     "ondisk_log_size": 3103,
>     "pgid": "33.e",
>     "ondisk_log_size": 3229,
>     "pgid": "7.2",
>     "ondisk_log_size": 3111,
>     "pgid": "26.4",
>     "ondisk_log_size": 3185,
>     "pgid": "33.4",
>     "ondisk_log_size": 3311,
>     "pgid": "33.8",
>     "ondisk_log_size": 3278,
>
> I also have no idea what the average size of a pg log entry should be; in our case it seems to be around 7 MB (22 GB / 3000 entries).

I also have no idea how large the average PG log entry *should* be. (BTW, I think you forgot a factor: the number of PGs on each OSD.) Here's a sample from one of our S3 4+2 OSDs with 71 PGs:

    "osd_pglog": {
        "items": 249530,
        "bytes": 33925360
    },

So that's ~32 MB for roughly 500 * 71 entries == around 1 kB each.

Anyway, you raised a good point -- this isn't necessarily a "pg log not trimming" bug; rather, it might be a "pg log entries are huge" bug.

-- dan

> Cheers,
> Kalle
>
> ----- Original Message -----
> > From: "Dan van der Ster" <dan@xxxxxxxxxxxxxx>
> > To: "Kalle Happonen" <kalle.happonen@xxxxxx>
> > Cc: "ceph-users" <ceph-users@xxxxxxx>, "xie xingguo" <xie.xingguo@xxxxxxxxxx>, "Samuel Just" <sjust@xxxxxxxxxx>
> > Sent: Tuesday, 17 November, 2020 12:22:28
> > Subject: Re: osd_pglog memory hoarding - another case
>
> > Hi Kalle,
> >
> > Do you have active PGs now with huge pglogs?
> > You can do something like this to find them:
> >
> > ceph pg dump -f json | jq '.pg_map.pg_stats[] |
> > select(.ondisk_log_size > 3000)'
> >
> > If you find some, could you increase to debug_osd = 10 and then share the OSD log?
> > I am interested in the debug lines from calc_trim_to_aggressively (or
> > calc_trim_to if you didn't enable pglog_hardlimit), but the whole log
> > might show other issues.
> >
> > Cheers, dan
> >
> >
> > On Tue, Nov 17, 2020 at 9:55 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >>
> >> Hi Kalle,
> >>
> >> Strangely and luckily, in our case the memory explosion didn't recur
> >> after that incident. So I can mostly only offer moral support.
> >>
> >> But if this bug indeed appeared between 14.2.8 and 14.2.13, then I
> >> think this is suspicious:
> >>
> >> b670715eb4 osd/PeeringState: do not trim pg log past last_update_ondisk
> >>
> >> https://github.com/ceph/ceph/commit/b670715eb4
> >>
> >> Given that it adds a case where the pg_log is not trimmed, I wonder if
> >> there could be an unforeseen condition where `last_update_ondisk`
> >> isn't being updated correctly, and therefore the OSD stops trimming
> >> the pg_log altogether.
> >>
> >> Xie or Samuel: does that sound possible?
> >>
> >> Cheers, Dan
> >>
> >> On Tue, Nov 17, 2020 at 9:35 AM Kalle Happonen <kalle.happonen@xxxxxx> wrote:
> >> >
> >> > Hello all,
> >> > wrt:
> >> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7IMIWCKIHXNULEBHVUIXQQGYUDJAO2SF/
> >> >
> >> > Yesterday we hit a problem with osd_pglog memory, similar to the thread above.
> >> >
> >> > We have a 56-node object storage (S3 + SWIFT) cluster with 25 OSD disks per node.
> >> > We run 8+3 EC for the data pool (metadata is on a replicated NVMe pool).
> >> >
> >> > The cluster has been running fine, and (as relevant to the post) the memory
> >> > usage has been stable at 100 GB / node. We've had the default pg_log of 3000.
> >> > The user traffic doesn't seem to have been exceptional lately.
> >> >
> >> > Last Thursday we updated the OSDs from 14.2.8 -> 14.2.13. On Friday the memory
> >> > usage on the OSD nodes started to grow. On each node it grew steadily by about
> >> > 30 GB/day, until the servers started OOM-killing OSD processes.
> >> >
> >> > After a lot of debugging we found that the pg_logs were huge. Each OSD
> >> > process's pg_log had grown to ~22 GB, which we naturally didn't have memory
> >> > for, and then the cluster was in an unstable situation. This is significantly
> >> > more than the 1.5 GB in the post above. We do have ~20k PGs, which may
> >> > directly affect the size.
> >> >
> >> > We've reduced the pg_log to 500, started offline trimming it where we can,
> >> > and also just waited. The pg_log size dropped to ~1.2 GB on at least some
> >> > nodes, but we're still recovering, and still have a lot of OSDs down and out.
> >> >
> >> > We're unsure if version 14.2.13 triggered this, or if the OSD restarts
> >> > triggered it (or something unrelated we don't see).
> >> >
> >> > This mail is mostly to figure out if there are good guesses as to why the
> >> > pg_log size per OSD process exploded. Any technical (and moral) support is
> >> > appreciated. Also, since we're not sure whether 14.2.13 triggered this, this
> >> > is also to put a data point out there for other debuggers.
> >> >
> >> > Cheers,
> >> > Kalle Happonen
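
For readers who want to check their own clusters: the per-OSD osd_pglog numbers Dan quotes above come from the OSD's mempool stats. A minimal sketch of how to pull them yourself, assuming jq is installed, osd.0's admin socket is on the local host, and the Nautilus dump_mempools JSON layout (the exact path may differ in other releases):

# Show pg_log memory for one OSD (run on the host where osd.0 lives).
ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog'

# Rough bytes-per-entry estimate, mirroring Dan's ~1 kB calculation.
ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog | .bytes / .items'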
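
Likewise, Kalle's workaround ("reduced the pg_log to 500, and started offline trimming") would look roughly like the sketch below. The exact invocation is an assumption, not taken from the thread: the 500 is the value mentioned above, PG 37.2b9 is just an example id from Kalle's jq output, trim-pg-log must run with the OSD stopped, and data paths vary per deployment.

# Lower the pg_log caps cluster-wide (the thread's workaround value).
ceph config set osd osd_max_pg_log_entries 500
ceph config set osd osd_min_pg_log_entries 500

# Offline-trim one PG's log with ceph-objectstore-tool; the OSD must be down.
systemctl stop ceph-osd@0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --pgid 37.2b9 --op trim-pg-log
systemctl start ceph-osd@0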