Igor, thanks, very helpful. Our current osdmap weighs 1.4 MB, and that changes all
the calculations. It looks like this could be our case. I think we got into this
situation because of the long backfilling that is still in progress and has been
going on for the last three weeks. Can we drop some of the osdmaps before the
rebalance completes?
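
In the meantime, here is roughly how I read the lower-bound estimate discussed
below, as a runnable sketch. It assumes about half of the osdmap entries in
'meta' are full maps and the other half are incrementals that take at least one
alloc unit each, it approximates every stored full map by the current one, and
ceph-objectstore-tool needs the OSD stopped; the path and values are only
examples from this thread:

# rough lower bound for osdmap space consumed on a single OSD
OSD_PATH=/var/lib/ceph/osd/ceph-70               # run with this OSD stopped
N=$(ceph-objectstore-tool --op meta-list --data-path "$OSD_PATH" | grep -c osdmap)
ceph osd getmap > /tmp/osdmap.full               # current full map as a size proxy
FULL=$(stat -c%s /tmp/osdmap.full)               # bytes per full map (approx.)
ALLOC=65536                                      # 4096 on OSDs with 4k min_alloc_size
echo "osdmap lower bound on this OSD: $(( N/2 * FULL + N/2 * ALLOC )) bytes"

Multiplying the result by the number of OSDs then gives the cluster-wide
figure Igor mentions below.
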
Thu, 19 Sep 2024 at 15:38, Igor Fedotov <igor.fedotov@xxxxxxxx>:

> please see my comments inline.
>
> On 9/19/2024 1:53 PM, Александр Руденко wrote:
>
> Igor, thanks!
>
> > What are the numbers today?
>
> Today we have the same "oldest_map": 2408326 and "newest_map": 2637838,
> *+2191*.
>
> ceph-objectstore-tool --op meta-list --data-path /var/lib/ceph/osd/ceph-70 | grep osdmap | wc -l
> 458994
>
> Can you clarify this, please:
>
> > and then multiply by amount of OSDs to learn the minimal space taken by
> > this data
>
> 458994 * 4k * OSD count = "*size of osdmaps on ONE OSD*" or "*total size
> of osdmaps on ALL OSDs*"?
>
> Yes, this is a lower-bound estimation for osdmap size on all OSDs.
>
> Because we have about 3k OSDs, and 458994 * 4k * 3000 = ~5TB can hardly be
> placed on ONE OSD. But if it is the TOTAL osdmap size, I think it is a very
> small size per OSD.
>
> Highly likely the osdmap for 3K OSDs takes much more than 4K on disk. So
> again, that was just a lower-bound estimation.
>
> In fact, one can use 'ceph osd getmap > out.dat' and get a better estimate
> of the osdmap size. So please substitute the 4K in the formula above to
> get a better estimate of the overall space taken.
>
> It's a bit simplified though, since just half of the entries in the 'meta'
> pool are full osdmaps. Hence you might want to use
> 458994/2 * sizeof(osdmap) + 458994/2 * 4K in the above formula.
>
> Which is again a sort of lower-bound estimation, but with better accuracy.
>
> But we have a lot of OSDs with min_alloc_size=64k, which was the default
> in previous Ceph versions for rotational drives (all our SSDs are behind
> old RAID controllers).
>
> ceph daemon osd.10 bluestore allocator dump block | head -10
> {
>     "capacity": 479557844992,
>     "alloc_unit": 65536,
>
> But even with min_alloc=64k it will not be a big amount of data:
> 458994 * 64k = *~23GB*. I think we have about *150GB+* extra per SSD OSD.
>
> Yeah, you should use 64K instead of 4K in the above formula if the
> majority of OSDs use a 64K alloc unit. Or take this into account some
> other way (e.g. take half 4K and half 64K). But I'm leaving this as a
> "home exercise" for yourself. The main point here is that a single object
> would take at least alloc_unit size, and hence I was trying to make the
> assessment without knowing the actual osdmap size, using the alloc unit
> instead, just to check if we get numbers of the same order of magnitude.
> And 23GB and 150GB aren't THAT different: having e.g. a 1M osdmap might
> easily do the trick. I.e. the osdmap leak indeed could be a real factor
> here, and hence it's worth additional investigation.
>
> Anyway, please use the obtained osdmap size. It could adjust the resulting
> estimation value dramatically.
>
> For example, an SSD with min_alloc=4k:
>
> ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL   %USE   VAR   PGS  STATUS
> 126  ssd    0.00005  1.00000   447 GiB  374 GiB  300 GiB  72 GiB  1.4 GiB  73 GiB  83.64  1.00  137  up
>
> and with min_alloc=64k:
>
> ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL   %USE   VAR   PGS  STATUS
> 10   ssd    0.00005  0.75000   447 GiB  405 GiB  320 GiB  83 GiB  1.4 GiB  42 GiB  90.59  1.00  114  up
>
> The diff is not as big as 4k vs 64k...
>
> Right. Don't know the reason atm. Maybe leaking osdmaps is not the only
> issue. Please do the corrected math as per above though.
>
> Thu, 19 Sep 2024 at 12:33, Igor Fedotov <igor.fedotov@xxxxxxxx>:
>
>> Hi Konstantin,
>>
>> osd_target_transaction_size should control that.
>>
>> I've heard of it being raised to 150 with no obvious issues. Going
>> beyond that is at your own risk, so I'd suggest applying incremental
>> increases if needed.
>>
>> Thanks,
>>
>> Igor
>>
>> On 9/19/2024 10:44 AM, Konstantin Shalygin wrote:
>>
>> Hi Igor,
>>
>> On 18 Sep 2024, at 18:22, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
>>
>> I recall a couple of cases when permanent osdmap epoch growth has been
>> filling OSDs with the relevant osdmap info, which can be tricky to catch.
>>
>> Please run 'ceph tell osd.N status' for a couple of affected OSDs twice
>> within e.g. a 10 min interval.
>>
>> Then check the delta between the oldest_map and newest_map fields: the
>> delta should neither be very large (hundreds of thousands) nor grow
>> rapidly within the observed interval.
>>
>> Side question on the topic: which option controls how many maps are
>> pruned? Currently I need to trim 1M osdmaps, but when a new map is
>> issued, only 30 old maps are removed. Which option controls the value 30?
>>
>> Thanks,
>> k
>>
>> --
>> Igor Fedotov
>> Ceph Lead Developer
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>
> --
> Igor Fedotov
> Ceph Lead Developer
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
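
P.S. For anyone else hitting this, a minimal sketch of the checks and the
incremental osd_target_transaction_size increase Igor suggests above.
osd.126 and the value 150 are only examples from this thread, and
'ceph config set' assumes a release with the centralized config store:

# watch whether old maps are actually being trimmed on an affected OSD
ceph tell osd.126 status | grep -E '"(oldest|newest)_map"'
sleep 600
ceph tell osd.126 status | grep -E '"(oldest|newest)_map"'

# raise the per-epoch trim batch carefully, step by step
ceph config set osd osd_target_transaction_size 150
ceph config show osd.126 osd_target_transaction_size    # confirm the running value

If trimming is working, oldest_map should advance between the two snapshots;
as Konstantin observed above, old maps are only removed as new maps are
issued, so the effect shows up gradually as new epochs come in.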