On Thu, 2023-09-07 at 15:42 +0200, Vlastimil Babka wrote:
> On 8/24/23 12:52, Jay Patel wrote:
> > On Fri, 2023-08-11 at 17:43 +0200, Vlastimil Babka wrote:
> > > On 8/10/23 19:54, Hyeonggon Yoo wrote:
> > > > >  		order = calc_slab_order(size, min_objects,
> > > > >  				slub_max_order, fraction);
> > > > > @@ -4159,14 +4164,6 @@ static inline int calculate_order(unsigned int size)
> > > > >  			min_objects--;
> > > > >  	}
> > > > > -	/*
> > > > > -	 * We were unable to place multiple objects in a slab. Now
> > > > > -	 * lets see if we can place a single object there.
> > > > > -	 */
> > > > > -	order = calc_slab_order(size, 1, slub_max_order, 1);
> > > > > -	if (order <= slub_max_order)
> > > > > -		return order;
> > > >
> > > > I'm not sure if it's okay to remove this?
> > > > It was fine in v2 because the least wasteful order was chosen
> > > > regardless of fraction but that's not true anymore.
> > > >
> > > > Otherwise, everything looks fine to me. I'm too dumb to anticipate
> > > > the outcome of increasing the slab order :P but this patch does not
> > > > sound crazy to me.
> > >
> > > I wanted to have a better idea how the orders change so I hacked up a
> > > patch to print them for all sizes up to 1MB (unnecessarily large I
> > > guess) and also for various page sizes and nr_cpus (that's however
> > > rather invasive and prone to me missing some helper being used that
> > > still relies on real PAGE_SHIFT), then I applied v4 (needed some
> > > conflict fixups with my hack) on top:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slab-orders
> > >
> > > As expected, things didn't change with 4k PAGE_SIZE. With 64k
> > > PAGE_SIZE, I thought the patch in v4 form would result in lower
> > > orders, but seems not always?
> > >
> > > I.e. I can see before the patch:
> > >
> > > Calculated slab orders for page_shift 16 nr_cpus 1:
> > >        8       0
> > >     4376       1
> > >
> > > (so until 4368 bytes it keeps order at 0)
> > >
> > > And after:
> > >        8       0
> > >     2264       1
> > >     2272       0
> > >     2344       1
> > >     2352       0
> > >     2432       1
> > >
> > > Not sure this kind of "oscillation" is helpful with a small machine
> > > (1 CPU), and 64kB pages so the unused part of the page is quite small.
> > >
> > Hi Vlastimil,
> >
> > With the patch, the fraction_size rises to 32 when utilizing a 64k
> > page size. As a result, the maximum wastage cap for each slab cache
> > will be 2k (64k divided by 32). Any object size exceeding this cap
> > will be moved to order 1 or beyond, which is why this oscillation is
> > seen.
>
> Hi, sorry for the late reply.
>
> > > With 16 cpus, AFAICS the orders are also larger for some sizes.
> > > Hm but you reported reduction of total slab memory which suggests
> > > lower orders were selected somewhere, so maybe I did some mistake.
> >
> > AFAIK total slab memory is reduced because of two reasons (with this
> > patch for larger page sizes):
> > 1) order for some slab caches is reduced (by increasing fraction_size)
>
> How can increased fraction_size ever result in a lower order? I think
> it can only result in an increased order (or the same order). And the
> simulations with my hack patch don't seem to provide a counterexample
> to that. Note previously I did expect the order to be lower (or same)
> and was surprised by my results, but now I realized I misunderstood
> the v4 patch.

Hi, sorry for the late reply as I was on vacation :)

You're absolutely right. Increasing the fraction size won't reduce the
order, and I apologize for any confusion in my previous response.
> > 2) Have also seen reduction in overall slab cache numbers because of
> > increasing page order
>
> I think your results might be just due to randomness and could turn
> out different with repeating the test, or converge to be the same if
> you average multiple runs. You posted them for "160 CPUs with 64K Page
> size" and if I add that combination to my hack print, I see the same
> result before and after your patch:
>
> Calculated slab orders for page_shift 16 nr_cpus 160:
>        8       0
>     1824       1
>     3648       2
>     7288       3
>   174768       2
>   196608       3
>   524296       4
>
> Still, I might have a bug there. Can you confirm there are actual
> differences with a /proc/slabinfo before/after your patch? If there
> are none, any differences observed have to be due to randomness, not
> differences in order.

Indeed, to eliminate randomness, I've consistently gathered data from
/proc/slabinfo, and I can confirm a decrease in the total number of
slab caches.

Values on a 160 CPU system with 64k page size:
Without patch: 24892 slab caches
With patch:    23891 slab caches

> Going back to the idea behind your patch, I don't think it makes sense
> to try to increase the fraction only for higher orders. Yes, with 1/16
> fraction, the waste with a 64kB page can be 4kB, while with 1/32 it
> will be just 2kB, and with 4kB pages this is only 256 vs 128 bytes.
> However the object sizes and counts don't differ with page size, so
> with 4kB pages we'll have more slabs to host the same number of
> objects, and the waste will accumulate accordingly - i.e. the fraction
> metric should be independent of page size wrt the resulting total
> kilobytes of waste.
>
> So maybe the only thing we need to do is to try setting it to 32 as
> the initial value instead of 16, regardless of page size.
> That should hopefully again show a good tradeoff for 4kB as one of
> the earlier versions, while on 64kB it shouldn't cause much difference
> (again, none at all with 160 cpus, some difference with less than 128
> cpus, if my simulations were correct).

Yes, we can modify the default fraction size to 32 for all page sizes.
I've noticed that on a 160 CPU system with a 64K page size, there's a
noticeable change in the total memory allocated for slabs: it
decreases.

Alright, I'll make the necessary changes to the patch, setting the
fraction size default to 32, and I'll post v5 along with some
performance metrics.

> > > Anyway my point here is that this evaluation approach might be
> > > useful, even if it's a non-upstreamable hack, and some
> > > postprocessing of the output is needed for easier comparison of
> > > before/after, so feel free to try that out.
> >
> > Thank you for this detailed test :)
> >
> > > BTW I'll be away for 2 weeks from now, so further feedback will
> > > have to come from others in that time...
> >
> > Do we have any additional feedback from others on the same matter?
> >
> > Thanks
> >
> > Jay Patel
> > > > Thanks!
> > > > --
> > > > Hyeonggon