On Wed, Mar 14, 2018 at 06:11:22PM +0200, Igor Stoppa wrote:
> On 14/03/18 15:04, Matthew Wilcox wrote:
> > but the principle it uses
> > seems like a better match to me than the rather complex genalloc.
>
> It uses meta data in a different way than genalloc.
> There is probably a tipping point where one implementation becomes more
> space-efficient than the other.

Certainly there are always tradeoffs in writing a memory allocator.

> Probably page_frag does well with relatively large allocations, while
> genalloc seems to be better for small (few allocation units) allocations.

I don't understand why you would think that.  If you allocate 4096 1-byte
elements, page_frag will just use up a page.  Doing the same thing with
genalloc requires allocating two bits per byte (1kB of bitmap), plus
other overheads.

> Also, in case of high variance in the size of the allocations, genalloc
> requires the allocation unit to be small enough to fit the smallest
> request (otherwise one must accept some slack), while page_frag doesn't
> care if the allocation is small or large.

Right; internal versus external fragmentation.  The bane of memory
allocators ;-)

> page_frag otoh, seems to not support the reuse of space that was freed,
> since there is only

To a certain extent it does.  If you free everything on a page, and that
page is still in the page_frag_cache, it will get reused.

> But could you please explain to what you are referring to, when you say
> that page_frag has "significantly lower overhead" ?

Less CPU time taken per allocation, less metadata stored per object.

> Ex: if the pfree is called only on error paths, is it ok to not claim
> back the memory released, if it's less than one page?

Yes, I think that's a great example.

> To be clear: I do not want to hold to genalloc just because I have
> already implemented it. I can at least sketch a version with page_frag,
> but I would like to understand why its trade-offs are better :-)
>
> > > Just allocate some pages and track the offset within those pages that
> > > is the current allocation point.
> >
> > It's less than 100 lines of code!
>
> Strictly speaking it is true, but it all relies on other functions,
> which must be rewritten, because they use linear address, while this
> must work with virtual (vmalloc) addresses.

No, that's basically the whole thing.  I think an implementation of
pmalloc which used a page_frag-style allocator would be larger than
100 lines, but I don't think it would have to be significantly larger
than that.

> Also, I see that the code relies a lot on order of allocation.
> I think we had similar discussion wrt compound pages.
>
> It seems to me wasteful, if I have a request of, say, 5 pages, and I end
> up allocating 8.

Yes, but the other three pages are available for use by the pmalloc pool.
Now, at pmalloc_protect() time, you might well want to release the unused
pages by calling make_alloc_exact() and hand those three pages back to
the page allocator.

> I do not recall anyone giving a justification like:
> "yeah, it uses extra pages, but it's preferable, for reasons X, Y and Z,
> so it's a good trade-off"

Sometimes it is, sometimes it isn't.

> Could it be that it's normal RAM is considered less precious than the
> special memory genalloc is written for, so normal RAM is not really
> proactively reused, while special memory is treated as a more valuable
> resource that should not be wasted?

We're certainly at the point where normal RAM is pretty cheap.  A 16GB
DIMM is $200, so that's $12.50 per gigabyte.  We have more of a problem
with fragmentation than we do with squeezing every last byte out of
the system.  Of course, Linux still runs on tiny systems, and we don't
want to unnecessarily bloat the kernel.  And cachelines are also a
precious resource; the fewer we touch, the faster the system runs.
The bitmap in genalloc can easily occupy several cachelines; the
page_frag allocator touches a single cacheline for most allocations.
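
To make the comparison concrete, here is a rough userspace sketch of the
"track the offset within the page" idea.  It is not the kernel's page_frag
code: struct frag_pool, the frag_*() helpers, FRAG_PAGE_SIZE and
bitmap_bytes() are all hypothetical names made up for illustration, malloc()
stands in for the page allocator, and a plain counter stands in for the page
refcount.  It only shows that an allocation touches the pool's own few words,
that space comes back once everything on the page has been freed, and what
the two-bits-per-unit bitmap would cost for the same page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAG_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical userspace stand-in for a page_frag-style cache. */
struct frag_pool {
	unsigned char *page;	/* current backing page */
	size_t offset;		/* next free byte within the page */
	size_t live;		/* fragments handed out and not yet freed */
};

int frag_pool_init(struct frag_pool *p)
{
	p->page = malloc(FRAG_PAGE_SIZE);	/* stands in for a page allocation */
	p->offset = 0;
	p->live = 0;
	return p->page ? 0 : -1;
}

/* Bump allocation: the only metadata touched is the pool itself. */
void *frag_alloc(struct frag_pool *p, size_t size)
{
	void *ret;

	assert(size > 0);
	if (size > FRAG_PAGE_SIZE - p->offset)
		return NULL;	/* a fuller version would take a fresh page here */
	ret = p->page + p->offset;
	p->offset += size;
	p->live++;
	return ret;
}

/* Individual frees reclaim nothing until the whole page is free. */
void frag_free(struct frag_pool *p)
{
	assert(p->live > 0);
	if (--p->live == 0)
		p->offset = 0;	/* page is empty again, so reuse it */
}

void frag_pool_destroy(struct frag_pool *p)
{
	free(p->page);
	p->page = NULL;
}

/* Bitmap cost of a genalloc-style scheme at two bits per allocation unit. */
size_t bitmap_bytes(size_t units)
{
	return (units * 2 + 7) / 8;
}
```

With this sketch, 4096 one-byte allocations fill exactly one page and the
pool's metadata stays at three words, while bitmap_bytes(4096) gives 1024,
the 1kB figure above.  The cost is the one already discussed: a single
frag_free() gives nothing back until the last fragment on the page is gone.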