Linus Torvalds's on June 1, 2019 4:30 am:
> On Tue, May 28, 2019 at 5:08 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>>
>> The kernel currently clamps large system hashes to MAX_ORDER when
>> hashdist is not set, which is rather arbitrary.
>
> I think the *really* arbitrary part here is "hashdist".
>
> If you enable NUMA support, hashdist is just set to 1 by default on
> 64-bit, whether the machine actually has any numa characteristics or
> not. So you take that vmalloc() TLB overhead whether you need it or
> not.

Yeah, that's strange, it seems to just be an oversight nobody ever
picked up. Patch 2/4 actually fixed that exactly the way you said.

> So I think your series looks sane, and should help the vmalloc case
> for big hash allocations, but I also think that this whole
> alloc_large_system_hash() function should be smarter in general.
>
> Yes, it's called "alloc_large_system_hash()", but it's used on small
> and perfectly normal-sized systems too, and often for not all that big
> hashes.
>
> Yes, we tend to try to make some of those hashes large (dentry one in
> particular), but we also use this for small stuff.
>
> For example, on my machine I have several network hashes that have
> order 6-8 sizes, none of which really make any sense to use vmalloc
> space for (and which are smaller than a large page, so your patch
> series wouldn't help).
>
> So on the whole I have no issues with this series, but I do think we
> should maybe fix that crazy "if (hashdist)" case. Hmm?

Yes, agreed. Even after this series with 2MB mappings, it's actually a
bit sad that we can't use the linear map for the non-NUMA case. My
laptop has a 32MB dentry cache and a 16MB inode cache, so doing a bunch
of name lookups is quite a waste of TLB entries (although at least with
2MB pages it doesn't blow the TLB completely).

We might be able to go a step further and use the memblock allocator
for those as well, reserve some boot CMA for that common case, or just
use the linear map for these hashes (rough sketch of the hashdist side
of that below). I'll look into that.

Thanks,
Nick
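---

Just to illustrate the direction for the "if (hashdist)" case, here is a
minimal, untested sketch (not a patch): only take the vmalloc path when
there is really more than one node to spread the hash over. The helper
name alloc_system_hash_pages() is made up for the example; hashdist,
nr_online_nodes, __vmalloc(), alloc_pages_exact(), get_order() and
MAX_ORDER are the existing kernel symbols.

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/nodemask.h>
#include <linux/vmalloc.h>

extern int hashdist;	/* defined in mm/page_alloc.c */

static void *alloc_system_hash_pages(unsigned long size, gfp_t gfp_mask)
{
	/*
	 * Only distribute the hash via vmalloc when the machine actually
	 * has more than one node online, not merely because CONFIG_NUMA
	 * was enabled in the kernel config.
	 */
	if (hashdist && nr_online_nodes > 1)
		return __vmalloc(size, gfp_mask, PAGE_KERNEL);

	/*
	 * Single-node systems stay on the linear mapping, so hash lookups
	 * don't burn extra TLB entries, as long as the allocation still
	 * fits under MAX_ORDER.
	 */
	if (get_order(size) < MAX_ORDER)
		return alloc_pages_exact(size, gfp_mask);

	return NULL;
}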