> On Apr 15, 2022, at 10:08 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 15, 2022 at 12:05:42PM -0700, Luis Chamberlain wrote:
>> Looks good except for that I think this should just wait for v5.19. The
>> fixes are so large I can't see why this needs to be rushed in other than
>> the first assumptions of the optimizations had some flaws addressed here.
>
> Patches 1 and 2 are bug fixes for regressions caused by using huge page
> backed vmalloc by default. So I think we do need it for 5.18. The
> other two do look like candidates for 5.19, though.

Thanks Luis and Christoph for your kind inputs on the set.

Here is my analysis after thinking about it overnight.

We can discuss the users of vmalloc in 4 categories: module_alloc, BPF
programs, alloc_large_system_hash, and others; and there are two archs
involved here: x86_64 and powerpc.

With the whole set, the behavior is like:

              | x86_64                | powerpc
--------------+-----------------------+----------------------
module_alloc  |            use small pages
--------------+-----------------------+----------------------
BPF programs  | use 2MB pages         | use small pages
--------------+-----------------------+----------------------
large hash    |   use huge pages when size > PMD_SIZE
--------------+-----------------------+----------------------
other-vmalloc |            use small pages

Patch 1/4 fixes the behavior of module_alloc and other-vmalloc.
Without 1/4, both these users may get huge pages for size > PMD_SIZE
allocations, which may be troublesome ([3] for example).

Patches 3/4 and 4/4, together with 1/4, allow BPF programs to use 2MB
pages. This is the same behavior as before 5.18-rc1, which has been
tested in bpf-next and linux-next. Therefore, I don't think we need to
hold them until 5.19.

Patch 2/4 enables huge pages for large hash. Large hash has been using
huge pages on powerpc since 5.15. But this is new for x86_64.
If we ship 2/4, this is a performance improvement for x86_64, but it is
less tested on x86_64 (it didn't go through linux-next). If we ship 1/4
but not 2/4 with 5.18, we will see a small performance regression for
powerpc.

Based on this analysis, I think we should either
  1) ship the whole set with 5.18; or
  2) ship 1/4, 3/4, and 4/4 with 5.18, and 2/4 with 5.19.

With option 1), we enable huge pages for large hash on x86_64 without
going through linux-next. With option 2), we take a small performance
regression with 5.18 on powerpc.

Of course, we can ship a hybrid solution by gating 2/4 for powerpc only
in 5.18, and enabling it for x86_64 in 5.19.

Does this make sense? Please let me know your comments and suggestions
on this.

Thanks,
Song

[3] https://lore.kernel.org/lkml/14444103-d51b-0fb3-ee63-c3f182f0b546@xxxxxxxxxxxxx/