On 04/06/22 at 01:16pm, Omar Sandoval wrote: > On Wed, Apr 06, 2022 at 05:59:53PM +0800, Baoquan He wrote: > > On 04/06/22 at 11:13am, Uladzislau Rezki wrote: > > > > On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote: > > > > > A simple way to "fix" this would be to make set_iounmap_nonlazy() set > > > > > vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I > > > > > think it'd be better to get rid of this hack of clobbering vmap_lazy_nr. > > > > > Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap > > > > > areas itself. > > > > > > > > This fixes the bug and the interface also is better than what we had > > > > before. But a vmap/iounmap_eager would seem even better. But hey, > > > > right now it has one caller in always built іn x86 arch code, so maybe > > > > it isn't worth spending more effort on this. > > > > > > > IMHO, it just makes sense to remove it. The set_iounmap_nonlazy() was > > > added in 2010 year: > > > > > > <snip> > > > commit 3ee48b6af49cf534ca2f481ecc484b156a41451d > > > Author: Cliff Wickman <cpw@xxxxxxx> > > > Date: Thu Sep 16 11:44:02 2010 -0500 > > > > > > mm, x86: Saving vmcore with non-lazy freeing of vmas > > > > > > During the reading of /proc/vmcore the kernel is doing > > > ioremap()/iounmap() repeatedly. And the buildup of un-flushed > > > vm_area_struct's is causing a great deal of overhead. (rb_next() > > > is chewing up most of that time). > > > > > > This solution is to provide function set_iounmap_nonlazy(). It > > > causes a subsequent call to iounmap() to immediately purge the > > > vma area (with try_purge_vmap_area_lazy()). > > > > > > With this patch we have seen the time for writing a 250MB > > > compressed dump drop from 71 seconds to 44 seconds. > > > > > > Signed-off-by: Cliff Wickman <cpw@xxxxxxx> > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > Cc: kexec@xxxxxxxxxxxxxxxxxxx > > > Cc: <stable@xxxxxxxxxx> > > > LKML-Reference: <E1OwHZ4-0005WK-Tw@xxxxxxxxxxxxxxxxxxxxxx> > > > Signed-off-by: Ingo Molnar <mingo@xxxxxxx> > > > <snip> > > > > > > and the reason was the "slow vmap" code, i.e. due to poor performance > > > they decided to drop the lazily ASAP. Now we have absolutely different > > > picture when it comes to performance and the vmalloc/vmap code. > > > > I would vote for the current code change, removing it. As pointed out by > > Christoph, it's only used by x86, may not be so worth to introduce a new > > interface. > > I did a quick benchmark to see if this optimization is still needed. > This is on a system with 32GB RAM. I timed > `dd if=/proc/vmcore of=/dev/null` with 4k and 1M block sizes on 5.17, > 5.18 with this fix, and 5.18 with the non-lazy cleanup removed entirely. > It looks like Uladzislau has a point, and this "optimization" actually > slows things down now: > > |5.17 |5.18+fix|5.18+removal > 4k|40.86s| 40.09s| 26.73s > 1M|24.47s| 23.98s| 21.84s > > I'll send a v2 which removes set_iounmap_nonlazy() entirely. Hi Omar, Thanks for the effort on posting patch to fix this and further benchmark testing. The removing I said means what you are doing in v1. While from your testing result, seems removing set_iounmap_nonlazy() directly is better. I agree that the old optimization was made too long ago, should be not needed any more. E.g the added purge_vmap_area_root tree will speed up the searching, adding and removing of the purged vmap_area. I am wondering if this is a real issue you met, or you just found it by code inspecting. If it breaks vmcore dumping, we should add 'Fixes' tag and cc stable. I am wondering how your vmcore dumping is handled. Asking this because we usually use makedumpfile utility to filter and dump those kernel data, the unneeded data, e.g zero-ed pages, unused pages, are all filtered out. While using makedumpfile, we use mmap which is 4M at one time by default, then process the content. So the copy_oldmem_page() may only be called during elfcorehdr and notes reading. Thanks Baoquan