Re: [RESEND PATCH] fs: avoid mmap sem relocks when coredumping with many missing pages

David Hildenbrand <david@xxxxxxxxxx> · Mon, 20 Jan 2025 19:59:52 +0100

On 19.01.25 11:32, Mateusz Guzik wrote:
Dumping processes with large allocated and mostly not-faulted areas is
very slow.

Borrowing a test case from Tavian Barnes:

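#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void) {
     /* Reserve 1 TiB of anonymous memory without committing any of it. */
     char *mem = mmap(NULL, 1ULL << 40, PROT_READ | PROT_WRITE,
             MAP_ANONYMOUS | MAP_NORESERVE | MAP_PRIVATE, -1, 0);
     printf("%p %m\n", mem);
     if (mem != MAP_FAILED) {
             mem[0] = 1;    /* fault in exactly one page */
     }
     abort();               /* SIGABRT triggers the core dump */
}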

That's 1 TiB of almost completely unpopulated address space.

On my test box it takes 13-14 seconds to dump.
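
For scale, here is a rough sketch of the arithmetic behind those numbers.
It assumes 4 KiB pages (the page size is not stated above) and takes 13.5
seconds as the midpoint of the reported range. The helper names are made up
for illustration and do not come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of page-sized steps needed to walk len bytes.
 * page_size is an assumption of this sketch (4096 on typical x86-64).
 */
uint64_t pages_in_range(uint64_t len, uint64_t page_size)
{
    assert(page_size != 0);
    return (len + page_size - 1) / page_size;
}

/* Average cost per visited page, in nanoseconds, for a given total time. */
double ns_per_page(double seconds, uint64_t pages)
{
    assert(pages != 0);
    return seconds * 1e9 / (double)pages;
}
```

With len = 1ULL << 40 and 4096-byte pages this gives 268435456 steps, so a
13.5 second dump works out to roughly 50 ns per page visited, nearly all of
it spent on pages that are not there.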

The profile shows:
-   99.89%     0.00%  a.out
      entry_SYSCALL_64_after_hwframe
      do_syscall_64
      syscall_exit_to_user_mode
      arch_do_signal_or_restart
    - get_signal
       - 99.89% do_coredump
          - 99.88% elf_core_dump
             - dump_user_range
                - 98.12% get_dump_page
                   - 64.19% __get_user_pages
                      - 40.92% gup_vma_lookup
                         - find_vma
                            - mt_find
                                 4.21% __rcu_read_lock
                                 1.33% __rcu_read_unlock
                      - 3.14% check_vma_flags
                           0.68% vma_is_secretmem
                        0.61% __cond_resched
                        0.60% vma_pgtable_walk_end
                        0.59% vma_pgtable_walk_begin
                        0.58% no_page_table
                   - 15.13% down_read_killable
                        0.69% __cond_resched
                     13.84% up_read
                  0.58% __cond_resched

Almost 29% of the time is spent relocking the mmap semaphore between
calls to get_dump_page() which find nothing.

Whacking that results in times of 10 seconds (down from 13-14).
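
As a userspace illustration only (the actual diff is not reproduced in this
mail), the toy model below counts how often a lock is taken when walking a
mostly empty range. The struct toy_lock type and both walk functions are
hypothetical stand-ins, not kernel APIs. One variant relocks around every
lookup, the other keeps the lock across consecutive misses and drops it only
when a page is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the mmap lock; it only counts acquisitions. */
struct toy_lock {
    unsigned long acquisitions;
    bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = true;
    l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Relock around every page lookup, whether or not a page is found. */
unsigned long walk_relock_each(const bool *present, size_t npages,
                               struct toy_lock *l)
{
    unsigned long found = 0;

    for (size_t i = 0; i < npages; i++) {
        toy_lock_acquire(l);
        bool hit = present[i];
        toy_lock_release(l);
        if (hit)
            found++;    /* the page would be written out here, unlocked */
    }
    return found;
}

/* Keep the lock across consecutive misses; drop it only to handle a hit. */
unsigned long walk_keep_lock(const bool *present, size_t npages,
                             struct toy_lock *l)
{
    unsigned long found = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!l->held)
            toy_lock_acquire(l);
        if (present[i]) {
            toy_lock_release(l);
            found++;
        }
    }
    if (l->held)
        toy_lock_release(l);
    return found;
}
```

In this model the first variant takes the lock once per page, while the
second takes it once per run of missing pages, which is the kind of saving
the profile above points at for a range with a single populated page.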

While here make the thing killable.

The real problem is the page-sized iteration and the real fix would
patch it up instead. It is left as an exercise for the mm-familiar
reader.

Signed-off-by: Mateusz Guzik <mjguzik@xxxxxxxxx>
---

Minimally tested, very plausible I missed something.

Sent again because the previous attempt had only myself in To: I failed to
fix up the one-liner suggested by lore.kernel.org. It seems the original
got lost.

  arch/arm64/kernel/elfcore.c |  3 ++-
  fs/coredump.c               | 38 +++++++++++++++++++++++++++++++------
  include/linux/mm.h          |  2 +-
  mm/gup.c                    |  5 ++---
  4 files changed, 37 insertions(+), 11 deletions(-)

MM side LGTM

Acked-by: David Hildenbrand <david@xxxxxxxxxx>

--
Cheers,

David / dhildenb