OK. It is related to module vmap space allocation when a module is
inserted. I wonder why it requires 2.5MB for a module? That seems like
a lot to me.
Indeed. I assume KASAN can go wild when it instruments each and every
memory access.
Really looks like it is only module vmap space: ~1 GiB of module vmap space ...
If an allocation request for a module is 2.5MB, we can load ~400 modules
with 1GB of address space.
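(Back-of-the-envelope, using the ~1 GiB figure from above:
1024 MiB / 2.5 MiB ~= 409 concurrent 2.5 MiB allocations, so roughly 400.)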
"lsmod | wc -l"? How many modules your system has?
~71, so not even close to 400.
What I find interesting is that we have these recurring allocations of similar sizes failing.
I wonder if user space is capable of loading the same kernel module concurrently to
trigger a massive number of allocations, and the module loading code only figures out
later that it has already been loaded and backs off.
If there is a request to allocate memory, it has to succeed unless there
is a real error, like running out of space or out of memory.
Yes. But as I found out, we're really out of space because the module
loading code allocates module VMAP space first, before verifying whether
the module was already loaded or is concurrently being loaded.
See below.
[...]
I wrote a small patch to dump the module address space when a failure occurs:
<snip v6.0>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 83b54beb12fa..88d323310df5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1580,6 +1580,37 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
kmem_cache_free(vmap_area_cachep, va);
}
+static void
+dump_modules_free_space(unsigned long vstart, unsigned long vend)
+{
+ unsigned long va_start, va_end;
+ unsigned int total = 0;
+ struct vmap_area *va;
+
+ if (vend != MODULES_END)
+ return;
+
+ trace_printk("--- Dump a modules address space: 0x%lx - 0x%lx\n", vstart, vend);
+
+ spin_lock(&free_vmap_area_lock);
+ list_for_each_entry(va, &free_vmap_area_list, list) {
+ va_start = (va->va_start > vstart) ? va->va_start:vstart;
+ va_end = (va->va_end < vend) ? va->va_end:vend;
+
+ if (va_start >= va_end)
+ continue;
+
+ if (va_start >= vstart && va_end <= vend) {
+ trace_printk(" va_free: 0x%lx - 0x%lx size=%lu\n",
+ va_start, va_end, va_end - va_start);
+ total += (va_end - va_start);
+ }
+ }
+
+ spin_unlock(&free_vmap_area_lock);
+ trace_printk("--- Total free: %u ---\n", total);
+}
+
/*
* Allocate a region of KVA of the specified size and alignment, within the
* vstart and vend.
@@ -1663,10 +1694,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
goto retry;
}
- if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
+ if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
size);
+ dump_modules_free_space(vstart, vend);
+ }
+
kmem_cache_free(vmap_area_cachep, va);
return ERR_PTR(-EBUSY);
}
Thanks!
I can spot the same module getting loaded over and over again
concurrently from user space, only failing after all the allocations,
when add_unformed_module() realizes that the module is in fact already
loaded and bails out with -EEXIST.
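For context, here is a condensed sketch (not a verbatim excerpt) of the
relevant ordering in load_module() in kernel/module/main.c around v6.0,
with almost everything else omitted:

static int load_module(struct load_info *info, const char __user *uargs,
                       int flags)
{
        struct module *mod;
        long err;
        ...
        /*
         * Lay out the final module image and reserve module VMAP space
         * via module_alloc() -- this is where the ~2.5MB per attempt
         * gets allocated.
         */
        mod = layout_and_allocate(info, flags);
        ...
        /*
         * Only now do we notice that the same module is already loaded
         * (or concurrently being loaded) and bail out with -EEXIST,
         * freeing the VMAP space we just reserved.
         */
        err = add_unformed_module(mod);
        if (err)
                goto free_module;
        ...
}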
That looks quite inefficient. Here is how often user space tries to load
the same modules on that system. Note that I print *after* allocating
module VMAP space.
# dmesg | grep Loading | cut -d" " -f5 | sort | uniq -c
896 acpi_cpufreq
1 acpi_pad
1 acpi_power_meter
2 ahci
1 cdrom
2 compiled-in
1 coretemp
15 crc32c_intel
307 crc32_pclmul
1 crc64
1 crc64_rocksoft
1 crc64_rocksoft_generic
12 crct10dif_pclmul
16 dca
1 dm_log
1 dm_mirror
1 dm_mod
1 dm_region_hash
1 drm
1 drm_kms_helper
1 drm_shmem_helper
1 fat
1 fb_sys_fops
14 fjes
1 fuse
205 ghash_clmulni_intel
1 i2c_algo_bit
1 i2c_i801
1 i2c_smbus
4 i40e
4 ib_core
1 ib_uverbs
4 ice
403 intel_cstate
1 intel_pch_thermal
1 intel_powerclamp
1 intel_rapl_common
1 intel_rapl_msr
399 intel_uncore
1 intel_uncore_frequency
1 intel_uncore_frequency_common
64 ioatdma
1 ipmi_devintf
1 ipmi_msghandler
1 ipmi_si
1 ipmi_ssif
4 irdma
406 irqbypass
1 isst_if_common
165 isst_if_mbox_msr
300 kvm
408 kvm_intel
1 libahci
2 libata
1 libcrc32c
409 libnvdimm
8 Loading
1 lpc_ich
1 megaraid_sas
1 mei
1 mei_me
1 mgag200
1 nfit
1 pcspkr
1 qrtr
405 rapl
1 rfkill
1 sd_mod
2 sg
409 skx_edac
1 sr_mod
1 syscopyarea
1 sysfillrect
1 sysimgblt
1 t10_pi
1 uas
1 usb_storage
1 vfat
1 wmi
1 x86_pkg_temp_thermal
1 xfs
For each of these loading requests, we'll reserve module VMAP space, and
free it only once we realize later that the module was already loaded.
So with a lot of CPUs, we might end up trying to load the same module so
often at the same time that we actually run out of module VMAP space.
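(Rough math, again using the ~1 GiB figure from above: at ~2.5 MiB per
attempt, it only takes about 400 attempts in flight at the same time,
400 * 2.5 MiB ~= 1000 MiB, to exhaust the module area and make further
allocations fail.)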
I have a prototype patch that seems to fix this in module loading code.
Thanks!
--
Thanks,
David / dhildenb