+++ Joe Lawrence [03/08/20 14:17 -0400]:
On 8/3/20 1:45 PM, Kees Cook wrote:
On Mon, Aug 03, 2020 at 02:39:32PM +0300, Evgenii Shatokhin wrote:
There are at least 2 places where high-order memory allocations might happen
during module loading. Such allocations may fail if memory is fragmented,
while physically contiguous memory areas are not really needed there. I
suggest switching to kvmalloc()/kvfree() there.
Thanks Evgenii for pointing out the potential memory allocation issues
that may arise with very large modules when memory is fragmented. I was curious
which modules on my machine would count as large, and there turn out to be
quite a few (x86_64, v5.8-rc6, a relatively standard distro config, with the
FG KASLR patches on top):
./amdgpu/sections 7277
./i915/sections 4267
./nouveau/sections 3772
./xfs/sections 2395
./btrfs/sections 1966
./mac80211/sections 1588
./kvm/sections 1468
./cfg80211/sections 1194
./drm/sections 1012
./bluetooth/sections 843
./iwlmvm/sections 664
./usbcore/sections 524
./videodev/sections 436
So, I agree with the suggestion that we could switch to kvmalloc() to
try to mitigate potential allocation problems when memory is fragmented.
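
For anyone who wants to produce a similar listing on their own machine, here
is a minimal sketch. It is not necessarily how the numbers above were
gathered. It assumes the usual layout where each loadable module has a
/sys/module/<MOD>/sections/ directory with one entry per section; the
function name count_sections is made up for this example.

```python
from pathlib import Path


def count_sections(sysfs_root="/sys/module"):
    """Return (module_name, section_count) pairs, largest count first."""
    counts = []
    for mod_dir in Path(sysfs_root).iterdir():
        sections = mod_dir / "sections"
        if not sections.is_dir():
            # Entries without a sections/ directory (typically built-in
            # code that only exposes parameters) are skipped.
            continue
        # Section names start with '.', so count every directory entry.
        n = sum(1 for _ in sections.iterdir())
        counts.append((mod_dir.name, n))
    # Sort by descending count, then by name for a stable order.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts
```

Calling count_sections() and printing the first dozen pairs should give a
list of the same shape as the one above. Only the directory is listed, so
reading the individual section files (which is usually restricted) is not
needed.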
While this does seem to be the right solution for the extant problem, I
do want to take a moment and ask if the function sections need to be
exposed at all? What tools use this information, and do they just want
to see the bounds of the code region? (i.e. the start/end of all the
.text* sections) Perhaps .text.* could be excluded from the sysfs
section list?
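
To make that idea concrete, here is a userspace illustration of which names
such a rule would hide. It is only a sketch: the name visible_sections and
the exact prefix rule are assumptions for this example, not a proposed kernel
interface. Note that a plain ".text." prefix test would also drop entries
such as .text.unlikely, which may or may not be wanted.

```python
def visible_sections(names):
    """Keep every section name except per-function '.text.<something>' ones.

    The plain '.text' section is kept; only names with the '.text.' prefix
    are filtered out.
    """
    return [name for name in names if not name.startswith(".text.")]
```

For example, given [".text", ".text.foo", ".data", ".bss"], only ".text.foo"
would be hidden.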
[[cc += FChE, see [0] for Evgenii's full mail ]]
It looks like debugging tools such as systemtap [1] and gdb [2] (via its
add-symbol-file command) peek at the /sys/module/<MOD>/sections/ info.
But yeah, it would be preferable if we didn't export a long sysfs
representation if nobody actually needs it.
Thanks Joe for looking into this. Hmm, AFAICT for gdb it's not a hard
dependency per se - for add-symbol-file I was under the impression
that we are responsible for obtaining the relevant section addresses
ourselves through /sys/module/ (the most oft cited method) and then
feeding those to add-symbol-file. It would definitely be more
difficult to find out the section addresses without the /sys/module/
section entries. In any case, it sounds like systemtap has a hard
dependency on /sys/module/*/sections anyway.
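
For reference, the manual gdb workflow described above could be scripted
roughly as follows. This is a sketch under assumptions: the helper name
add_symbol_file_cmd is invented here, the choice of .data and .bss as extra
sections follows the commonly cited recipe, and reading the section files
typically requires root.

```python
from pathlib import Path


def add_symbol_file_cmd(module, ko_path, sysfs_root="/sys/module",
                        extra_sections=(".data", ".bss")):
    """Build a gdb 'add-symbol-file' command line for a loaded module.

    Section load addresses are read from <sysfs_root>/<module>/sections/.
    """
    sec_dir = Path(sysfs_root) / module / "sections"
    # The .text address is passed as the positional load address.
    text_addr = (sec_dir / ".text").read_text().strip()
    parts = ["add-symbol-file", str(ko_path), text_addr]
    for name in extra_sections:
        entry = sec_dir / name
        if entry.is_file():
            # Additional sections are passed as '-s <name> <address>'.
            parts += ["-s", name, entry.read_text().strip()]
    return " ".join(parts)
```

The returned string can be pasted into a gdb session that has vmlinux
loaded. Without the sysfs entries, the addresses would have to be obtained
some other way.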
Regarding /proc/kallsyms, I think it is probably possible to expose
section symbols and their addresses via /proc/kallsyms rather than
through sysfs (it would then live in the module's vmalloc'ed memory)
but I'm not sure how helpful that would actually be, especially since
existing tools depend on the sysfs interface being there.
[0] https://lore.kernel.org/lkml/e9c4d88b-86db-47e9-4299-3fac45a7e3fd@xxxxxxxxxxxxx/
[1] https://fossies.org/linux/systemtap/staprun/staprun.c
[2] https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch04.html#linuxdrive3-CHP-4-SECT-6.1
-- Joe