Hi, On Wed, 20 Jul 2022, Song Liu wrote: > Hi folks, > > While testing livepatch kernel modules, we found that if a kernel module has > patched functions, we cannot unload and load it again (rmmod, then insmod). > This hasn't happened in production yet, but it feels very risky. We use > automation (chef to be specific) to handle kernel modules and livepatch. > It is totally possible some weird sequence of operations would cause insmod > errors on thousands of servers. Therefore, we would like to fix this issue > before it hits us hard. > > A bit of searching with the error message shows it was a known issue [1], and > a few options are discussed: > > "Possible ways to fix it: > > 1) Remove the error check in apply_relocate_add(). I don't think we > should do this, because the error is actually useful for detecting > corrupt modules. And also, powerpc has the similar error so this > wouldn't be a universal solution. > > 2) In klp_unpatch_object(), call an arch-specific arch_unpatch_object() > which reverses any arch-specific patching: on x86, clearing all > relocation targets to zero; on powerpc, converting the instructions > after relative link branches to nops. I don't think we should do > this because it's not a global solution and requires fidgety > arch-specific patching code. > > 3) Don't allow patched modules to be removed. I think this makes the > most sense. Nobody needs this functionality anyway (right?). > " > > I personally think 2) is a better approach, as it doesn't introduce any > new limitations. (I admit that I only plan to implement the fix for x86). > 3) seems easier to implement, we just need the livepatch to hold a > reference to the module, right? But Miroslav mentioned there are > some issues with 3), which I can't figure out. If I understand my notes correctly, 3) as implemented would not solve the following scenario... 1. a live patch is loaded, it contains a fix for a module M which is not loaded 2. module M is loaded, relocations and things are applied for the live patch, load_module() on M fails later and it is not loaded in the end. 3. another load attempt of M would not succeed. I am not sure how real the scenario is, but we also deemed the original problem as improbable in practice. > So, what would be the right approach to fix this issue? Is anybody > working on this already? If we can agree the right approach, I am > more than happy to implement it (well, x86 only for option 2). No one works on that as far as I know. I abandoned the patch set for 3) due to the reason above, and then even the patch set for 2) because it was not nice to put it mildly. We decided the problem did not exist in practice (till someone says otherwise). The thread mentions better late module loading infrastructure. If I remember correctly, it did not happen in the end. peterz removed all issues with the late module loading we had because we had misunderstood a couple of things. Now 3) was submitted as https://lore.kernel.org/all/20180607092949.1706-1-mbenes@xxxxxxx/T/#u 2) as v1 https://lore.kernel.org/all/20190719122840.15353-1-mbenes@xxxxxxx/T/#u v2 https://lore.kernel.org/all/20190905124514.8944-1-mbenes@xxxxxxx/T/#u I also pushed all I have to https://git.kernel.org/pub/scm/linux/kernel/git/mbenes/linux.git/. Branches klp_deny_rmmod* and klp_clear_reloc*. Feel free to take over and work on that. I am afraid I would not get to it anytime soon unfortunately. Regards Miroslav