Hi Miroslav, Thanks for your kind reply. > On Jul 21, 2022, at 5:11 AM, Miroslav Benes <mbenes@xxxxxxx> wrote: > > Hi, > > On Wed, 20 Jul 2022, Song Liu wrote: > >> Hi folks, >> >> While testing livepatch kernel modules, we found that if a kernel module has >> patched functions, we cannot unload and load it again (rmmod, then insmod). >> This hasn't happened in production yet, but it feels very risky. We use >> automation (chef to be specific) to handle kernel modules and livepatch. >> It is totally possible some weird sequence of operations would cause insmod >> errors on thousands of servers. Therefore, we would like to fix this issue >> before it hits us hard. >> >> A bit of searching with the error message shows it was a known issue [1], and >> a few options are discussed: >> >> "Possible ways to fix it: >> >> 1) Remove the error check in apply_relocate_add(). I don't think we >> should do this, because the error is actually useful for detecting >> corrupt modules. And also, powerpc has the similar error so this >> wouldn't be a universal solution. >> >> 2) In klp_unpatch_object(), call an arch-specific arch_unpatch_object() >> which reverses any arch-specific patching: on x86, clearing all >> relocation targets to zero; on powerpc, converting the instructions >> after relative link branches to nops. I don't think we should do >> this because it's not a global solution and requires fidgety >> arch-specific patching code. >> >> 3) Don't allow patched modules to be removed. I think this makes the >> most sense. Nobody needs this functionality anyway (right?). >> " >> >> I personally think 2) is a better approach, as it doesn't introduce any >> new limitations. (I admit that I only plan to implement the fix for x86). >> 3) seems easier to implement, we just need the livepatch to hold a >> reference to the module, right? But Miroslav mentioned there are >> some issues with 3), which I can't figure out. > > If I understand my notes correctly, 3) as implemented would not solve the > following scenario... > > 1. a live patch is loaded, it contains a fix for a module M which is not > loaded > 2. module M is loaded, relocations and things are applied for the live > patch, load_module() on M fails later and it is not loaded in the end. > 3. another load attempt of M would not succeed. If we just hold a reference of module M, this would not matter as module M is freed on load_module failures? Did I miss anything? > > I am not sure how real the scenario is, but we also deemed the original > problem as improbable in practice. > >> So, what would be the right approach to fix this issue? Is anybody >> working on this already? If we can agree the right approach, I am >> more than happy to implement it (well, x86 only for option 2). > > No one works on that as far as I know. I abandoned the patch set for 3) > due to the reason above, and then even the patch set for 2) because it > was not nice to put it mildly. We decided the problem did not exist in > practice (till someone says otherwise). > > The thread mentions better late module loading infrastructure. If I > remember correctly, it did not happen in the end. peterz removed all > issues with the late module loading we had because we had misunderstood a > couple of things. > > Now 3) was submitted as > https://lore.kernel.org/all/20180607092949.1706-1-mbenes@xxxxxxx/T/#u > > 2) as > v1 https://lore.kernel.org/all/20190719122840.15353-1-mbenes@xxxxxxx/T/#u > v2 https://lore.kernel.org/all/20190905124514.8944-1-mbenes@xxxxxxx/T/#u I guess we can just ship v2 (maybe with some minor updates). I will give it a try. > > I also pushed all I have to > https://git.kernel.org/pub/scm/linux/kernel/git/mbenes/linux.git/. > Branches klp_deny_rmmod* and klp_clear_reloc*. Actually, both versions look good to me (I haven't done any tests yet). Do we have any preference on which way to go (given something changed over the past 3 years). Thanks, Song