On Tue, Apr 07, 2020 at 09:33:08AM +0200, Miroslav Benes wrote: > On Mon, 6 Apr 2020, Joe Lawrence wrote: > > > On Fri, Jan 17, 2020 at 04:03:19PM +0100, Petr Mladek wrote: > > > HINT: Get some coffee before reading this commit message. > > > > > > [ ... snip ... ] > > > > > > B. Livepatch module using an exported symbol from the patched module. > > > > > > It should be avoided even with the non-split livepatch module. The module > > > loader automatically takes reference to make sure the modules are > > > unloaded in the right order. This would basically prevent the livepatched > > > module from unloading. > > > > > > Note that it would be perfectly safe to remove this automatic > > > dependency. The livepatch framework makes sure that the livepatch > > > module is loaded only when the patched one is loaded. But it cannot > > > be implemented easily, see below. > > > > Do you envision klp-convert providing this functionality? > > > > [ ... snip ... ] > > That is one way to get around the dependency problem. And I think it > should work even with the PoC. It should (and I don't remember all details > now unfortunately) guarantee that the patched module is always available > for the livepatch and since there is no explicit dependency, the recursion > issue is gone. > > However, I think the goal was to follow the most natural road and leverage > the existing dependency system. Meaning, since the presence of a patched > module in the system before its patching is guaranteed now, there is no > reason not to use its exported symbols directly like anywhere else. But it > introduces the recursion problem, so we may drop it. > > > FWIW, I have been working an updated klp-convert for v5.6 that includes > > a bunch of fixes and such... modifying it to convert module-export > > references like this was quite easy. > > *THUMBS UP* :) Hi Miroslav, Perhaps the following bug report is premature, but it was far easier to start playing with the PoC code than audit all the individual race conditions :) This came out of the mentioned klp-convert rebase that I later put on-top of Petr's PoC, and then started writing up some more selftests to see how the new livepatching world might look. Forgive me if I'm violating some obvious assumption that the PoC makes, but I think the experiment may still be useful in pointing out a problematic use-case / potential pitfall a livepatch author might encounter. tl;dr: prepare_coming_module() calls jump_label_add_module() *after* it calls klp_module_coming(). In the PoC, the latter function now searches for any livepatches that may apply to the loading module and tries to load them before proceeding. If one of those livepatch modules references any static key defined by the originally loading module, the static key entry structures may not be setup correctly. The test case: - A kernel module defines a static key called test_klp_true_key and on __init, calls static_branch_disable(). I don't think the __init part is required for this case, however it was just the easiest way to write the test that I was working on at the time. AFAIK we could change the key any where else with the same problem. - A livepatch module that references test_klp_true_key. The overarching test case was for the key's module owner (target) and livepatch (livepatch__target) to toggle the key and for both target and livepatch__target modules to be patched accordingly. - klp-convert was enhanced to modify relocations to test_klp_true_key in both .text and __jump_table sections. It previously could not handle this scenario. Pseudocode ========== target.c -------- static DEFINE_STATIC_KEY_TRUE(test_klp_true_key); ... __init function() { static_branch_disable(&test_klp_true_key); } livepatch__target.c ------------------- extern struct static_key_true test_klp_true_key; ... pr_info("static_branch_likely(&test_klp_true_key) is %s\n", static_branch_likely(&test_klp_true_key) ? "true" : "false"); Test ==== % modprobe livepatch # livepatch__target only loads when target loads % modprobe target (crash in jump_label_update) Callchain and notes =================== (livepatch is already loaded) target ------ load_module apply_relocations post_relocations module_finalize jump_label_apply_nops complete_formation mod->state = MODULE_STATE_COMING prepare_coming_module klp_module_coming klp_try_load_object patch_name = livepatch, obj_name = target livepatch__target ----------------- load_module apply_relocations post_relocations module_finalize jump_label_apply_nops complete_formation mod->state = MODULE_STATE_COMING prepare_coming_module blocking_notifier_call_chain(MODULE_STATE_COMING) jump_label_module_notify (MODULE_STATE_COMING) jump_label_add_module first time processing test_klp_true_key, within_module() returns false (the key is defined in the other module), and we treat it as !static_key_linked() so the code goes ahead and makes it linked static_key_set_linked key->type |= JUMP_TYPE_LINKED do_init_module mod->state = MODULE_STATE_LIVE; blocking_notifier_call_chain(MODULE_STATE_LIVE) jump_label_module_notify (MODULE_STATE_LIVE) target ------ (prepare_coming_module cont...) blocking_notifier_call_chain(MODULE_STATE_COMING) jump_label_module_notify(MODULE_STATE_COMING) jump_label_add_module second time processing test_klp_true_key, within_module() returns true this time and we go straight to static_key_set_entries(), which is careful enough to verify the that the entries aren't already linked, but not the key itself static_key_set_entries WARN_ON_ONCE((unsigned long)entries & JUMP_TYPE_MASK) do_init_module mod->state = MODULE_STATE_LIVE; blocking_notifier_call_chain(MODULE_STATE_LIVE) jump_label_module_notify (MODULE_STATE_LIVE) Ok, so it seems that jump_label_add_module() assumes that any given key that isn't present in said module, is assumed to have already been initialized and therefore it enters that convoluted JUMP_TYPE_LINKED code. When we later try call jump_label_add_module() for the target module, the same function assumes that the key is not linked and we're left with a weird static_key_mod entry that later crashes the box. tl;dr2: Could we safely reorder klp_module_{coming,going}() with respect to the module_notifier callbacks? i.e. blocking_notifier_call_chain(&module_notify_list, MODULE_STATE_COMING, mod); klp_module_coming(mod); ... and ... klp_module_going(mod); blocking_notifier_call_chain(&module_notify_list, MODULE_STATE_GOING, mod); This fixes the above test case, but I wasn't sure if it now violates some other part of the PoC or consistency model. -- Joe