(2015/03/06 19:51), Petr Mladek wrote: > On Fri 2015-03-06 10:24:27, Masami Hiramatsu wrote: >> (2015/03/05 23:18), Josh Poimboeuf wrote: >>> On Thu, Mar 05, 2015 at 09:52:41AM +0900, Masami Hiramatsu wrote: >>>> (2015/03/04 22:17), Petr Mladek wrote: >>>>> On Tue 2015-03-03 17:02:22, Josh Poimboeuf wrote: >>>>>> It's possible for klp_register_patch() to see a module before the COMING >>>>>> notifier is called, or after the GOING notifier is called. >>>>>> >>>>>> That can cause all kinds of ugly races. As Pter Mladek reported: >>>>>> >>>>>> "The problem is that we do not keep the klp_mutex lock all the time when >>>>>> the module is being added or removed. >>>>>> >>>>>> First, the module is visible even before ftrace is ready. If we enable a patch >>>>>> in this time frame, adding ftrace ops will fail and the patch will get rejected >>>>>> just because bad timing. >>>>> >>>>> Ah, this is not true after all. I did not properly check when >>>>> MODULE_STATE_COMING was set. I though that it was before ftrace was >>>>> initialized but it was not true. >>>>> >>>>> >>>>>> Second, if we are "lucky" and enable the patch for the coming module when the >>>>>> ftrace is ready but before the module notifier has been called. The notifier >>>>>> will try to enable the patch as well. It will detect that it is already patched, >>>>>> return error, and the module will get rejected just because bad >>>>>> timing. The more serious problem is that it will not call the notifier for >>>>>> going module, so that the mess will stay there and we wont be able to load >>>>>> the module later. >>>>> >>>>> Ah, the race is there but the effect is not that serious in the >>>>> end. It seems that errors from module notifiers are ignored. In fact, >>>>> we do not propagate the error from klp_module_notify_coming(). It means >>>>> that WARN() from klp_enable_object() will be printed but the module >>>>> will be loaded and patched. >>>>> >>>>> I am sorry, I was confused by kGraft where kgr_module_init() was >>>>> called directly from module_load(). The errors were propagated. It >>>>> means that kGraft rejects module when the patch cannot be applied. >>>>> >>>>> Note that the current solution is perfectly fine for the simple >>>>> consistency model. >>>>> >>>>> >>>>>> Third, similar problems are there for going module. If a patch is enabled after >>>>>> the notifier finishes but before the module is removed from the list of modules, >>>>>> the new patch will be applied to the module. The module might disappear at >>>>>> anytime when the patch enabling is in progress, so there might be an access out >>>>>> of memory. Or the whole patch might be applied and some mess will be left, >>>>>> so it will not be possible to load/patch the module again." >>>>> >>>>> This is true. >>>> >>>> No, that's not true if you try_get_module() before patching. After the >>>> module state goes GOING (more correctly say, after try_release_module_ref() >>>> succeeded), all try_get_module() must fail :) >>>> So, please make sure to get module when applying patches. >>> >>> Hi Masami, >>> >>> As Jikos pointed out elsewhere, try_get_module() won't solve all the >>> GOING races. >>> >>> The module can be in GOING before mod->exit() is called. If we apply a >>> patch between GOING getting set and mod->exit(), try_module_get() will >>> fail and the module won't be patched. But module code can still run >>> before or during mod->exit(), so the unpatched module code might >>> interact badly with new patched code elsewhere. >> >> Hmm, in that case, we'd better have new GONE state for the module. >> At least kprobe needs it. > > What is the exact problem with kprobes, please? Ah, sorry, I miss understood the issue. > Note that the notifiers for MODULE_STATE_GOING are called > after mod->exit(). Therefore it is safe to kill kprobes > the fast way when using the notifier. Right. /* Stop the machine so refcounts can't move and disable module. */ ret = try_stop_module(mod, flags, &forced); if (ret != 0) goto out; mutex_unlock(&module_mutex); /* Final destruction now no one is using it. */ if (mod->exit != NULL) mod->exit(); blocking_notifier_call_chain(&module_notify_list, MODULE_STATE_GOING, mod); async_synchronize_full(); And, kprobes also can try to put a probe between mutex_unlock() and mod->exit() on mod->exit() function. Since kprobe tries to get module when registering, it will fail but mod->exit() is called after that fail. So, there is also a race. However, I'm not sure that is actual serious issue. Actually, we can suppose this module unloading context is not changing universe. thus it is expected behavior, isn't it? Thank you, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Research Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu.pt@xxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html