> On Jul 6, 2022, at 6:18 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > On Thu, 7 Jul 2022 00:19:07 +0000 > Song Liu <songliubraving@xxxxxx> wrote: > >>>> In this specific race condition, register_bpf() will succeed, as it already >>>> got tr->mutex. But the IPMODIFY (livepatch) side will fail and retry. >>> >>> What else takes the tr->mutex ? >> >> tr->mutex is the local mutex for a single BPF trampoline, we only need to take >> it when we make changes to the trampoline (add/remove fentry/fexit programs). >> >>> >>> If it preempts anything else taking that mutex, when this runs, then it >>> needs to be careful. >>> >>> You said this can happen when the live patch came first. This isn't racing >>> against live patch, it's racing against anything that takes the tr->mutex >>> and then adds a bpf trampoline to a location that has a live patch. >> >> There are a few scenarios here: >> 1. Live patch is already applied, then a BPF trampoline is being registered >> to the same function. In bpf_trampoline_update(), register_fentry returns >> -EAGAIN, and this will be resolved. > > Where will it be resolved? bpf_trampoline_update() will set tr->indirect_call and goto again. Then the trampoline is re-prepared to be able to share with the IPMODIFY functions and register_fentry will succeed. > >> >> 2. BPF trampoline is already registered, then a live patch is being applied >> to the same function. The live patch code need to ask the bpf trampoline to >> prepare the trampoline before live patch. This is done by calling >> bpf_tramp_ftrace_ops_func. >> >> 2.1 If nothing else is modifying the trampoline at the same time, >> bpf_tramp_ftrace_ops_func will succeed. >> >> 2.2 In rare cases, if something else is holding tr->mutex to make changes to >> the trampoline (add/remove fentry functions, etc.), mutex_trylock in >> bpf_tramp_ftrace_ops_func will fail, and live patch will fail. However, the >> change to BPF trampoline will still succeed. It is common for live patch to >> retry, so we just need to try live patch again when no one is making changes >> to the BPF trampoline in parallel. > > If the live patch is going to try again, and the task doing the live > patch is SCHED_FIFO, and the task holding the tr->mutex is SCHED_OTHER > (or just a lower priority), then there is a chance that the live patch > task preempted the tr->mutex owner, and let's say the tr->mutex owner > is pinned to the CPU (by the user or whatever), then because the live > patch task is in a loop trying to take that mutex, it will never let > the owner continue. Yeah, I got this scenario. I just don't think we should run live patch with high priority. Well, maybe we shouldn't make such assumptions. > > Yes, this is a real scenario with trylock on mutexes. We hit it all the > time in RT. > >> >>> >>>> >>>> Since both livepatch and bpf trampoline changes are rare operations, I think >>>> the chance of the race condition is low enough. > > > A low race condition in a world that does this a billion times a day, > ends up being not so rare. > > I like to say, "I live in a world where the unlikely is very much likely!" > > >>>> >>>> Does this make sense? >>>> >>> >>> It's low, and if it is also a privileged operation then there's less to be >>> concern about. >> >> Both live patch and BPF trampoline are privileged operations. > > This makes the issue less of an issue, but if there's an application > that does this with setuid or something, there's a chance that it can > be used by an attacker as well. > >> >>> As if it is not, then we could have a way to deadlock the >>> system. I'm more concerned that this will lead to a CVE than it just >>> happening randomly. In other words, it only takes something that can run at >>> a real-time priority to connect to a live patch location, and something >>> that runs at a low priority to take a tr->mutex. If an attacker has both, >>> then it can pin both to a CPU and then cause the deadlock to the system. >>> >>> One hack to fix this is to add a msleep(1) in the failed case of the >>> trylock. This will at least give the owner of the lock a millisecond to >>> release it. This was what the RT patch use to do with spin_trylock() that >>> was converted to a mutex (and we worked hard to remove all of them). >> >> The fix is really simple. But I still think we don't need it. We only hit >> the trylock case for something with IPMODIFY. The non-privileged user >> should not be able to do that, right? > > For now, perhaps. But what useful applications are there going to be in > the future that performs these privileged operations on behalf of a > non-privileged user? > > In other words, if we can fix it now, we should, and avoid debugging > this issue 5 years from now where it may take months to figure out. Fair enough.. I guess we will just add the msleep(1) in the -EAGAIN path. If this sounds good, I will send v3 with this change and more comments. Thanks, Song