On Wed, Sep 25, 2024 at 7:48 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote: > > On Wed, Sep 25, 2024 at 10:02:51AM +0200, 'Alice Ryhl' via kernel-team wrote: > > On Tue, Sep 24, 2024 at 8:44 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote: > > > > > > In binder_add_freeze_work() we iterate over the proc->nodes with the > > > proc->inner_lock held. However, this lock is temporarily dropped to > > > acquire the node->lock first (lock nesting order). This can race with > > > binder_deferred_release() which removes the nodes from the proc->nodes > > > rbtree and adds them into binder_dead_nodes list. This leads to a broken > > > iteration in binder_add_freeze_work() as rb_next() will use data from > > > binder_dead_nodes, triggering an out-of-bounds access: > > > > > > ================================================================== > > > BUG: KASAN: global-out-of-bounds in rb_next+0xfc/0x124 > > > Read of size 8 at addr ffffcb84285f7170 by task freeze/660 > > > > > > CPU: 8 UID: 0 PID: 660 Comm: freeze Not tainted 6.11.0-07343-ga727812a8d45 #18 > > > Hardware name: linux,dummy-virt (DT) > > > Call trace: > > > rb_next+0xfc/0x124 > > > binder_add_freeze_work+0x344/0x534 > > > binder_ioctl+0x1e70/0x25ac > > > __arm64_sys_ioctl+0x124/0x190 > > > > > > The buggy address belongs to the variable: > > > binder_dead_nodes+0x10/0x40 > > > [...] > > > ================================================================== > > > > > > This is possible because proc->nodes (rbtree) and binder_dead_nodes > > > (list) share entries in binder_node through a union: > > > > > > struct binder_node { > > > [...] > > > union { > > > struct rb_node rb_node; > > > struct hlist_node dead_node; > > > }; > > > > > > Fix the race by checking that the proc is still alive. If not, simply > > > break out of the iteration. > > > > > > Fixes: d579b04a52a1 ("binder: frozen notification") > > > Cc: stable@xxxxxxxxxxxxxxx > > > Signed-off-by: Carlos Llamas <cmllamas@xxxxxxxxxx> > > > > This change LGTM. > > Reviewed-by: Alice Ryhl <aliceryhl@xxxxxxxxxx> > > > > I reviewed some other code paths to verify whether there are other > > problems with processes dying concurrently with operations on freeze > > notifications. I didn't notice any other memory safety issues, but I > > Yeah most other paths are protected with binder_procs_lock mutex. > > > noticed that binder_request_freeze_notification returns EINVAL if you > > try to use it with a node from a dead process. That seems problematic, > > as this means that there's no way to invoke that command without > > risking an EINVAL error if the remote process dies. We should not > > return EINVAL errors on correct usage of the driver. > > Agreed, this should probably be -ESRCH or something. I'll add it to v2, > thanks for the suggestion. Well, maybe? I think it's best to not return errnos from these commands at all, as they obscure how many commands were processed. Since the node still exists even if the process dies, perhaps we can just let you create the freeze notification even if it's dead? We can make it end up in the same state as if you request a freeze notification and the process then dies afterwards. Alice