On Tue, Sep 24, 2024 at 8:44 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote: > > In binder_add_freeze_work() we iterate over the proc->nodes with the > proc->inner_lock held. However, this lock is temporarily dropped to > acquire the node->lock first (lock nesting order). This can race with > binder_deferred_release() which removes the nodes from the proc->nodes > rbtree and adds them into binder_dead_nodes list. This leads to a broken > iteration in binder_add_freeze_work() as rb_next() will use data from > binder_dead_nodes, triggering an out-of-bounds access: > > ================================================================== > BUG: KASAN: global-out-of-bounds in rb_next+0xfc/0x124 > Read of size 8 at addr ffffcb84285f7170 by task freeze/660 > > CPU: 8 UID: 0 PID: 660 Comm: freeze Not tainted 6.11.0-07343-ga727812a8d45 #18 > Hardware name: linux,dummy-virt (DT) > Call trace: > rb_next+0xfc/0x124 > binder_add_freeze_work+0x344/0x534 > binder_ioctl+0x1e70/0x25ac > __arm64_sys_ioctl+0x124/0x190 > > The buggy address belongs to the variable: > binder_dead_nodes+0x10/0x40 > [...] > ================================================================== > > This is possible because proc->nodes (rbtree) and binder_dead_nodes > (list) share entries in binder_node through a union: > > struct binder_node { > [...] > union { > struct rb_node rb_node; > struct hlist_node dead_node; > }; > > Fix the race by checking that the proc is still alive. If not, simply > break out of the iteration. > > Fixes: d579b04a52a1 ("binder: frozen notification") > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Carlos Llamas <cmllamas@xxxxxxxxxx> This change LGTM. Reviewed-by: Alice Ryhl <aliceryhl@xxxxxxxxxx> I reviewed some other code paths to verify whether there are other problems with processes dying concurrently with operations on freeze notifications. I didn't notice any other memory safety issues, but I noticed that binder_request_freeze_notification returns EINVAL if you try to use it with a node from a dead process. That seems problematic, as this means that there's no way to invoke that command without risking an EINVAL error if the remote process dies. We should not return EINVAL errors on correct usage of the driver. Alice