On Fri, Sep 27, 2024 at 6:32 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote: > > On Fri, Sep 27, 2024 at 06:15:40PM +0200, Alice Ryhl wrote: > > On Fri, Sep 27, 2024 at 6:13 PM Yu-Ting Tseng <yutingtseng@xxxxxxxxxx> wrote: > > > > > > On Fri, Sep 27, 2024 at 12:19 AM Alice Ryhl <aliceryhl@xxxxxxxxxx> wrote: > > > > > > > > On Fri, Sep 27, 2024 at 1:37 AM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote: > > > > > > > > > > Alice points out that binder_request_freeze_notification() should not > > > > > return EINVAL when the relevant node is dead [1]. The node can die at > > > > > any point even if the user input is valid. Instead, allow the request > > > > > to be allocated but skip the initial notification for dead nodes. This > > > > > avoids propagating unnecessary errors back to userspace. > > > > > > > > > > Fixes: d579b04a52a1 ("binder: frozen notification") > > > > > Cc: stable@xxxxxxxxxxxxxxx > > > > > Suggested-by: Alice Ryhl <aliceryhl@xxxxxxxxxx> > > > > > Link: https://lore.kernel.org/all/CAH5fLghapZJ4PbbkC8V5A6Zay-_sgTzwVpwqk6RWWUNKKyJC_Q@xxxxxxxxxxxxxx/ [1] > > > > > Signed-off-by: Carlos Llamas <cmllamas@xxxxxxxxxx> > > > > > --- > > > > > drivers/android/binder.c | 28 +++++++++++++--------------- > > > > > 1 file changed, 13 insertions(+), 15 deletions(-) > > > > > > > > > > diff --git a/drivers/android/binder.c b/drivers/android/binder.c > > > > > index 73dc6cbc1681..415fc9759249 100644 > > > > > --- a/drivers/android/binder.c > > > > > +++ b/drivers/android/binder.c > > > > > @@ -3856,7 +3856,6 @@ binder_request_freeze_notification(struct binder_proc *proc, > > > > > { > > > > > struct binder_ref_freeze *freeze; > > > > > struct binder_ref *ref; > > > > > - bool is_frozen; > > > > > > > > > > freeze = kzalloc(sizeof(*freeze), GFP_KERNEL); > > > > > if (!freeze) > > > > > @@ -3872,32 +3871,31 @@ binder_request_freeze_notification(struct binder_proc *proc, > > > > > } > > > > > > > > > > binder_node_lock(ref->node); > > > > > - > > > > > - if (ref->freeze || !ref->node->proc) { > > > > > - binder_user_error("%d:%d invalid BC_REQUEST_FREEZE_NOTIFICATION %s\n", > > > > > - proc->pid, thread->pid, > > > > > - ref->freeze ? "already set" : "dead node"); > > > > > + if (ref->freeze) { > > > > > + binder_user_error("%d:%d BC_REQUEST_FREEZE_NOTIFICATION already set\n", > > > > > + proc->pid, thread->pid); > > > > > binder_node_unlock(ref->node); > > > > > binder_proc_unlock(proc); > > > > > kfree(freeze); > > > > > return -EINVAL; > > > > > } > > > > > - binder_inner_proc_lock(ref->node->proc); > > > > > - is_frozen = ref->node->proc->is_frozen; > > > > > - binder_inner_proc_unlock(ref->node->proc); > > > > > > > > > > binder_stats_created(BINDER_STAT_FREEZE); > > > > > INIT_LIST_HEAD(&freeze->work.entry); > > > > > freeze->cookie = handle_cookie->cookie; > > > > > freeze->work.type = BINDER_WORK_FROZEN_BINDER; > > > > > - freeze->is_frozen = is_frozen; > > > > > - > > > > > ref->freeze = freeze; > > > > > > > > > > - binder_inner_proc_lock(proc); > > > > > - binder_enqueue_work_ilocked(&ref->freeze->work, &proc->todo); > > > > > - binder_wakeup_proc_ilocked(proc); > > > > > - binder_inner_proc_unlock(proc); > > > > > + if (ref->node->proc) { > > > > > + binder_inner_proc_lock(ref->node->proc); > > > > > + freeze->is_frozen = ref->node->proc->is_frozen; > > > > > + binder_inner_proc_unlock(ref->node->proc); > > > > > + > > > > > + binder_inner_proc_lock(proc); > > > > > + binder_enqueue_work_ilocked(&freeze->work, &proc->todo); > > > > > + binder_wakeup_proc_ilocked(proc); > > > > > + binder_inner_proc_unlock(proc); > > > > > > > > This is not a problem with your change ... but, why exactly are we > > > > scheduling the BINDER_WORK_FROZEN_BINDER right after creating it? For > > > > death notications, we only schedule it immediately if the process is > > > > dead. So shouldn't we only schedule it if the process is not frozen? > > For death notifications, we only care about a remote binder's death. > Unlike freeze, in which we have a state that can toggle at any point. > This is important for suspending and resuming transactions to a node. > > Sending the freeze notification immediately allows for (1) userspace > knowing the current state of the remote node and (2) avoiding a race > with BINDER_FREEZE ioctl in which we could miss a freeze/thaw. > > > > > And if the answer is that frozen notifications are always sent > > > > immediately to notify about the current state, then we should also > > > > send one for a dead process ... maybe. I guess a dead process is not > > > > frozen? > > > Yes this is to immediately notify about the current state (frozen or > > > unfrozen). A dead process is in neither state so it feels more correct > > > not to send either? > > > > Okay. > > > > On the other hand, I can easily imagine userspace code being written > > with the assumption that it'll always get a notification immediately. > > That would probably result in deadlocks in the edge case where the > > process happens to be dead. > > There are different ways to proceed with this dead node scenario: > > 1. return ESRCH > 2. silently fail and don't allocate a ref->freeze > 3. allocate a ref->freeze but don't notify the current state > 4. allocate and send a "fake" state notification. > > I like 1 just because it is technically the correct thing to do from the > driver's perspective. However, it does complicate things in userspace as > we've discussed. Option 2, could work but it would also fail with EINVAL > if a "clear notification" is sent later anyway. Option 3 changes the > behavior of guaranteeing a notification upon success. Option 4 can cause > trouble on how a "not-frozen" notification is handled in userspace e.g > start sending transactions. > > As you can see there is no clear winner here, we have to compromise > something and option #3 is the best we can do IMO. I am happy with both #3 and #4. I think #1 and #2 are problematic because they will lead to userspace getting errors on correct use of Binder. Alice