On Wed, May 22, 2024 at 07:22:12AM -0700, Jakub Kicinski wrote:
> On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:
> > > > 4. Add a new netlink notifier that when the relevant event takes place,
> > > > deletes the node from the list, wait until the end of the work item, with
> > > > cancel_work_sync() and free allocations.
> > >
> > > What's the "relevant event" in this case? Closing of the socket that user had
> > > issued the command on?
> >
> > The event should match the below:
> > event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
> >
> > Then iterate over the list to look for work that matches the dev and portid.
> > The socket doesn’t close until the work is done in that case.
>
> Okay, good, yes. I think you can use one of the callbacks I mentioned
> below to achieve the same thing with less complexity than the notifier.

Danielle already has a POC with the notifier and it's not that
complicated. I wasn't aware of the netlink notifier, but we found it
when we tried to understand how other netlink families get notified
about a socket being closed. Which advantages do you see in the
sock_priv_destroy() approach? Are you against the notifier approach?

> > > Easiest way to "notice" the socket got closed would probably be to add some
> > > info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> > > get a close notification in the family
> > > ->unbind callback.

Isn't the unbind callback only for multicast (whereas we are using
unicast)?

> > >
> > > I'm on the fence whether we should cancel the work. We could just mark the
> > > command as 'no socket present' and stop sending notifications.
> > > Not sure which is better..
> >
> > Is there a scenario that we hit this event and won't intend to cancel the work?
>
> I think it's up to us. I don't see any legit reason for user space to
> intentionally cancel the flashing. So the only option is that user space
> is either buggy or has crashed, and the socket got closed before
> flashing finished. Right?

We don't think that closing the socket / killing the process mid
flashing is a legitimate scenario. We looked into it in order to avoid
sending unicast notifications to a socket that did not ask for them but
gets them because it was bound to the port ID that was used by the old
socket.

I agree that we don't need to cancel the work and can simply have the
work item stop sending notifications. User space will get an error if it
tries to flash a module that is already being flashed in the background.

WDYT?
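
To make the "mark it and stop notifying" option concrete, here is a rough
userspace model of the bookkeeping. It is only a sketch, not the POC: the
struct and every function name below are hypothetical, a plain linked list
stands in for the kernel list and work items, an ifindex stands in for the
netdev, and the release path matches on port ID only. In the kernel the
first helper would be reached from whichever hook is chosen (the notifier
on NETLINK_URELEASE or ->sock_priv_destroy()).

```c
/* Userspace model only; none of these names are taken from the kernel. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flash bookkeeping entry. */
struct fw_flash_work {
	int ifindex;                /* stands in for the netdev */
	uint32_t portid;            /* port ID of the requesting socket */
	bool closed_sock;           /* set once that socket was released */
	bool in_progress;           /* flashing still running in background */
	struct fw_flash_work *next;
};

/*
 * Socket with @portid was released: mark matching entries instead of
 * cancelling them. Returns how many entries were newly marked.
 */
static int fw_flash_sock_released(struct fw_flash_work *head, uint32_t portid)
{
	int marked = 0;

	for (struct fw_flash_work *w = head; w; w = w->next) {
		if (w->portid == portid && !w->closed_sock) {
			w->closed_sock = true;
			marked++;
		}
	}
	return marked;
}

/* The work item would check this before each unicast notification. */
static bool fw_flash_may_notify(const struct fw_flash_work *w)
{
	assert(w != NULL);
	return !w->closed_sock;
}

/* A new flash request fails while the same device is still being flashed. */
static int fw_flash_start_check(const struct fw_flash_work *head, int ifindex)
{
	for (const struct fw_flash_work *w = head; w; w = w->next) {
		if (w->ifindex == ifindex && w->in_progress)
			return -EBUSY;
	}
	return 0;
}
```

With this shape the flashing itself keeps running after the socket goes
away; only the notifications stop, so a new socket that happens to be
bound to the same port ID never sees them.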