On 2024/3/7 6:10, Mina Almasry wrote: ... >>>>> +static int netdev_restart_rx_queue(struct net_device *dev, int rxq_idx) >>>>> +{ >>>>> + void *new_mem; >>>>> + void *old_mem; >>>>> + int err; >>>>> + >>>>> + if (!dev || !dev->netdev_ops) >>>>> + return -EINVAL; >>>>> + >>>>> + if (!dev->netdev_ops->ndo_queue_stop || >>>>> + !dev->netdev_ops->ndo_queue_mem_free || >>>>> + !dev->netdev_ops->ndo_queue_mem_alloc || >>>>> + !dev->netdev_ops->ndo_queue_start) >>>>> + return -EOPNOTSUPP; >>>>> + >>>>> + new_mem = dev->netdev_ops->ndo_queue_mem_alloc(dev, rxq_idx); >>>>> + if (!new_mem) >>>>> + return -ENOMEM; >>>>> + >>>>> + err = dev->netdev_ops->ndo_queue_stop(dev, rxq_idx, &old_mem); >>>>> + if (err) >>>>> + goto err_free_new_mem; >>>>> + >>>>> + err = dev->netdev_ops->ndo_queue_start(dev, rxq_idx, new_mem); >>>>> + if (err) >>>>> + goto err_start_queue; >>>>> + >>>>> + dev->netdev_ops->ndo_queue_mem_free(dev, old_mem); >>>>> + >>>>> + return 0; >>>>> + >>>>> +err_start_queue: >>>>> + dev->netdev_ops->ndo_queue_start(dev, rxq_idx, old_mem); >>>> >>>> It might worth mentioning why queue start with old_mem will always >>>> success here as the return value seems to be ignored here. >>>> >>> >>> So the old queue, we stopped it, and if we fail to bring up the new >>> queue, then we want to start the old queue back up to get the queue >>> back to a workable state. >>> >>> I don't see what we can do to recover if restarting the old queue >>> fails. Seems like it should be a requirement that the driver tries as >>> much as possible to keep the old queue restartable. >> >> Is it possible that we may have the 'old_mem' leaking if the driver >> fails to restart the old queue? how does the driver handle the >> firmware cmd failure for ndo_queue_start()? it seems a little >> tricky to implement it. >> > > I'm not sure what we can do to meaningfully recover from failure to > restarting the old queue, except log it so the error is visible. In > theory because we have not modifying any queue configurations > restarting it would be straight forward, but since it's dealing with > hardware then any failures are possible. Yes, we may need to have a clear semantics of how should the driver implement the above interface, for example if the driver should free the memory when fail to start a queue or the driver should restart the queue when fail to stop a queue? Otherwise we may have different driver implementing different behavior. >From the disscusion you mentioned below, does it make senses to modeling rdma subsystem by using create_queue/modify_queue/destroy_queue semantics instead? > >>> >>> I can improve this by at least logging or warning if restarting the >>> old queue fails. >> >> Also the semantics of the above function seems odd that it is not >> only restarting rx queue, but also freeing and allocating memory >> despite the name only suggests 'restart', I am a litte afraid that >> it may conflict with future usecae when user only need the >> 'restart' part, perhaps rename it to a more appropriate name. >> > > Oh, what we want here is just the 'restart' part. However, Jakub > mandates that if you restart a queue (or a driver), you do it like > this, hence the slightly more complicated implementation. > > https://patchwork.kernel.org/project/netdevbpf/patch/20231106024413.2801438-13-almasrymina@xxxxxxxxxx/#25590262 > https://lore.kernel.org/netdev/20230815171638.4c057dcd@xxxxxxxxxx/ Thanks for the link. I like david's idea of "a more generic design where H/W queues are created and destroyed - e.g., queues unique to a process which makes the cleanup so much easier." , but it seems it is a lot of work for networking to implement that for now. >