Re: [RFC PATCH net-next v6 05/15] netdev: support binding dma-buf to netdevice

Yunsheng Lin <linyunsheng@xxxxxxxxxx> · Thu, 7 Mar 2024 20:15:26 +0800

On 2024/3/7 6:10, Mina Almasry wrote:

...

>>>>> +static int netdev_restart_rx_queue(struct net_device *dev, int rxq_idx)
>>>>> +{
>>>>> +     void *new_mem;
>>>>> +     void *old_mem;
>>>>> +     int err;
>>>>> +
>>>>> +     if (!dev || !dev->netdev_ops)
>>>>> +             return -EINVAL;
>>>>> +
>>>>> +     if (!dev->netdev_ops->ndo_queue_stop ||
>>>>> +         !dev->netdev_ops->ndo_queue_mem_free ||
>>>>> +         !dev->netdev_ops->ndo_queue_mem_alloc ||
>>>>> +         !dev->netdev_ops->ndo_queue_start)
>>>>> +             return -EOPNOTSUPP;
>>>>> +
>>>>> +     new_mem = dev->netdev_ops->ndo_queue_mem_alloc(dev, rxq_idx);
>>>>> +     if (!new_mem)
>>>>> +             return -ENOMEM;
>>>>> +
>>>>> +     err = dev->netdev_ops->ndo_queue_stop(dev, rxq_idx, &old_mem);
>>>>> +     if (err)
>>>>> +             goto err_free_new_mem;
>>>>> +
>>>>> +     err = dev->netdev_ops->ndo_queue_start(dev, rxq_idx, new_mem);
>>>>> +     if (err)
>>>>> +             goto err_start_queue;
>>>>> +
>>>>> +     dev->netdev_ops->ndo_queue_mem_free(dev, old_mem);
>>>>> +
>>>>> +     return 0;
>>>>> +
>>>>> +err_start_queue:
>>>>> +     dev->netdev_ops->ndo_queue_start(dev, rxq_idx, old_mem);
>>>>
>>>> It might worth mentioning why queue start with old_mem will always
>>>> success here as the return value seems to be ignored here.
>>>>
>>>
>>> So the old queue, we stopped it, and if we fail to bring up the new
>>> queue, then we want to start the old queue back up to get the queue
>>> back to a workable state.
>>>
>>> I don't see what we can do to recover if restarting the old queue
>>> fails. Seems like it should be a requirement that the driver tries as
>>> much as possible to keep the old queue restartable.
>>
>> Is it possible that we may have the 'old_mem' leaking if the driver
>> fails to restart the old queue? how does the driver handle the
>> firmware cmd failure for ndo_queue_start()? it seems a little
>> tricky to implement it.
>>
> 
> I'm not sure what we can do to meaningfully recover from failure to
> restarting the old queue, except log it so the error is visible. In
> theory because we have not modifying any queue configurations
> restarting it would be straight forward, but since it's dealing with
> hardware then any failures are possible.

Yes, we may need to have a clear semantics of how should the driver
implement the above interface, for example if the driver should free
the memory when fail to start a queue or the driver should restart
the queue when fail to stop a queue? Otherwise we may have different
driver implementing different behavior.

>From the disscusion you mentioned below, does it make senses to
modeling rdma subsystem by using create_queue/modify_queue/destroy_queue
semantics instead?

> 
>>>
>>> I can improve this by at least logging or warning if restarting the
>>> old queue fails.
>>
>> Also the semantics of the above function seems odd that it is not
>> only restarting rx queue, but also freeing and allocating memory
>> despite the name only suggests 'restart', I am a litte afraid that
>> it may conflict with future usecae when user only need the
>> 'restart' part, perhaps rename it to a more appropriate name.
>>
> 
> Oh, what we want here is just the 'restart' part. However, Jakub
> mandates that if you restart a queue (or a driver), you do it like
> this, hence the slightly more complicated implementation.
> 
> https://patchwork.kernel.org/project/netdevbpf/patch/20231106024413.2801438-13-almasrymina@xxxxxxxxxx/#25590262
> https://lore.kernel.org/netdev/20230815171638.4c057dcd@xxxxxxxxxx/

Thanks for the link.

I like david's idea of "a more generic design where H/W queues are created
and destroyed - e.g., queues unique to a process which makes the cleanup
so much easier." , but it seems it is a lot of work for networking to
implement that for now.

>