On Thu, 11 May 2023 21:54:40 -0400, Feng Liu <feliu@xxxxxxxxxx> wrote:
>
>
> On 2023-05-10 a.m.1:00, Jason Wang wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On 2023/5/9 09:43, Xuan Zhuo wrote:
> >> On Mon, 8 May 2023 11:00:10 -0400, Feng Liu <feliu@xxxxxxxxxx> wrote:
> >>>
> >>> On 2023-05-07 p.m.9:45, Xuan Zhuo wrote:
> >>>> On Sat, 6 May 2023 08:08:02 -0400, Feng Liu <feliu@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> On 2023-05-05 p.m.10:33, Xuan Zhuo wrote:
> >>>>>> On Tue, 2 May 2023 20:35:25 -0400, Feng Liu <feliu@xxxxxxxxxx> wrote:
> >>>>>>> When initializing XDP in virtnet_open(), some rq xdp initialization
> >>>>>>> may hit an error, causing the net device open to fail. However, previous
> >>>>>>> rqs have already initialized XDP and enabled NAPI, which is not the
> >>>>>>> expected behavior. Need to roll back the previous rq initialization
> >>>>>>> to avoid leaks in error unwinding of init code.
> >>>>>>>
> >>>>>>> Also extract a helper function to disable queue pairs, and use the newly
> >>>>>>> introduced helper function in error unwinding and virtnet_close.
> >>>>>>>
> >>>>>>> Issue: 3383038
> >>>>>>> Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
> >>>>>>> Signed-off-by: Feng Liu <feliu@xxxxxxxxxx>
> >>>>>>> Reviewed-by: William Tu <witu@xxxxxxxxxx>
> >>>>>>> Reviewed-by: Parav Pandit <parav@xxxxxxxxxx>
> >>>>>>> Reviewed-by: Simon Horman <simon.horman@xxxxxxxxxxxx>
> >>>>>>> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >>>>>>> Change-Id: Ib4c6a97cb7b837cfa484c593dd43a435c47ea68f
> >>>>>>> ---
> >>>>>>>  drivers/net/virtio_net.c | 30 ++++++++++++++++++++----------
> >>>>>>>  1 file changed, 20 insertions(+), 10 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>>> index 8d8038538fc4..3737cf120cb7 100644
> >>>>>>> --- a/drivers/net/virtio_net.c
> >>>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>>> @@ -1868,6 +1868,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
> >>>>>>>         return received;
> >>>>>>>  }
> >>>>>>>
> >>>>>>> +static void virtnet_disable_qp(struct virtnet_info *vi, int qp_index)
> >>>>>>> +{
> >>>>>>> +       virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
> >>>>>>> +       napi_disable(&vi->rq[qp_index].napi);
> >>>>>>> +       xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static int virtnet_open(struct net_device *dev)
> >>>>>>>  {
> >>>>>>>         struct virtnet_info *vi = netdev_priv(dev);
> >>>>>>> @@ -1883,20 +1890,26 @@ static int virtnet_open(struct net_device *dev)
> >>>>>>>
> >>>>>>>                 err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
> >>>>>>>                 if (err < 0)
> >>>>>>> -                       return err;
> >>>>>>> +                       goto err_xdp_info_reg;
> >>>>>>>
> >>>>>>>                 err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> >>>>>>>                                                  MEM_TYPE_PAGE_SHARED, NULL);
> >>>>>>> -               if (err < 0) {
> >>>>>>> -                       xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>>>>> -                       return err;
> >>>>>>> -               }
> >>>>>>> +               if (err < 0)
> >>>>>>> +                       goto err_xdp_reg_mem_model;
> >>>>>>>
> >>>>>>>                 virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> >>>>>>>                 virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
> >>>>>>>         }
> >>>>>>>
> >>>>>>>         return 0;
> >>>>>>> +
> >>>>>>> +err_xdp_reg_mem_model:
> >>>>>>> +       xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>>>>> +err_xdp_info_reg:
> >>>>>>> +       for (i = i - 1; i >= 0; i--)
> >>>>>>> +               virtnet_disable_qp(vi, i);
> >>>>>>
> >>>>>> I would like to know whether we should also handle these:
> >>>>>>
> >>>>>>         disable_delayed_refill(vi);
> >>>>>>         cancel_delayed_work_sync(&vi->refill);
> >>>>>>
> >>>>>>
> >>>>>> Maybe we should call virtnet_close() with "i" directly.
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>>
> >>>>> Can’t use i directly here, because if xdp_rxq_info_reg fails, napi has
> >>>>> not been enabled for the current qp yet. I should roll back from the queue
> >>>>> pairs where napi was enabled before (i--), otherwise it will hang at the
> >>>>> napi disable api.
> >>>> This is not the point; the key is whether we should handle:
> >>>>
> >>>>         disable_delayed_refill(vi);
> >>>>         cancel_delayed_work_sync(&vi->refill);
> >>>>
> >>>> Thanks.
> >>>>
> >>>>
> >>> OK, I get the point. Thanks for your careful review. I checked the code
> >>> again.
> >>>
> >>> There are two points that I need to explain:
> >>>
> >>> 1. All refill delayed work calls (vi->refill, vi->refill_enabled) assume
> >>> that the virtio interface has been successfully opened, such as
> >>> virtnet_receive, virtnet_rx_resize, _virtnet_set_queues, etc. If there
> >>> is an error in the xdp reg here, it will not trigger these subsequent
> >>> functions. There is no need to call disable_delayed_refill() and
> >>> cancel_delayed_work_sync().
> >> Maybe something is wrong. I think these lines may schedule the delayed work.
> >>
> >> static int virtnet_open(struct net_device *dev)
> >> {
> >>         struct virtnet_info *vi = netdev_priv(dev);
> >>         int i, err;
> >>
> >>         enable_delayed_refill(vi);
> >>
> >>         for (i = 0; i < vi->max_queue_pairs; i++) {
> >>                 if (i < vi->curr_queue_pairs)
> >>                         /* Make sure we have some buffers: if oom use wq. */
> >> -->                     if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> >> -->                             schedule_delayed_work(&vi->refill, 0);
> >>
> >>                 err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i,
> >>                                        vi->rq[i].napi.napi_id);
> >>                 if (err < 0)
> >>                         return err;
> >>
> >>                 err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> >>                                                  MEM_TYPE_PAGE_SHARED,
> >>                                                  NULL);
> >>                 if (err < 0) {
> >>                         xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>                         return err;
> >>                 }
> >>
> >>                 virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> >>                 virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
> >>         }
> >>
> >>         return 0;
> >> }
> >>
> >>
> >> And I think that if virtnet_open() returns an error, the state of virtnet
> >> should be like the state after virtnet_close().
> >>
> >> Or does someone have another opinion?
> >
> >
> > I agree, we need to disable and sync with the refill work.
> >
> > Thanks
> >
> >
> Hi, Jason & Xuan
>
> I will modify the patch according to the comments.
>
> But I cannot call virtnet_close(), since virtnet_close cannot disable
> queue pairs starting from the specified error one, so I still need to use
> the disable helper function. The reason is as mentioned in the previous
> email: we need to roll back from the specified error queue, otherwise the
> queue pairs for which napi has not been enabled will hang at the napi
> disable api.
>
> According to the comments, I will call disable_delayed_refill() and
> cancel_delayed_work_sync() in error unwinding, then call the disable
> helper function one by one for the queue pairs before the error one.
>
> Do you have any other comments about these?

LGTM

Thanks.

>
> Thanks
>
> >>
> >> Thanks.
> >>
> >>> The logic here is different from that of
> >>> virtnet_close. virtnet_close is based on the success of virtnet_open and
> >>> the tx and rx having been carried out normally. For error unwinding, only
> >>> disable qp is needed. I also encapsulated a helper function for disable qp,
> >>> which is used in error unwinding and virtnet_close.
> >>> 2. The current error qp, which has not enabled NAPI, can only call xdp
> >>> unreg, and cannot call the interface that disables NAPI, otherwise the
> >>> kernel will be stuck. That is the reason for i--: disable qp is called
> >>> only on the previous queues.
> >>>
> >>> Thanks
> >>>
> >>>>>>> +
> >>>>>>> +       return err;
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >>>>>>> @@ -2305,11 +2318,8 @@ static int virtnet_close(struct net_device *dev)
> >>>>>>>         /* Make sure refill_work doesn't re-enable napi! */
> >>>>>>>         cancel_delayed_work_sync(&vi->refill);
> >>>>>>>
> >>>>>>> -       for (i = 0; i < vi->max_queue_pairs; i++) {
> >>>>>>> -               virtnet_napi_tx_disable(&vi->sq[i].napi);
> >>>>>>> -               napi_disable(&vi->rq[i].napi);
> >>>>>>> -               xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>>>>> -       }
> >>>>>>> +       for (i = 0; i < vi->max_queue_pairs; i++)
> >>>>>>> +               virtnet_disable_qp(vi, i);
> >>>>>>>
> >>>>>>>         return 0;
> >>>>>>>  }
> >>>>>>> --
> >>>>>>> 2.37.1 (Apple Git-137.1)
> >>>>>>>
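
For reference, the unwinding order agreed on in this thread can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the next revision of the patch: every model_* name, the struct layout, MAX_QP and the fail_* knobs are hypothetical stand-ins for the driver state, and the refill work is reduced to two flags. It shows the three steps discussed above: undo the XDP registration of the failing queue only, stop and cancel the refill work, then disable the queue pairs before the failing one.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QP 4 /* hypothetical number of queue pairs */

/* Hypothetical userspace stand-in for the relevant virtnet_info state. */
struct model_vi {
    bool xdp_registered[MAX_QP];
    bool napi_enabled[MAX_QP];
    bool refill_enabled;   /* models enable/disable_delayed_refill() */
    bool refill_pending;   /* models a scheduled vi->refill work item */
    int fail_reg_at;       /* queue whose xdp reg fails, or -1 */
    int fail_mem_model_at; /* queue whose mem model reg fails, or -1 */
};

static int model_xdp_reg(struct model_vi *vi, int i)
{
    if (i == vi->fail_reg_at)
        return -1;
    vi->xdp_registered[i] = true;
    return 0;
}

static int model_xdp_reg_mem_model(struct model_vi *vi, int i)
{
    return i == vi->fail_mem_model_at ? -1 : 0;
}

/* Models virtnet_disable_qp(): only legal on a queue whose NAPI is enabled;
 * the thread notes that the real napi_disable() would hang otherwise. */
static void model_disable_qp(struct model_vi *vi, int i)
{
    assert(vi->napi_enabled[i]);
    vi->napi_enabled[i] = false;
    vi->xdp_registered[i] = false;
}

static int model_open(struct model_vi *vi)
{
    int i, err;

    vi->refill_enabled = true;

    for (i = 0; i < MAX_QP; i++) {
        /* Pessimistic assumption: the buffer fill fell back to the
         * delayed work, so a refill may be pending on any failure. */
        vi->refill_pending = true;

        err = model_xdp_reg(vi, i);
        if (err < 0)
            goto err_xdp_info_reg;

        err = model_xdp_reg_mem_model(vi, i);
        if (err < 0)
            goto err_xdp_reg_mem_model;

        vi->napi_enabled[i] = true;
    }

    return 0;

err_xdp_reg_mem_model:
    /* The failing queue registered XDP but never enabled NAPI. */
    vi->xdp_registered[i] = false;
err_xdp_info_reg:
    vi->refill_enabled = false; /* disable_delayed_refill() */
    vi->refill_pending = false; /* cancel_delayed_work_sync() */
    /* Roll back only the queue pairs set up before the failing one. */
    for (i = i - 1; i >= 0; i--)
        model_disable_qp(vi, i);

    return err;
}

/* True when the state matches what a successful close would leave behind. */
static bool model_is_clean(const struct model_vi *vi)
{
    int i;

    if (vi->refill_enabled || vi->refill_pending)
        return false;
    for (i = 0; i < MAX_QP; i++)
        if (vi->xdp_registered[i] || vi->napi_enabled[i])
            return false;
    return true;
}
```

Calling model_open() with fail_reg_at or fail_mem_model_at set to a queue index and then checking model_is_clean() exercises both error labels. The assert in model_disable_qp() stands in for the hang described above when NAPI is disabled on a queue that never enabled it.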