On 2018/07/02 16:52, Jason Wang wrote:
> On 2018/07/02 15:11, Toshiaki Makita wrote:
>> On 2018/07/02 15:17, Jason Wang wrote:
>>> On 2018/07/02 12:37, Toshiaki Makita wrote:
>>>> On 2018/07/02 11:54, Jason Wang wrote:
>>>>> On 2018/07/02 10:45, Toshiaki Makita wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>> On 2018/06/29 18:30, Jason Wang wrote:
>>>>>>> On 2018/06/29 16:09, Toshiaki Makita wrote:
>>>>>> ...
>>>>>>>> To fix this, poll the work instead of enabling notification when
>>>>>>>> busypoll is interrupted by something. IMHO signal_pending() and
>>>>>>>> vhost_has_work() are kinds of interruptions rather than signals to
>>>>>>>> completely cancel the busypoll, so let's run busypoll after the
>>>>>>>> necessary work is done. In order to avoid too long a busyloop due
>>>>>>>> to interruption, save the endtime in a vq field and use it when
>>>>>>>> reentering the work function.
>>>>>>> I think we don't care about a long busyloop unless e.g. tx can
>>>>>>> starve rx?
>>>>>> I just want to keep it user-controllable. Unless we memorize it,
>>>>>> busypoll can run unexpectedly long.
>>>>> I think the total amount of time for busy polling is bounded. If I
>>>>> was wrong, it should be a bug somewhere.
>>>> Consider this kind of scenario:
>>>> 0. Set 100us busypoll for example.
>>>> 1. handle_tx() runs busypoll.
>>>> 2. Something like zerocopy queues tx_work within 100us.
>>>> 3. busypoll exits and calls handle_tx() again.
>>>> 4. Repeat 1-3.
>>>>
>>>> In this case handle_tx() does not process packets, but busypoll
>>>> essentially runs beyond 100us without the endtime memorized. This may
>>>> be just a theoretical problem, but I was worried that more code to
>>>> poll the tx queue could be added in the future and it would become
>>>> realistic.
>>> Yes, but consider that zerocopy tends to batch 16 used packets and we
>>> will finally finish all processing of packets. The above won't be
>>> endless, so it is probably tolerable.
>> Right. So endtime memorization is more of a future-proofing thing.
>> Would you like to keep it or change something?
>
> I think we'd better introduce it only when we meet real bugs.

I'll change it to a flag to indicate that the previous busypoll was
interrupted (a rough, untested sketch of what I mean is appended at the
end of this mail).

>>>>>>>> Performance numbers:
>>>>>>>>
>>>>>>>> - Bulk transfer from guest to external physical server.
>>>>>>>>     [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> [Server]
>>>>>>> Just to confirm: in this case, since zerocopy is enabled, do we in
>>>>>>> fact use the generic XDP datapath?
>>>>>> For some reason zerocopy was not applied for most packets, so in
>>>>>> most cases driver XDP was used. I was going to dig into it but
>>>>>> haven't yet.
>>>>> Right, just to confirm this. This is expected.
>>>>>
>>>>> In tuntap, we do native XDP only for small and non-zerocopy packets.
>>>>> See tun_can_build_skb(). The reason is that XDP may adjust the packet
>>>>> header, which is not supported by zerocopy. We can only use generic
>>>>> XDP for zerocopy in this case.
>>>> I think I understand when driver XDP can be used. What I'm not sure
>>>> about, and was going to narrow down, is why zerocopy is mostly not
>>>> applied.
>>>>
>>> I see. Any touch to the zerocopy packet (clone, header expansion or
>>> segmentation) that leads to a userspace copy will increase the error
>>> counter in vhost_net. Then vhost_net_tx_select_zcopy() may choose not
>>> to use zerocopy. So it was probably something in your setup or a bug
>>> somewhere.
>>
>> Thanks for the hint!
It seems zerocopy packets are always nonlinear, and
netif_receive_generic_xdp() calls skb_linearize(), in which
__pskb_pull_tail() calls skb_zcopy_clear(). So it looks like
tx_zcopy_err is always incremented when zerocopy is used together with
XDP in my environment (the second sketch below shows how that counter
then feeds back into the zerocopy decision).
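Something like the following is what I have in mind for the flag. This
is only an untested user-space toy to illustrate the control flow, not
actual vhost code; all of the names (busy_poll, busyloop_intr, the
helper stubs) are made up for illustration.

#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define BUSYLOOP_TIMEOUT_US 100

static uint64_t now_us(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Stand-ins for vhost_has_work()/signal_pending() style interruptions. */
static bool other_work_pending(void) { return false; }
static bool packet_available(void)   { return false; }

/* Returns true if a packet showed up; sets *busyloop_intr if the poll
 * was cut short by other pending work. */
static bool busy_poll(bool *busyloop_intr)
{
        uint64_t endtime = now_us() + BUSYLOOP_TIMEOUT_US;

        while (now_us() < endtime) {
                if (packet_available())
                        return true;
                if (other_work_pending()) {
                        /* Interrupted: let the other work run first. */
                        *busyloop_intr = true;
                        return false;
                }
        }
        return false;
}

static void handle_tx(void)
{
        bool busyloop_intr = false;

        if (!busy_poll(&busyloop_intr)) {
                if (busyloop_intr) {
                        /* Interrupted: re-queue the work and poll again
                         * on the next run instead of enabling
                         * notification. */
                        printf("interrupted: poll the work again\n");
                } else {
                        printf("timed out: enable notification\n");
                }
                return;
        }
        printf("got a packet: process it\n");
}

int main(void)
{
        handle_tx();
        return 0;
}

The point is only that an interrupted poll re-queues the work instead
of enabling notification; the endtime is not saved across calls, per
the discussion above.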
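And this is roughly the error-counter feedback described above, again
as a made-up user-space toy rather than the real code; the 1/64
threshold is only illustrative, and the real check lives in
vhost_net_tx_select_zcopy() in drivers/vhost/net.c.

#include <stdbool.h>
#include <stdio.h>

struct fake_net {
        unsigned long tx_packets;
        unsigned long tx_zcopy_err;
};

/* Select zerocopy only while copy fallbacks stay rare (illustrative
 * ratio). */
static bool tx_select_zcopy(const struct fake_net *net)
{
        return net->tx_packets / 64 >= net->tx_zcopy_err;
}

/* Called when a "zerocopy" skb had to be copied after all, e.g. because
 * skb_linearize() dropped the zerocopy state. */
static void note_zcopy_error(struct fake_net *net)
{
        net->tx_zcopy_err++;
}

int main(void)
{
        struct fake_net net = { 0 };
        int i;

        for (i = 0; i < 1000; i++) {
                net.tx_packets++;
                /* With generic XDP, every zerocopy packet gets
                 * linearized, so every one counts as an error ... */
                note_zcopy_error(&net);
        }
        /* ... and zerocopy ends up effectively disabled. */
        printf("use zerocopy: %s\n", tx_select_zcopy(&net) ? "yes" : "no");
        return 0;
}

-- 
Toshiaki Makita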