On 2018/07/02 15:17, Jason Wang wrote: > On 2018年07月02日 12:37, Toshiaki Makita wrote: >> On 2018/07/02 11:54, Jason Wang wrote: >>> On 2018年07月02日 10:45, Toshiaki Makita wrote: >>>> Hi Jason, >>>> >>>> On 2018/06/29 18:30, Jason Wang wrote: >>>>> On 2018年06月29日 16:09, Toshiaki Makita wrote: >>>> ... >>>>>> To fix this, poll the work instead of enabling notification when >>>>>> busypoll is interrupted by something. IMHO signal_pending() and >>>>>> vhost_has_work() are kind of interruptions rather than signals to >>>>>> completely cancel the busypoll, so let's run busypoll after the >>>>>> necessary work is done. In order to avoid too long busyloop due to >>>>>> interruption, save the endtime in vq field and use it when reentering >>>>>> the work function. >>>>> I think we don't care long busyloop unless e.g tx can starve rx? >>>> I just want to keep it user-controllable. Unless memorizing it busypoll >>>> can run unexpectedly long. >>> I think the total amount of time for busy polling is bounded. If I was >>> wrong, it should be a bug somewhere. >> Consider this kind of scenario: >> 0. Set 100us busypoll for example. >> 1. handle_tx() runs busypoll. >> 2. Something like zerocopy queues tx_work within 100us. >> 3. busypoll exits and call handle_tx() again. >> 4. Repeat 1-3. >> >> In this case handle_tx() does not process packets but busypoll >> essentially runs beyond 100us without endtime memorized. This may be >> just a theoretical problem, but I was worried that more code to poll tx >> queue can be added in the future and it becomes realistic. > > Yes, but consider zerocopy tends to batch 16 used packets and we will > finally finish all processing of packets. The above won't be endless, so > it was probably tolerable. Right. So endtime memorization is more like a future-proof thing. Would you like to keep it or change something? >>>>>> Performance numbers: >>>>>> >>>>>> - Bulk transfer from guest to external physical server. >>>>>> [Guest]->vhost_net->tap--(XDP_REDIRECT)-->i40e --(wire)--> >>>>>> [Server] >>>>> Just to confirm in this case since zerocopy is enabled, we are in fact >>>>> use the generic XDP datapath? >>>> For some reason zerocopy was not applied for most packets, so in most >>>> cases driver XDP was used. I was going to dig into it but not yet. >>> Right, just to confirm this. This is expected. >>> >>> In tuntap, we do native XDP only for small and non zerocopy packets. See >>> tun_can_build_skb(). The reason is XDP may adjust packet header which is >>> not supported by zercopy. We can only use XDP generic for zerocopy in >>> this case. >> I think I understand when driver XDP can be used. What I'm not sure and >> was going to narrow down is why zerocopy is mostly not applied. >> > > I see, any touch to the zerocopy packet (clone, header expansion or > segmentation) that lead a userspace copy will increase the error counter > in vhost_net. Then vhost_net_tx_select_zcopy() may choose not to use > zerocopy. So it was probably something in your setup or a bug somewhere. Thanks for the hint! -- Toshiaki Makita