On Wed, 2017-08-23 at 23:28 +0900, Koichiro Den wrote:
> On Tue, 2017-08-22 at 20:55 +0300, Michael S. Tsirkin wrote:
> > On Tue, Aug 22, 2017 at 10:50:41AM +0800, Jason Wang wrote:
> > > > Perhaps the descriptor pool should also be
> > > > revised to allow out-of-order completions. Then there is no need to
> > > > copy zerocopy packets whenever they may experience delay.
> > >
> > > Yes, but as replied in the referenced thread, the Windows driver may
> > > treat out-of-order completion as a bug.
> >
> > That would be a Windows driver bug then, but I don't think it makes this
> > assumption. What the referenced thread
> > (https://patchwork.kernel.org/patch/3787671/) is saying is that the host
> > must use any buffers made available on a tx vq within a reasonable
> > timeframe, otherwise Windows guests panic.
> >
> > Ideally we would detect that a packet is actually experiencing delay and
> > trigger the copy at that point, e.g. by calling skb_linearize. But it
> > isn't easy to track these packets, and it is even harder to do a data
> > copy without races.
> >
> > Which reminds me that skb_linearize in net core seems to be
> > fundamentally racy: I suspect that if an skb is cloned, and someone is
> > trying to use the shared frags while another thread calls skb_linearize,
> > we get use-after-free bugs which likely mostly go undetected
> > because the corrupted packets mostly go on the wire and get dropped
> > by checksum code.
> >
>
> Let me make sure I understand this correctly:
> * Always copying with skb_orphan_frags_rx before xmit_skb, as Willem
>   mentioned in the earlier post (as opposed to my original patch), is safe
>   but too costly, so it cannot be adopted.
> * As a generic solution, if we could somehow overcome the safety issue,
>   tracking the delay and copying once some threshold is reached could be
>   an answer, but that is hard for now.
> * So something like the current vhost-net implementation, which decides
>   beforehand whether or not to do zerocopy based on the zerocopy tx error
>   ratio, is a practical compromise.
<- I forgot to mention the max pend check here as well.
>
> Thanks.

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
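The "practical compromise" in the last point (error-ratio check plus max pend check) can be sketched as a small user-space model of the per-packet decision. This is a simplified sketch under stated assumptions, not the vhost-net code itself: `struct tx_state`, `exceeds_maxpend`, `select_zcopy` and `use_zerocopy` are hypothetical names, and the constants (256-byte minimum length, 128 maximum pending buffers, 1024-entry index space, roughly 1/64 tolerated error ratio) reflect one reading of drivers/vhost/net.c from around this time and may differ between kernel versions.

```c
#include <stdbool.h>

/* Assumed constants; the real values live in drivers/vhost/net.c and may vary. */
#define GOODCOPY_LEN 256  /* shorter packets are always copied */
#define MAX_PEND     128  /* upper cap on outstanding zerocopy buffers */
#define RING_SIZE    1024 /* size of the pending-buffer index space */

/* Hypothetical state holding the counters the heuristic consults. */
struct tx_state {
    unsigned int vq_num;       /* virtqueue size */
    unsigned int upend_idx;    /* next slot for a new zerocopy buffer */
    unsigned int done_idx;     /* oldest slot whose completion is still pending */
    unsigned int tx_packets;   /* packets sent in the current accounting window */
    unsigned int tx_zcopy_err; /* zerocopy sends that ended up being copied */
    bool tx_flush;             /* a flush is in progress */
};

/* Max pend check: are too many zerocopy buffers still waiting for completion? */
static bool exceeds_maxpend(const struct tx_state *s)
{
    unsigned int pending = (s->upend_idx + RING_SIZE - s->done_idx) % RING_SIZE;
    unsigned int limit = s->vq_num >> 2; /* a quarter of the ring... */

    if (limit > MAX_PEND)                /* ...but never more than MAX_PEND */
        limit = MAX_PEND;
    return pending > limit;
}

/* Error-ratio check: stay on zerocopy while errors are at most ~1/64 of packets. */
static bool select_zcopy(const struct tx_state *s)
{
    return !s->tx_flush && s->tx_packets / 64 >= s->tx_zcopy_err;
}

/* Decide up front, per packet, whether to attempt zerocopy or just copy. */
static bool use_zerocopy(const struct tx_state *s, unsigned int len)
{
    return len >= GOODCOPY_LEN && !exceeds_maxpend(s) && select_zcopy(s);
}
```

For example, with `vq_num = 256`, ten buffers in flight, 6400 packets and 50 errors in the window, `use_zerocopy(&s, 1500)` is true. Raising `tx_zcopy_err` to 200, or letting 70 buffers stay pending (above the limit of 64), makes it false. The point of deciding beforehand is that the host never has to notice a delayed packet after the fact. It only stops handing out new zerocopy buffers once completions look slow or unreliable.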