On Mon, Dec 24, 2018 at 04:44:14PM +0800, Jason Wang wrote:
> 
> On 2018/12/17 3:57 AM, Michael S. Tsirkin wrote:
> > On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
> > > From: Jason Wang <jasowang@xxxxxxxxxx>
> > > Date: Fri, 14 Dec 2018 12:29:54 +0800
> > > 
> > > > On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
> > > > > On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> > > > > > Hi:
> > > > > > 
> > > > > > This series tries to access virtqueue metadata through kernel virtual
> > > > > > addresses instead of the copy_user() friends, since those have too
> > > > > > much overhead: checks, spec barriers, or even hardware feature
> > > > > > toggling.
> > > > > > 
> > > > > > Tests show about a 24% improvement in TX PPS. It should benefit
> > > > > > other cases as well.
> > > > > > 
> > > > > > Please review.
> > > > > 
> > > > > I think the idea of speeding up userspace access is a good one.
> > > > > However I think that moving all checks to the start is way too
> > > > > aggressive.
> > > > 
> > > > So did packet and AF_XDP. Anyway, sharing the address space and
> > > > accessing it directly is the fastest way. Performance is the major
> > > > consideration for people choosing a backend. Compared to a userspace
> > > > implementation, vhost does not have security advantages at any level.
> > > > If vhost is still slow, people will start to develop backends based
> > > > on e.g. AF_XDP.
> > > 
> > > Exactly, this is precisely how this kind of problem should be solved.
> > > 
> > > Michael, I strongly support the approach Jason is taking here, and I
> > > would like to ask you to seriously reconsider your objections.
> > > 
> > > Thank you.
> > 
> > Okay. Won't be the first time I'm wrong.
> > 
> > Let's say we ignore the security aspects; we still need to make sure the
> > following all keep working (broken with this revision):
> > 
> > - file-backed memory (I didn't see where we mark memory dirty -
> >   if we don't, we get guest memory corruption on close; if we do,
> >   then a host crash, as https://lwn.net/Articles/774411/ seems to
> >   apply here?)
> 
> We only pin the metadata pages, so I don't think they can be used for DMA,
> and it was probably not an issue. The real issue is the zerocopy code -
> maybe it's time to disable it by default?
> 
> > - THP
> 
> We will miss 2 or 4 pages for THP; I wonder whether or not that's
> measurable.
> 
> > - auto-NUMA
> 
> I'm not sure auto-NUMA will help for the IPC case. It can hurt performance
> in the worst case, if vhost and userspace are running on two different
> nodes. Anyway, I can measure it.
> 
> > Because vhost isn't like AF_XDP, where you can just tell people "use
> > hugetlbfs" and "data is removed on close" - people are using it in lots
> > of configurations with guest memory shared between rings and unrelated
> > data.
> 
> This series doesn't share data; only metadata is shared.

Let me clarify - I mean that the metadata is in the same huge page as
unrelated guest data.

> > Jason, thoughts on these?
> 
> Based on the above, I can measure the impact of THP to see how much it
> matters. The unsafe variants can only work when we can batch the accesses,
> and that needs non-trivial rework of the vhost code, with an unexpected
> amount of work for archs other than x86. I'm not sure it's worth trying.
> 
> Thanks

Yes, I think we need better APIs in vhost. Right now we have an API that
gets and translates a single buffer. We should have one that gets a batch
of descriptors and stores it, then one that translates that batch.
IMHO this will benefit everyone, even if we do vmap, due to better code
locality.

-- 
MST
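
To make that two-call shape concrete, here is a rough, self-contained C
sketch: one helper that copies a batch of available descriptors out of the
ring in a single pass and stores it, and a second that translates the whole
stored batch from guest addresses to host pointers. All of the names here
(desc_batch, get_desc_batch(), translate_desc_batch(), vring_desc_ex) and
the flat ring layout are hypothetical illustrations, not vhost's actual
API; ring wraparound and index validation are omitted for brevity.

#include <stdint.h>
#include <string.h>

struct vring_desc_ex {                  /* simplified descriptor */
        uint64_t addr;                  /* guest-physical address */
        uint32_t len;
        uint16_t flags;
        uint16_t next;
};

struct desc_batch {
        struct vring_desc_ex descs[64]; /* copied out of the ring in one pass */
        void *hva[64];                  /* filled in by the translate step */
        unsigned int count;
};

/* Phase 1: copy up to 'max' available descriptors out of the ring and
 * store them.  One bulk copy replaces per-descriptor accesses, so
 * validation and speculation-barrier costs are paid once per batch
 * rather than once per descriptor.  (Wraparound handling omitted.) */
static unsigned int get_desc_batch(const struct vring_desc_ex *ring,
                                   unsigned int head, unsigned int avail,
                                   struct desc_batch *b, unsigned int max)
{
        unsigned int n = avail < max ? avail : max;

        memcpy(b->descs, &ring[head], n * sizeof(b->descs[0]));
        b->count = n;
        return n;
}

/* Phase 2: translate every guest address in the stored batch to a host
 * pointer.  'translate' stands in for whatever GPA->HVA lookup the
 * backend uses. */
static int translate_desc_batch(struct desc_batch *b,
                                void *(*translate)(uint64_t gpa, uint32_t len))
{
        unsigned int i;

        for (i = 0; i < b->count; i++) {
                b->hva[i] = translate(b->descs[i].addr, b->descs[i].len);
                if (!b->hva[i])
                        return -1;      /* bad address: fail the whole batch */
        }
        return 0;
}

Keeping fetch and translate separate is what would make the batch useful
for both backends: the same stored batch can be translated through a
copy_user()-style lookup today or through a vmap'd view later, which is
presumably why it helps "even if we do vmap".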