2022年12月27日(火) 23:37 Michael S. Tsirkin <mst@xxxxxxxxxx>: > > On Tue, Dec 27, 2022 at 07:22:36PM +0900, Shunsuke Mie wrote: > > 2022年12月27日(火) 16:49 Shunsuke Mie <mie@xxxxxxxxxx>: > > > > > > 2022年12月27日(火) 16:04 Michael S. Tsirkin <mst@xxxxxxxxxx>: > > > > > > > > On Tue, Dec 27, 2022 at 11:25:26AM +0900, Shunsuke Mie wrote: > > > > > Each vringh memory accessors that are for user, kern and iotlb has own > > > > > interfaces that calls common code. But some codes are duplicated and that > > > > > becomes loss extendability. > > > > > > > > > > Introduce a struct vringh_ops and provide a common APIs for all accessors. > > > > > It can bee easily extended vringh code for new memory accessor and > > > > > simplified a caller code. > > > > > > > > > > Signed-off-by: Shunsuke Mie <mie@xxxxxxxxxx> > > > > > --- > > > > > drivers/vhost/vringh.c | 667 +++++++++++------------------------------ > > > > > include/linux/vringh.h | 100 +++--- > > > > > 2 files changed, 225 insertions(+), 542 deletions(-) > > > > > > > > > > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c > > > > > index aa3cd27d2384..ebfd3644a1a3 100644 > > > > > --- a/drivers/vhost/vringh.c > > > > > +++ b/drivers/vhost/vringh.c > > > > > @@ -35,15 +35,12 @@ static __printf(1,2) __cold void vringh_bad(const char *fmt, ...) > > > > > } > > > > > > > > > > /* Returns vring->num if empty, -ve on error. */ > > > > > -static inline int __vringh_get_head(const struct vringh *vrh, > > > > > - int (*getu16)(const struct vringh *vrh, > > > > > - u16 *val, const __virtio16 *p), > > > > > - u16 *last_avail_idx) > > > > > +static inline int __vringh_get_head(const struct vringh *vrh, u16 *last_avail_idx) > > > > > { > > > > > u16 avail_idx, i, head; > > > > > int err; > > > > > > > > > > - err = getu16(vrh, &avail_idx, &vrh->vring.avail->idx); > > > > > + err = vrh->ops.getu16(vrh, &avail_idx, &vrh->vring.avail->idx); > > > > > if (err) { > > > > > vringh_bad("Failed to access avail idx at %p", > > > > > &vrh->vring.avail->idx); > > > > > > > > I like that this patch removes more lines of code than it adds. > > > > > > > > However one of the design points of vringh abstractions is that they were > > > > carefully written to be very low overhead. > > > > This is why we are passing function pointers to inline functions - > > > > compiler can optimize that out. > > > > > > > > I think that introducing ops indirect functions calls here is going to break > > > > these assumptions and hurt performance. > > > > Unless compiler can somehow figure it out and optimize? > > > > I don't see how it's possible with ops pointer in memory > > > > but maybe I'm wrong. > > > I think your concern is correct. I have to understand the compiler > > > optimization and redesign this approach If it is needed. > > > > Was any effort taken to test effect of these patches on performance? > > > I just tested vringh_test and already faced little performance reduction. > > > I have to investigate that, as you said. > > I attempted to test with perf. I found that the performance of patched code > > is almost the same as the upstream one. However, I have to investigate way > > this patch leads to this result, also the profiling should be run on > > more powerful > > machines too. > > > > environment: > > $ grep 'model name' /proc/cpuinfo > > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz > > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz > > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz > > model name : Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz > > > > results: > > * for patched code > > Performance counter stats for 'nice -n -20 ./vringh_test_patched > > --parallel --eventidx --fast-vringh --indirect --virtio-1' (20 runs): > > > > 3,028.05 msec task-clock # 0.995 CPUs > > utilized ( +- 0.12% ) > > 78,150 context-switches # 25.691 K/sec > > ( +- 0.00% ) > > 5 cpu-migrations # 1.644 /sec > > ( +- 3.33% ) > > 190 page-faults # 62.461 /sec > > ( +- 0.41% ) > > 6,919,025,222 cycles # 2.275 GHz > > ( +- 0.13% ) > > 8,990,220,160 instructions # 1.29 insn per > > cycle ( +- 0.04% ) > > 1,788,326,786 branches # 587.899 M/sec > > ( +- 0.05% ) > > 4,557,398 branch-misses # 0.25% of all > > branches ( +- 0.43% ) > > > > 3.04359 +- 0.00378 seconds time elapsed ( +- 0.12% ) > > > > * for upstream code > > Performance counter stats for 'nice -n -20 ./vringh_test_base > > --parallel --eventidx --fast-vringh --indirect --virtio-1' (10 runs): > > > > 3,058.41 msec task-clock # 0.999 CPUs > > utilized ( +- 0.14% ) > > 78,149 context-switches # 25.545 K/sec > > ( +- 0.00% ) > > 5 cpu-migrations # 1.634 /sec > > ( +- 2.67% ) > > 194 page-faults # 63.414 /sec > > ( +- 0.43% ) > > 6,988,713,963 cycles # 2.284 GHz > > ( +- 0.14% ) > > 8,512,533,269 instructions # 1.22 insn per > > cycle ( +- 0.04% ) > > 1,638,375,371 branches # 535.549 M/sec > > ( +- 0.05% ) > > 4,428,866 branch-misses # 0.27% of all > > branches ( +- 22.57% ) > > > > 3.06085 +- 0.00420 seconds time elapsed ( +- 0.14% ) > > > How you compiled it also matters. ATM we don't enable retpolines > and it did not matter since we didn't have indirect calls, > but we should. Didn't yet investigate how to do that for virtio tools. I think the retpolines certainly affect performance. Thank you for pointing it out. I'd like to start the investigation that how to apply the retpolines to the virtio tools. > > > Thank you for your comments. > > > > Thanks! > > > > > > > > > > > Best, > > > Shunsuke. > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization