Hi,

> -----Original Message-----
> From: linux-arm-kernel
> [mailto:linux-arm-kernel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> Shameerali Kolothum Thodi
> Sent: 18 September 2023 10:55
> To: Oliver Upton <oliver.upton@xxxxxxxxx>
> Cc: kvmarm@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; maz@xxxxxxxxxx; will@xxxxxxxxxx;
> catalin.marinas@xxxxxxx; james.morse@xxxxxxx;
> suzuki.poulose@xxxxxxx; yuzenghui <yuzenghui@xxxxxxxxxx>; zhukeqian
> <zhukeqian1@xxxxxxxxxx>; Jonathan Cameron
> <jonathan.cameron@xxxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>
> Subject: RE: [RFC PATCH v2 0/8] KVM: arm64: Implement SW/HW combined
> dirty log

[...]

> > > Please let me know if there is a specific workload you have in mind.
> >
> > No objection to the workload you've chosen, I'm more concerned about the
> > benchmark finishing before live migration completes.
> >
> > What I'm looking for is something like this:
> >
> >  - Calculate the ops/sec your benchmark completes in steady state
> >
> >  - Do a live migration and sample the rate throughout the benchmark,
> >    accounting for VM blackout time
> >
> >  - Calculate the area under the curve of:
> >
> >      y = steady_state_rate - live_migration_rate(t)
> >
> >  - Compare the area under the curve for write-protection and your DBM
> >    approach.
>
> Ok. Got it.

I attempted to benchmark the performance of this series as suggested
above. I used memcached/memaslap instead of redis-benchmark, since it
dirties memory at a faster rate than redis-benchmark in my setup:

./memaslap -s 127.0.0.1:11211 -S 1s -F ./memslap.cnf -T 96 -c 96 -t 20m

(A rough, untested sketch of how the suggested area-under-the-curve
comparison could be scripted from the sampled rates is appended at the
end of this mail.)

The Google Sheets link below has the charts comparing the average
throughput rates during the migration time window for the 6.5-org and
6.5-kvm-dbm branches:

https://docs.google.com/spreadsheets/d/1T2F94Lsjpx080hW8OSxwbTJXihbXDNlTE1HjWCC0J_4/edit?usp=sharing

Sheet #1: autoconverge=on with default settings (initial-throttle 20 &
increment 10).

As the charts show, the kvm-dbm branch throughput during the migration
window of the original branch is considerably higher. However, the time
to converge and complete the migration increases at almost the same rate
for KVM-DBM, which in effect results in a lower overall average
throughput when compared over the same time window as the original
branch.

Sheet #2: autoconverge=on with throttle-increment set to 15 for the
kvm-dbm run.

If we increase the migration throttling rate for the kvm-dbm branch, it
looks to me like we can still get better throughput during the migration
window and also a higher overall throughput with the KVM-DBM solution.

Sheet #3: dirty_log_perf_test times vs. memory per vCPU.

This is also in line with the above results: KVM-DBM shows a better,
roughly constant dirty-memory time compared to the linear increase seen
with the original branch, but it is just the opposite for the "Get dirty
log" time.

From the above, it looks to me that there is value in using HW DBM for
write-intensive workloads, provided we adjust the CPU throttling in user
space.

Please take a look and let me know your feedback/thoughts.

Thanks,
Shameer
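
For reference, here is the rough, untested sketch mentioned above for the
area-under-the-curve comparison Oliver described. It assumes each run's
sampled throughput is exported as CSV lines of "seconds,ops_per_sec"; the
file names, CSV format and the steady-state value passed on the command
line are placeholders for illustration only, not part of the series or of
the spreadsheet data.

#!/usr/bin/env python3
# Untested sketch: compare throughput degradation during live migration
# for two runs (e.g. write-protection vs. DBM), per the suggestion above.
# Assumed input format: one CSV file per run, lines of "seconds,ops_per_sec".
import csv
import sys

def load_samples(path):
    """Return (time_sec, ops_per_sec) samples sorted by time."""
    with open(path, newline="") as f:
        rows = [r for r in csv.reader(f) if r]
    return sorted((float(t), float(rate)) for t, rate in rows)

def degradation_auc(samples, steady_state_rate):
    """Trapezoidal integral of (steady_state_rate - rate(t)) over the
    migration window, i.e. the total ops 'lost' to migration overhead.
    VM blackout time shows up as samples with rate == 0."""
    auc = 0.0
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        d0 = steady_state_rate - r0
        d1 = steady_state_rate - r1
        auc += 0.5 * (d0 + d1) * (t1 - t0)
    return auc

if __name__ == "__main__":
    # Usage (placeholder file names):
    #   ./auc.py <steady_ops_per_sec> wp.csv dbm.csv
    steady = float(sys.argv[1])
    for path in sys.argv[2:]:
        lost = degradation_auc(load_samples(path), steady)
        print(f"{path}: ~{lost:.0f} ops lost during migration")

The smaller the reported number, the less the workload was disturbed by
the migration, so the write-protection and DBM runs can be compared
directly on that single figure.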