> -----Original Message-----
> From: Oliver Upton [mailto:oliver.upton@xxxxxxxxx]
> Sent: 15 September 2023 01:36
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> Cc: kvmarm@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; maz@xxxxxxxxxx; will@xxxxxxxxxx;
> catalin.marinas@xxxxxxx; james.morse@xxxxxxx;
> suzuki.poulose@xxxxxxx; yuzenghui <yuzenghui@xxxxxxxxxx>; zhukeqian
> <zhukeqian1@xxxxxxxxxx>; Jonathan Cameron
> <jonathan.cameron@xxxxxxxxxx>; Linuxarm <linuxarm@xxxxxxxxxx>
> Subject: Re: [RFC PATCH v2 0/8] KVM: arm64: Implement SW/HW combined
> dirty log
>
> On Thu, Sep 14, 2023 at 09:47:48AM +0000, Shameerali Kolothum Thodi wrote:
>
> [...]
>
> > > What you're proposing here is complicated and I fear not easily
> > > maintainable. Keeping the *two* sources of dirty state seems likely to
> > > fail (eventually) with some very unfortunate consequences.
> >
> > It does add complexity to the dirty state management code. I have tried
> > to separate the code paths using appropriate FLAGS etc. to make it more
> > manageable. But this is probably one area we can work on if the overall
> > approach does have some benefits.
>
> I'd be a bit more amenable to a solution that would select either
> write-protection or dirty state management, but not both.
>
> > > The vCPU:memory ratio you're testing doesn't seem representative of
> > > what a typical cloud provider would be configuring, and the dirty log
> > > collection is going to scale linearly with the size of guest memory.
> >
> > I was limited by the test setup I had. I will give it a go on a system
> > with more memory.
>
> Thanks. Dirty log collection needn't be single threaded, but the
> fundamental concern of dirty log collection time scaling linearly w.r.t.
> the size of memory remains. Write-protection helps spread the cost of
> collecting dirty state out across all the vCPU threads.
>
> There could be some value in giving userspace the ability to parallelize
> calls to the dirty log ioctls to work on non-intersecting intervals.
>
> > > Slow dirty log collection is going to matter a lot for VM blackout,
> > > which from experience tends to be the most sensitive period of live
> > > migration for guest workloads.
> > >
> > > At least in our testing, the split GET/CLEAR dirty log ioctls
> > > dramatically improved the performance of a write-protection based
> > > dirty tracking scheme, as the false positive rate for dirtied pages is
> > > significantly reduced. FWIW, this is what we use for doing LM on arm64
> > > as opposed to the D-bit implementation that we use on x86.
> >
> > I guess by D-bit on x86 you mean the PML feature. Unfortunately that is
> > something we lack on ARM yet.
>
> Sorry, this was rather nonspecific. I was describing the pre-copy
> strategies we're using at Google (out of tree). We're carrying patches
> to use EPT D-bit for exitless dirty tracking.

Just curious: in exitless mode, how does it handle the overhead of
scanning for dirty pages, and how does convergence hold up against a high
rate of dirtying?

> > > Faster pre-copy performance would help the benchmark complete faster,
> > > but the goal for a live migration should be to minimize the lost
> > > computation for the entire operation. You'd need to test with a
> > > continuous workload rather than one with a finite amount of work.
> >
> > Ok.
> > Though the above is not representative of a real workload, I thought
> > it gives some idea of how the "Guest up time improvement" benefits the
> > overall availability of the workload during migration. I will check with
> > our wider team to see if I can set up a more suitable test/workload to
> > show some improvement with this approach.
> >
> > Please let me know if there is a specific workload you have in mind.
>
> No objection to the workload you've chosen; I'm more concerned about the
> benchmark finishing before live migration completes.
>
> What I'm looking for is something like this:
>
> - Calculate the ops/sec your benchmark completes in steady state
>
> - Do a live migration and sample the rate throughout the benchmark,
>   accounting for VM blackout time
>
> - Calculate the area under the curve of:
>
>     y = steady_state_rate - live_migration_rate(t)
>
> - Compare the area under the curve for write-protection and your DBM
>   approach.

Ok. Got it. A rough sketch of how I plan to compute that metric is
appended below.

> > Thanks for getting back on this. I'd appreciate it if you could take a
> > quick glance through the rest of the patches as well for any gross
> > errors, especially with respect to page table walk locking, usage of the
> > DBM FLAGS, etc.
>
> I'll give it a read when I have some spare cycles. To be entirely clear,
> I don't have any fundamental objections to using DBM for dirty tracking.
> I just want to make sure that all alternatives have been considered
> in the current scheme before we seriously consider a new approach with
> its own set of tradeoffs.

Thanks for taking a look.

Shameer
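
P.S. A minimal, illustrative sketch of that lost-work calculation, assuming
the benchmark can emit (timestamp, ops/sec) samples across the whole
migration window; the numbers in the example are made up, not measurements:

# Area under y = steady_state_rate - live_migration_rate(t), i.e. the total
# ops "lost" relative to steady state, integrated with the trapezoidal rule.
# VM blackout shows up as samples at (or near) 0 ops/sec.
def lost_work(steady_state_rate, samples):
    """samples: list of (t_seconds, ops_per_sec) tuples, sorted by time."""
    total = 0.0
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        # Clamp at 0 so intervals running above steady state don't cancel losses.
        d0 = max(steady_state_rate - r0, 0.0)
        d1 = max(steady_state_rate - r1, 0.0)
        total += 0.5 * (d0 + d1) * (t1 - t0)
    return total

if __name__ == "__main__":
    steady = 1000.0  # ops/sec in steady state, before migration starts
    # (t, ops/sec) sampled across pre-copy, blackout and resume (made-up data)
    wp  = [(0, 1000), (10, 900), (20, 850), (30, 0), (31, 0), (32, 950), (40, 1000)]
    dbm = [(0, 1000), (10, 980), (20, 960), (30, 0), (33, 0), (34, 900), (40, 1000)]
    print("write-protect lost ops:", lost_work(steady, wp))
    print("DBM           lost ops:", lost_work(steady, dbm))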