On 05.02.20 11:25, Michael S. Tsirkin wrote: > On Wed, Feb 05, 2020 at 10:58:14AM +0100, David Hildenbrand wrote: >> On 05.02.20 10:49, Wang, Wei W wrote: >>> On Wednesday, February 5, 2020 5:37 PM, David Hildenbrand wrote: >>>>> >>>>> Not sure how TCG tracks the dirty bits. But In whatever >>>>> implementation, the hypervisor should have >>>> >>>> There is only a single bitmap for that purpose. (well, the one where KVM >>>> syncs to) >>>> >>>>> already dealt with the race between he current round and the previous >>>> round dirty recording. >>>>> (the race isn't brought by this feature essentially) >>>> >>>> It is guaranteed to work reliably without this feature as you only clear what >>>> *has been migrated*, >>> >>> Not "clear what has been migrated" (that skips nothing..) >>> Anyway, it's a hint used for optimization. >> >> Yes, an optimization that might easily lead to data corruption when the >> two bitmaps are either not in place or don't play along in that specific >> way (and I suspect this is the case under TCG). > > So I checked and TCG has two copies too. > Each block has bmap used for migration and also dirty_memory > where pages are marked dirty. See cpu_physical_memory_sync_dirty_bitmap. qemu_guest_free_page_hint() works on block->bmap. cpu_physical_memory_set_dirty_range() works on ram_list.dirty_memory[i]. So you are right - sorry for the false alarm and thanks for verifying :) [...] > > Again a flag that tells guest it should wait until used > could be a reasonable expension. If we stick to the shrinker > it's actually implementable easily. With an OOM notifier - I'm not so > sure ... See my other mail. I think we should keep handling just as is and not overcomplicate things (especially in our implementation as you noted) Instead, maybe abstract the reporting feature. > > And a big part of the problem is that after all this time the page > hinting interfaces are still undocumented. Quite sad really :( Yes, that was the source of my confusion ... the double-bitmap thingy is non-obvious. And anybody who wants to implement that interface in a hypervisor has to be aware that the race I explained has to be avoided using e.g., two bitmaps and the sync. -- Thanks, David / dhildenb