On Tue, 06 Dec 2022 06:22:45 +0000, Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote: > > > > On 22/11/22 11:16 pm, Marc Zyngier wrote: > > On Fri, 18 Nov 2022 09:47:50 +0000, > > Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote: > >> > >> > >> > >> On 18/11/22 12:56 am, Marc Zyngier wrote: > >>> On Sun, 13 Nov 2022 17:05:06 +0000, > >>> Shivam Kumar <shivam.kumar1@xxxxxxxxxxx> wrote: > >>>> > >>>> + count: the current count of pages dirtied by the VCPU, can be > >>>> + skewed based on the size of the pages accessed by each vCPU. > >>> > >>> How can userspace make a decision on the amount of dirtying this > >>> represent if this doesn't represent a number of base pages? Or are you > >>> saying that this only counts the number of permission faults that have > >>> dirtied pages? > >> > >> Yes, this only counts the number of permission faults that have > >> dirtied pages. > > > > So how can userspace consistently set a quota of dirtied memory? This > > has to account for the size that has been faulted, because that's all > > userspace can reason about. Remember that at least on arm64, we're > > dealing with 3 different base page sizes, and many more large page > > sizes. > > I understand that this helps only when the majority of dirtying is > happening for the same page size. In our use case (VM live migration), > even large page is broken into 4k pages at first dirty. If required in > future, we can add individual counters for different page sizes. > > Thanks for pointing this out. Adding counters for different page sizes won't help. It will only make the API more complex and harder to reason about. arm64 has 3 base page sizes, up to 5 levels of translation, the ability to have block mappings at most levels, plus nice features such as contiguous hints that we treat as another block size. If you're lucky, that's about a dozen different sizes. Good luck with that. You really need to move away from your particular, 4kB centric, live migration use case. This is about throttling a vcpu based on how much memory it dirties. Not about the number of page faults it takes. You need to define the granularity of the counter, and account for each fault according to its mapping size. If an architecture has 16kB as the base page size, a 32MB fault (the size of the smallest block mapping) must bump the counter by 2048. That's the only way userspace can figure out what is going on. Without that, you may as well add a random number to the counter, it won't be any worse. [...] > >>> If you introduce additional #ifdefery here, why are the additional > >>> fields in the vcpu structure unconditional? > >> > >> pages_dirtied can be a useful information even if dirty quota > >> throttling is not used. So, I kept it unconditional based on > >> feedback. > > > > Useful for whom? This creates an ABI for all architectures, and this > > needs buy-in from everyone. Personally, I think it is a pretty useless > > stat. > > When we started this patch series, it was a member of the kvm_run > struct. I made this a stat based on the feedback I received from the > reviews. If you think otherwise, I can move it back to where it was. I'm certainly totally opposed to stats that don't have a clear use case. People keep piling random stats that satisfy their pet usage, and this only bloats the various structures for no overall benefit other than "hey, it might be useful". This is death by a thousand cut. > > And while we're talking about pages_dirtied, I really dislike the > > WARN_ON in mark_page_dirty_in_slot(). A counter has rolled over? > > Shock, horror... > > Ack. I'll give it a thought but if you have any specific suggestion on > how I can make it better, kindly let me know. Thanks. What is the effect of counter overflowing? Why is it important to warn? What goes wrong? What could be changed to *avoid* this being an issue? M. -- Without deviation from the norm, progress is not possible.