On 06.08.2018 13:59, David Hildenbrand wrote:
> On 06.08.2018 13:50, Paolo Bonzini wrote:
>> On 06/08/2018 12:17, David Hildenbrand wrote:
>>> Hi,
>>>
>>> We had an internal discussion and some Daniel (cc) wondered if we should
>>> drop the hpage module parameter and instead glue this to the nested
>>> parameter.
>>>
>>> E.g. nested=1 -> hpage cannot be enabled for a VM
>>>      nested=0 -> hpage can be enabled for a VM
>>>
>>> Are we ready to expose this feature as default to all VMs? Opinions?
>>>
>>> This means that nested=0 (default) environments will get hpage support
>>> and hpage support cannot be disabled by an admin.
>>>
>>> Benefit is that necessary setup to use huge pages is limited.
>>> Downside is, that this is somewhat hidden behind another parameter and
>>> cannot be disabled.
>>
>> Regarding nested I agree with Daniel. However, until dirty page logging
>> works at 4kb granularity (by the way---is it the actual KVM dirty page
>> logging, or storage keys, or both?), I think it's best to keep the
>> module parameter.
>
> storage keys are right now not dirty tracked (there is no iterator model
> for it in QEMU yet, but there were plans to support it - we could use
> ordinary dirty tracking for it - pages that are marked dirty either have
> dirty page content or dirty storage keys - but it would require some
> changes).

An iterative approach to skey retrieval, and hence skey dirty tracking,
would only gain us something for really big guests that use keys
excessively, right? That's currently not a scenario we optimize for, as
Linux dropped skey usage a while ago and is the only OS we run as a KVM VM.

> So it KVM dirty page logging that is done on a 1MB basis for now. To
> track pages dirty on 4k granularity, we'll have to create fake page
> tables ("split") just like x86 for huge pages and write-protect all PMD
> entries when dirty tracking is enabled (via memslot). Also, these fake
> page tables will be required to get proper nested virtualization support
> running.
>
> We decided to postpone this complexity and get the basic running and
> upstream first.
>
> I would also vote for the parameter until we are sure that everything is
> working as expected (4k dirty tracking, vsie support, some more testing ...)

For now the parameter will stay until we fix all of that. The previous
mm code was very optimized for 4k and PGSTEs and even there I found
mistakes, so I don't want a user to be easily able to run a hp VM
without opting in first.

I might need to extend the documentation a bit, to list all
peculiarities of the current implementation.
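
To put rough numbers on the 1MB vs. 4k dirty logging granularity
discussed above, here is a small standalone sketch. It is not KVM or
QEMU code: the function names and the sample write addresses are made up
for illustration, and it only models which 4k pages would be reported
dirty for a given set of guest writes.

```python
PAGE_SIZE = 4 * 1024                        # 4k base page
HPAGE_SIZE = 1024 * 1024                    # 1MB huge page (segment)
PAGES_PER_HPAGE = HPAGE_SIZE // PAGE_SIZE   # 256 4k pages per 1MB


def dirty_pages_1m(write_addrs):
    """4k page numbers reported dirty when tracking per 1MB segment.

    A single write marks the whole 1MB segment, i.e. all 256 of its
    4k pages, as dirty.
    """
    dirty = set()
    for addr in write_addrs:
        first = (addr // HPAGE_SIZE) * PAGES_PER_HPAGE
        dirty.update(range(first, first + PAGES_PER_HPAGE))
    return dirty


def dirty_pages_4k(write_addrs):
    """4k page numbers reported dirty when tracking per 4k page."""
    return {addr // PAGE_SIZE for addr in write_addrs}


if __name__ == "__main__":
    # Hypothetical guest writes: two in the first segment, one in the fourth.
    writes = [0x1000, 0x2345, 0x300000]
    print(len(dirty_pages_1m(writes)), len(dirty_pages_4k(writes)))
    # prints "512 3"
```

With these three writes, 1MB tracking reports 512 pages (2MB) as dirty,
while 4k tracking reports 3 pages (12k). That difference is what
the split page tables would buy during migration.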