On 8/25/21 4:16 AM, Vlastimil Babka wrote:
> On 8/24/21 18:42, Joerg Roedel wrote:
>> On Mon, Aug 23, 2021 at 07:50:22AM -0700, Dave Hansen wrote:
>>> It *has* to be done in KVM, IMNHO.
>>>
>>> The core kernel really doesn't know much about SEV. It *really* doesn't
>>> know when its memory is being exposed to a virtualization architecture
>>> that doesn't know how to split TLBs like every single one before it.
>>>
>>> This essentially *must* be done at the time that the KVM code realizes
>>> that it's being asked to shove a non-splittable page mapping into the
>>> SEV hardware structures.
>>>
>>> The only other alternative is raising a signal from the fault handler
>>> when the page can't be split. That's a *LOT* nastier because it's so
>>> much later in the process.
>>>
>>> It's either that, or figure out a way to split hugetlbfs (and DAX)
>>> mappings in a failsafe way.
>>
>> Yes, I agree with that. KVM needs a check to disallow HugeTLB pages in
>> SEV-SNP guests, at least as a temporary workaround. When HugeTLBfs
>> mappings can be split into smaller pages the check can be removed.
>
> FTR, this is Sean's reply with concerns in v4:
> https://lore.kernel.org/linux-coco/YPCuTiNET%2FhJHqOY@google.com/
>
> I think there are two main arguments there:
> - it's not KVM business to decide
> - guest may do all page state changes with 2mb granularity so it might be fine
> with hugetlb
>
> The latter might become true, but I think it's more probable that sooner
> hugetlbfs will learn to split the mappings to base pages - I know people plan to
> work on that. At that point qemu will have to recognize if the host kernel is
> the new one that can do this splitting vs older one that can't. Preferably
> without relying on kernel version number, as backports exist. Thus, trying to
> register a hugetlbfs range that either is rejected (kernel can't split) or
> passes (kernel can split) seems like a straightforward way. So I'm also in favor
> of adding that, hopefully temporary, check.

If that's the direction taken, I think we'd be able to use a KVM_CAP_ value
that can be queried by the VMM to make the determination.

Thanks,
Tom

>
> Vlastimil
>
>> Regards,
>>
>> Joerg
>>
>
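As a rough sketch of the VMM side of the KVM_CAP_ approach: the VMM opens /dev/kvm and issues KVM_CHECK_EXTENSION, which a kernel returns 0 for when it does not recognize the capability. The capability constant below is hypothetical (the thread does not name or number one), as is the assumed policy that a positive value means hugetlbfs-backed SNP memory is safe; hugetlb_backing_allowed is an illustrative helper, not QEMU code. Some capabilities are only meaningful when queried on a VM fd rather than on /dev/kvm; this sketch uses the system fd for simplicity.

```python
import fcntl
import os

# KVM_CHECK_EXTENSION is _IO(KVMIO, 0x03) with KVMIO == 0xAE, i.e. 0xAE03.
KVM_CHECK_EXTENSION = (0xAE << 8) | 0x03

# HYPOTHETICAL: the thread only says "a KVM_CAP_ value"; no name or number
# exists. Substitute the real constant if one is ever merged.
HYPOTHETICAL_CAP_SNP_HUGETLB_SPLIT = 0x7FFF


def kvm_check_extension(cap, dev="/dev/kvm"):
    """Return the KVM_CHECK_EXTENSION result for `cap`, or None if KVM is unusable."""
    try:
        fd = os.open(dev, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # no KVM device on this host, or no permission to open it
    try:
        # With an integer argument, fcntl.ioctl returns the ioctl's return value.
        return fcntl.ioctl(fd, KVM_CHECK_EXTENSION, cap)
    except OSError:
        return None  # ioctl rejected; treat as "capability not available"
    finally:
        os.close(fd)


def hugetlb_backing_allowed(check=kvm_check_extension):
    """Decide whether SNP guest memory may be backed by hugetlbfs.

    Assumption: a positive value for the hypothetical capability means the
    host can split hugetlbfs mappings. 0 (capability unknown to an older
    kernel) or None (no KVM) means fall back to base pages.
    """
    return bool(check(HYPOTHETICAL_CAP_SNP_HUGETLB_SPLIT))
```

Because an unrecognized capability reads as 0, older kernels and kernels without splitting support both steer the VMM toward base pages, so the decision does not depend on parsing kernel version numbers (which backports would make unreliable).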