Re: [PATCH v7 0/5] x86: Fix SEV guest regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 09/13/2018 11:22 AM, Thomas Gleixner wrote:
On Mon, 10 Sep 2018, Brijesh Singh wrote:
x86/kvmclock: Remove memblock dependency
introduced SEV guest regression.

The guest physical address holding the wall_clock and hv_clock_boot
are shared with the hypervisor must be mapped with C=0 when SEV
is active. To clear the C-bit we use  kernel_physical_mapping_init() to
split the large pages. The above commit moved the kvmclock initialization
very early and kernel_physical_mapping_init() fails to allocate memory
while spliting the large page.

To solve it, we add a special .data..decrypted section, this section can be
used to hold the shared variables. Early boot code maps this section with
C=0. The section is pmd aligned and sized to avoid the need to split the pages.
Caller can use __decrypted attribute to add the variables in .data..decrypted
section.

I went through this patch series and to be honest this is in no way -rc3+
material.

I really have to ask why this whole change to sme_encrypt_kernel() is
required at all.


The main reason behind making the changes in sme_encrypt_kernel() was
to perverse the initialized values (if any). Since we are adding
 __decrypted attribute hence I thought it would make sense to support
the usecase where in future someone may use this attribute on initialized variables.

Now the question is, do we really need this to fix the regression? the
answer is NO. Since there is only one user of this new attribute and
lucky it does not initialize the variables hence it is safe to move the
variable in .bss..decrypted section and memset(0).


Lets look at the problem at hand:

   We need exactly TWO pages to be shareable at early boot when the
   kernel runs in a guest and wants to utilize kvmclock.

   No, we do not need the whole NR_CPUS sized thing at that point at
   all. The secondary CPUs are brought up way later and there is absolutely
   no point in having the ugly decrypted..aux hack. That conditional freeing
   is really horrible and prone to break sooner than later.

   Further the TWO pages are BSS pages which does not require to handle them
   special other than fixing up the PMD and doing a memset(0) on the
   actually utilized TWO pages.

We really can enforce that such decrypted variables have to be in the BSS
for now and revisit this when there is an absolute requirement to have
statically initialized ones, which I doubt we have.

So the way simpler solution is:

1) Add a .bss..decrypted section which is PMD aligned and mark the
    kvmclock hv_clock_boot and wallclock struct with __bss_decrypted.
 > 2) Fixup the .bss..decrypted section PMD in the SEV case at the end of
    sme_encrypt_kernel() and do a memset(0) on that. That does not require
    any of the restructuring, really.


Somewhere during the discussion, I was asked to make sure that
 __decrypted attribute can be used by others in future and don't tie it
with just the kvmclock.

To fix the regression we don't need to have this complexity. I am okay
to implement your proposal. I would like to fix this regression sooner
than later ;)


3) Have a function which is called after the page allocator is up which
    does:


I am glad you are pointing this one. In my initial patch I was doing a
kmalloc() of page-size, during review we did discussed to do allocation
once using num_possible_cpus()[like what you have proposed]. But then
discussion moved towards saving memory and that when the secondary array
concept came into the picture. Since .data..decrypted has some unused memory hence we were getting creative in reusing it.



static struct pvclock_vsyscall_time_info *hvclock_mem;

void __init kvmclock_init_mem(void)
{
	unsigned int ncpus = num_possible_cpus() - HVC_BOOT_ARRAY_SIZE;
	unsigned int order = get_order(ncpus * sizeof(*hvclock_mem));
	struct page *p;

	p = alloc_pages(order, GFP_KERNEL);
	if (p) {
		hvclock_mem = page_address(p);
		set_memory_decrypted((unsigned long) hvclock_mem,
				     PAGE_SIZE << order);
	}
}

4) In kvmclock_setup_percpu() do the following:

         if (cpu < HVC_BOOT_ARRAY_SIZE)
                 p = &hv_clock_boot[cpu];
	else if (hvclock_mem)
		p = hvclock_mem + cpu - HVC_BOOT_ARRAY_SIZE;
	else
		return -ENOMEM;

5) Unconditionally free the pages after the used .bbs..decrypted variables,
    so the unused rest of the 2M page is not lost.

That should be a halfways slim sized and non instrusive changeset.


I am okay to implement and test your recommendation, if anyone
disagree then please let me know. thanks




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux