Re: [PATCH v6 06/12] s390/mm: Add huge page dirty sync support

On 13.07.2018 10:16, David Hildenbrand wrote:
> On 13.07.2018 08:36, Janosch Frank wrote:
>> From: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx>
>>
>> To do dirty logging with huge pages, we protect huge pmds in the
>> gmap. When they are written to, we unprotect them and mark them dirty.
>>
>> We introduce the function gmap_test_and_clear_dirty_segment which
>> handles dirty sync for huge pages.
>>
>> Signed-off-by: Janosch Frank <frankja@xxxxxxxxxxxxx>
>> ---
>>  arch/s390/include/asm/gmap.h |   3 ++
>>  arch/s390/kvm/kvm-s390.c     |  19 ++++---
>>  arch/s390/mm/gmap.c          | 119 ++++++++++++++++++++++++++++++++++++++++++-
>>  arch/s390/mm/pgtable.c       |   6 ---
>>  4 files changed, 132 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/s390/include/asm/gmap.h b/arch/s390/include/asm/gmap.h
>> index 276268b48aff..f923ed27ac6e 100644
>> --- a/arch/s390/include/asm/gmap.h
>> +++ b/arch/s390/include/asm/gmap.h
>> @@ -15,6 +15,7 @@
>>  
>>  /* Status bits only for huge segment entries */
>>  #define _SEGMENT_ENTRY_GMAP_IN		0x8000	/* invalidation notify bit */
>> +#define _SEGMENT_ENTRY_GMAP_UC		0x4000	/* user dirty (migration) */
>>  
>>  /**
>>   * struct gmap_struct - guest address space
>> @@ -139,4 +140,6 @@ void gmap_pte_notify(struct mm_struct *, unsigned long addr, pte_t *,
>>  int gmap_mprotect_notify(struct gmap *, unsigned long start,
>>  			 unsigned long len, int prot);
>>  
>> +void gmap_sync_dirty_log_pmd(struct gmap *gmap, unsigned long dirty_bitmap[4],
>> +			     unsigned long gaddr, unsigned long vmaddr);
>>  #endif /* _ASM_S390_GMAP_H */
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 3b7a5151b6a5..6acc46cc7f7f 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -511,19 +511,24 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  }
>>  
>>  static void kvm_s390_sync_dirty_log(struct kvm *kvm,
>> -					struct kvm_memory_slot *memslot)
>> +				    struct kvm_memory_slot *memslot)
>>  {
>>  	gfn_t cur_gfn, last_gfn;
>> -	unsigned long address;
>> +	unsigned long gaddr, vmaddr;
>> +	unsigned long *dirty = memslot->dirty_bitmap;
>>  	struct gmap *gmap = kvm->arch.gmap;
>>  
>> -	/* Loop over all guest pages */
>> +	/* Loop over all guest segments */
>> +	cur_gfn = memslot->base_gfn;
>>  	last_gfn = memslot->base_gfn + memslot->npages;
>> -	for (cur_gfn = memslot->base_gfn; cur_gfn <= last_gfn; cur_gfn++) {
>> -		address = gfn_to_hva_memslot(memslot, cur_gfn);
>> +	for (; cur_gfn <= last_gfn; cur_gfn += _PAGE_ENTRIES, dirty += 4) {
>> +		gaddr = gfn_to_gpa(cur_gfn);
>> +		vmaddr = gfn_to_hva_memslot(memslot, cur_gfn);
>> +		if (kvm_is_error_hva(vmaddr))
>> +			continue;
>> +
>> +		gmap_sync_dirty_log_pmd(gmap, dirty, gaddr, vmaddr);
> 
> I am not a fan of this interface. If somebody ever decides to change
> anything in mark_page_dirty(), this will most probably be missed. You're
> basically copying the logic we have in that function because you're not
> allowed to pass "struct kvm" itself.
> 
> Isn't there a way to call mark_page_dirty() from here with the new
> interface? E.g. let gmap_sync_dirty_log_pmd() indicate whether a single
> page or the whole segment is dirty. You can then advance either
> _PAGE_ENTRIES or only a single page.
> 
> OR (which would be performance wise better):
> 
> Create your own temporary zeroed bitmap (_PAGE_ENTRIES bits, i.e. 256 bits
> == 4 unsigned longs) and pass that instead. Then go over all bits
> and call mark_page_dirty(). At least that looks cleaner to me (no need
> to fiddle with bitmaps we don't control - e.g. not having to use
> set_bit_le() or having to check if the bitmap pointer is actually valid).

I'll need to think about that one for some time.
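
For reference, that bitmap variant could look roughly like this (untested
sketch, not part of the patch; it assumes gmap_sync_dirty_log_pmd() only
sets bits in the 256-bit bitmap it is handed):

static void kvm_s390_sync_dirty_log(struct kvm *kvm,
				    struct kvm_memory_slot *memslot)
{
	int i;
	gfn_t cur_gfn, last_gfn;
	unsigned long gaddr, vmaddr;
	struct gmap *gmap = kvm->arch.gmap;
	DECLARE_BITMAP(bitmap, _PAGE_ENTRIES);

	/* Loop over all guest segments */
	cur_gfn = memslot->base_gfn;
	last_gfn = memslot->base_gfn + memslot->npages;
	for (; cur_gfn <= last_gfn; cur_gfn += _PAGE_ENTRIES) {
		gaddr = gfn_to_gpa(cur_gfn);
		vmaddr = gfn_to_hva_memslot(memslot, cur_gfn);
		if (kvm_is_error_hva(vmaddr))
			continue;

		bitmap_zero(bitmap, _PAGE_ENTRIES);
		gmap_sync_dirty_log_pmd(gmap, bitmap, gaddr, vmaddr);

		/* mark_page_dirty() stays in control of the real bitmap */
		for (i = 0; i < _PAGE_ENTRIES; i++) {
			if (test_bit(i, bitmap))
				mark_page_dirty(kvm, cur_gfn + i);
		}

		if (fatal_signal_pending(current))
			return;
		cond_resched();
	}
}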

> 
>>  
>> -		if (test_and_clear_guest_dirty(gmap->mm, address))
>> -			mark_page_dirty(kvm, cur_gfn);
>>  		if (fatal_signal_pending(current))
>>  			return;
>>  		cond_resched();
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index a6d499d2b24b..90e2d2f0e298 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -15,6 +15,7 @@
>>  #include <linux/swapops.h>
>>  #include <linux/ksm.h>
>>  #include <linux/mman.h>
>> +#include <linux/hugetlb.h>
>>  
>>  #include <asm/pgtable.h>
>>  #include <asm/pgalloc.h>
>> @@ -521,6 +522,9 @@ void gmap_unlink(struct mm_struct *mm, unsigned long *table,
>>  	rcu_read_unlock();
>>  }
>>  
>> +static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *old, pmd_t new,
>> +			   unsigned long gaddr);
>> +
>>  /**
>>   * gmap_link - set up shadow page tables to connect a host to a guest address
>>   * @gmap: pointer to guest mapping meta data structure
>> @@ -541,6 +545,7 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>>  	p4d_t *p4d;
>>  	pud_t *pud;
>>  	pmd_t *pmd;
>> +	pmd_t unprot;
>>  	int rc;
>>  
>>  	BUG_ON(gmap_is_shadow(gmap));
>> @@ -598,12 +603,19 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>>  				       vmaddr >> PMD_SHIFT, table);
>>  		if (!rc) {
>>  			if (pmd_large(*pmd)) {
>> -				*table = pmd_val(*pmd) &
>> -					_SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
>> +				*table = (pmd_val(*pmd) &
>> +					  _SEGMENT_ENTRY_HARDWARE_BITS_LARGE)
>> +					| _SEGMENT_ENTRY_GMAP_UC;
>>  			} else
>>  				*table = pmd_val(*pmd) &
>>  					_SEGMENT_ENTRY_HARDWARE_BITS;
>>  		}
>> +	} else if (*table & _SEGMENT_ENTRY_PROTECT &&
>> +		   !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
>> +		unprot = __pmd((*table & (_SEGMENT_ENTRY_HARDWARE_BITS_LARGE
>> +					  & ~_SEGMENT_ENTRY_PROTECT))
>> +			       | _SEGMENT_ENTRY_GMAP_UC);
> 
> Maybe split this into several lines to make this more readable.
> 
> uint64_t unprot;
> 
> unprot = *table & _SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
> unprot &= ~_SEGMENT_ENTRY_PROTECT;
> unprot |= _SEGMENT_ENTRY_GMAP_UC;
> 
> ... __pmd(unprot) ...

Sure
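
Something along these lines then, I suppose (untested, with unprot as a
plain unsigned long instead of a pmd_t):

	} else if (*table & _SEGMENT_ENTRY_PROTECT &&
		   !(pmd_val(*pmd) & _SEGMENT_ENTRY_PROTECT)) {
		unsigned long unprot;

		/* Keep the hardware bits, drop protection, flag as UC */
		unprot = *table & _SEGMENT_ENTRY_HARDWARE_BITS_LARGE;
		unprot &= ~_SEGMENT_ENTRY_PROTECT;
		unprot |= _SEGMENT_ENTRY_GMAP_UC;
		gmap_pmdp_xchg(gmap, (pmd_t *)table, __pmd(unprot), gaddr);
	}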

> 
> Do we have to check here some place if the original pmd is actually
> allowed to write? (e.g. what about read-only mappings ?) Or is that
> handled by the fault handler already?

Huh, doesn't the if do that already? We only take the unprotect path when
the host's pmd is not itself protected, so a read-only mapping never gets
its protection removed here.

> 
>> +		gmap_pmdp_xchg(gmap, (pmd_t *)table, unprot, gaddr);
> 
> Also, I wonder if gmap_pmdp_xchg should take care of cleaning notifier
> bits from the new pmd value itself? It calls all notifiers and new
> notifiers can only be registered by protecting again. So masking of
> _SEGMENT_ENTRY_HARDWARE_BITS_LARGE is not really needed if done in
> gmap_pmdp_xchg.

We also have the UC bit, which currently also breaks the WARN_ON in the
idte functions...
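
Just to illustrate, pulling the masking into the helper would look
something like this (hypothetical sketch on top of this series'
gmap_pmdp_xchg(); the UC bit problem above is still open):

static void gmap_pmdp_xchg(struct gmap *gmap, pmd_t *pmdp, pmd_t new,
			   unsigned long gaddr)
{
	gaddr &= HPAGE_MASK;
	pmdp_notify_gmap(gmap, pmdp, gaddr);
	/*
	 * The notifiers have fired and can only be re-armed by
	 * protecting again, so the notify bit must not survive into
	 * the new entry.
	 */
	pmd_val(new) &= ~_SEGMENT_ENTRY_GMAP_IN;
	if (MACHINE_HAS_TLB_GUEST)
		__pmdp_idte(gaddr, pmdp, IDTE_GUEST_ASCE, gmap->asce,
			    IDTE_GLOBAL);
	else if (MACHINE_HAS_IDTE)
		__pmdp_idte(gaddr, pmdp, 0, 0, IDTE_GLOBAL);
	else
		__pmdp_csp(pmdp);
	*pmdp = new;
}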
