Re: [PATCH v4 5/9] s390/mm: Add huge page dirty sync support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 27.06.2018 15:55, Janosch Frank wrote:
> From: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx>
> 
> To do dirty loging with huge pages, we protect huge pmds in the
> gmap. When they are written to, we unprotect them and mark them dirty.
> 
> We introduce the function gmap_test_and_clear_dirty_segment which
> handles dirty sync for huge pages. Split pmds need their own handling
> functions in pgtable.c as they need the pgste lock functions that are
> not available in gmap.c.

This is complicated stuff, it makes my head hurt :)

So, we have to guarantee (unfortunately too) many things

1. If userspace writes to a page, we have to mark it dirty in the gmap
(otherwise it will break migration - this is e.g. what pgste_set_pte()
does as of now). So if a userspace pmd entry is unproteced, the gmap pmd
entry should be marked dirty. (and should be unprotected!)

2. If a page in the userspace page table is changed (e.g. protected or
unprotect(basically 1), also via fixup_user_pages()), we have to fixup
the gmap tables accordingly.

2.1 4k pages: This is implicitly handled by having shared page tables.

2.1 Huge pages: If a pmd entry is changed, we have to change the pmd
entry in the gmap.

2.2  Huge pages: If a pmd entry is changed, we have to change the split
pmd entries in the gmap (when protecting, it sufficient to change the
pmd entry in the gmap. We only have to "unprotect" previously protected
split 4k pages, but that would be easy).

Some challenging questions:

a) Why can't we perform protection for dirty tracking using the regular
user space tables. The code modifying user space page tables should make
sure that the proper gmap entries are invalidated/changed. This is
basically the same as we have right now, see
test_and_clear_guest_dirty(). Right now the easy part is that we share
the same page tables for 4k pages, so we get that "protecting 4k page
tables in both page tables" for free in one shot. But in general we
could always go via the user space table.

This would be a clean "modify userspace tables -> implicitly modify gmap
tables" approach.

You are suggesting it the other way around: Modify (e.g. protect) the
GMAP and than transfer protection to the user space page tables. If I am
not wrong this might even be bad if we would have multiple GMAPs per VM
(which is possible as far as I can see in the code!also, hello vSIE but
it might not be applicable), other GMAPs would as of now not get
properly restricted access rights. This might work for now but
highlights the problem that I see. It's not a clean "control flow
direction".

b) I am even starting to wonder if dirty tracking should be done on the
user space page tables instead of the GMAP. This makes a lot of sense,
especially when thinking about multiple GMAPs for a process. And
especially the existing interface is built like this:

test_and_clear_guest_dirty(struct mm_struct *mm, unsigned long addr);

-> mm struct, not a gmap
-> hva adddress, not a gmap address

You're modifying it to include a specific gmap.

I assume finding a spare bit for this is the tricky part.


c) Why do we have to handle split PMDs in a special way? Can't we just
perform dirty tracking + (dirty tracking) protection on PMD entry level?
So "treat them like a PMD unless subpage protection is required for e.g.
HW or vSIE". The only special case is might be allowing to clear
protection from split page in case the PMD entry is not protected. Pages
have to be migrated using huge pages anyway, so dirty protection could
happen on PMD basis. But this part is really tricky and not well thought
through from my side :) There are a lot of details to consider.

In an ideal world, the split PMD page tables would not even have PGSTE -
like gmap shadow page tables. They are not used by HW either way. We
would only need a bit for IN and VSIE handling. But I guess that is the
problem.


-- 

Thanks,

David / dhildenb



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux