On 27.06.2018 15:55, Janosch Frank wrote: > From: Janosch Frank <frankja@xxxxxxxxxxxxxxxxxx> > > To do dirty loging with huge pages, we protect huge pmds in the > gmap. When they are written to, we unprotect them and mark them dirty. > > We introduce the function gmap_test_and_clear_dirty_segment which > handles dirty sync for huge pages. Split pmds need their own handling > functions in pgtable.c as they need the pgste lock functions that are > not available in gmap.c. This is complicated stuff, it makes my head hurt :) So, we have to guarantee (unfortunately too) many things 1. If userspace writes to a page, we have to mark it dirty in the gmap (otherwise it will break migration - this is e.g. what pgste_set_pte() does as of now). So if a userspace pmd entry is unproteced, the gmap pmd entry should be marked dirty. (and should be unprotected!) 2. If a page in the userspace page table is changed (e.g. protected or unprotect(basically 1), also via fixup_user_pages()), we have to fixup the gmap tables accordingly. 2.1 4k pages: This is implicitly handled by having shared page tables. 2.1 Huge pages: If a pmd entry is changed, we have to change the pmd entry in the gmap. 2.2 Huge pages: If a pmd entry is changed, we have to change the split pmd entries in the gmap (when protecting, it sufficient to change the pmd entry in the gmap. We only have to "unprotect" previously protected split 4k pages, but that would be easy). Some challenging questions: a) Why can't we perform protection for dirty tracking using the regular user space tables. The code modifying user space page tables should make sure that the proper gmap entries are invalidated/changed. This is basically the same as we have right now, see test_and_clear_guest_dirty(). Right now the easy part is that we share the same page tables for 4k pages, so we get that "protecting 4k page tables in both page tables" for free in one shot. But in general we could always go via the user space table. This would be a clean "modify userspace tables -> implicitly modify gmap tables" approach. You are suggesting it the other way around: Modify (e.g. protect) the GMAP and than transfer protection to the user space page tables. If I am not wrong this might even be bad if we would have multiple GMAPs per VM (which is possible as far as I can see in the code!also, hello vSIE but it might not be applicable), other GMAPs would as of now not get properly restricted access rights. This might work for now but highlights the problem that I see. It's not a clean "control flow direction". b) I am even starting to wonder if dirty tracking should be done on the user space page tables instead of the GMAP. This makes a lot of sense, especially when thinking about multiple GMAPs for a process. And especially the existing interface is built like this: test_and_clear_guest_dirty(struct mm_struct *mm, unsigned long addr); -> mm struct, not a gmap -> hva adddress, not a gmap address You're modifying it to include a specific gmap. I assume finding a spare bit for this is the tricky part. c) Why do we have to handle split PMDs in a special way? Can't we just perform dirty tracking + (dirty tracking) protection on PMD entry level? So "treat them like a PMD unless subpage protection is required for e.g. HW or vSIE". The only special case is might be allowing to clear protection from split page in case the PMD entry is not protected. Pages have to be migrated using huge pages anyway, so dirty protection could happen on PMD basis. But this part is really tricky and not well thought through from my side :) There are a lot of details to consider. In an ideal world, the split PMD page tables would not even have PGSTE - like gmap shadow page tables. They are not used by HW either way. We would only need a bit for IN and VSIE handling. But I guess that is the problem. -- Thanks, David / dhildenb