On Tue, Jun 06, 2023, Ackerley Tng wrote: > > I've ported selftests from Chao and I [1] while working on hugetlb support > for guest_mem [2]. > > In the process, I found some bugs and have some suggestions for guest_mem. > Please see separate commits at [3]. > > Here are some highlights/questions: > > + "KVM: guest_mem: Explain the use of the uptodate flag for gmem" > + Generally, uptodate flags means that the contents of this page match the > backing store. Since gmem is memory-backed, does "uptodate" for gmem mean > "zeroed"? Don't read too much into the code, my POC was very much a "beat on it until it works" scenario. > + "KVM: guest_mem: Don't re-mark accessed after getting a folio" and "KVM: > guest_mem: Don't set dirty flag for folio" > + Do we need to folio_mark_accessed(), when it was created with > FGP_ACCESSED? Probably not. And as you note below, it's all pretty nonsensical anyways. > + What is the significance of these LRU flags when gmem doesn't support > swapping/eviction? Likely none. I used the filemap APIs in my POC because it was easy, not because it was necessarily the best approach, i.e. that the folios/pages show up in the LRUs is an unwanted side effect, not a feature. If guest_memfd only needs a small subset of the filemap support, going with a true from-scratch implemenation on top of xarray might be cleaner overall, e.g. would avoid the need for a new flag to say "this folio can't be migrated even though it's on the LRUs". > + "KVM: guest_mem: Align so that at least 1 page is allocated" > + Bug in current implementation: without this alignment, fallocate() of > a size less than the gmem page size will result in no allocation at all I'm not convinced this is a bug. I don't see any reason to allow allocating and punching holes in sub-page granularity. > + Both shmem and hugetlbfs perform this alignment > + "KVM: guest_mem: Add alignment checks" > + Implemented the alignment checks for guest_mem because hugetlb on gmem > would hit a BUG_ON without this check > + "KVM: guest_mem: Prevent overflows in kvm_gmem_invalidate_begin()" > + Sean fixed a bug in the offset-to-gfn conversion in > kvm_gmem_invalidate_begin() earlier, adding a WARN_ON_ONCE() As Mike pointed out, there's likely still a bug here[*]. I was planning on diving into that last week, but that never happened. If you or anyone else can take a peek and/or write a testcase, that would be awesome. : Heh, only if there's a testcase for it. Assuming start >= the slot offset does : seem broken, e.g. if the range-to-invalidate overlaps multiple slots, later slots : will have index==slot->gmem.index > start. : : > Since 'index' corresponds to the gmem offset of the current slot, is there any : > reason not to do something like this?: : > : > .start = slot->base_gfn + index - slot->gmem.index, : > : > But then, if that's the case, wouldn't index == slot->gmem.index? Suggesting : > we case just simplify to this?: : > : > .start = slot->base_gfn, : : No, e.g. if start is partway through a memslot, there's no need to invalidate : the entire memslot. I'll stare at this tomorrow when my brain is hopefully a : bit more functional, I suspect there is a min() and/or max() needed somewhere. [*] https://lore.kernel.org/all/20230512002124.3sap3kzxpegwj3n2@xxxxxxx > + Code will always hit WARN_ON_ONCE() when the entire file is closed and > all offsets are invalidated, so WARN_ON_ONCE() should be removed > + Vishal noticed that the conversion might result in an overflow, so I > fixed that > + And of course, hugetlb support! Please let me know what you think of the > approach proposed at [2]. > > [1] https://lore.kernel.org/all/cover.1678926164.git.ackerleytng@xxxxxxxxxx/T/ > [2] https://lore.kernel.org/lkml/cover.1686077275.git.ackerleytng@xxxxxxxxxx/T/ > [3] https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1