On 09/01/2018 04:28 AM, Fengguang Wu wrote: > (2) would need fundemental changes to the interface. It seems existing solutions > for sparse files like SEEK_HOLE/SEEK_DATA and FIEMAP ioctl may not serve this > situation well. The most efficient way could be to fill user space read() > buffer with an array of small extents: I've only been doing kernel development a few short years, but I've learned that designing user/kernel interfaces is hard. A comment in an RFC saying that we need "fundamental changes to the interface" seems to be more of a cry for help than a request for comment. This basically says to me: ignore the interface, it's broken. > This borrows host page table walk macros/functions to do EPT walk. > So it depends on them using the same level. Have you actually run this code? How does this work? It's walking the 'ept_root' that appears to be a guest CR3 register value. It doesn't appear to be the host CR3 value of the qemu (or other hypervisor). I'm also woefully confused why you are calling these EPT walks and then walking the x86-style page tables. EPT tables don't have the same format as x86 page tables, plus they don't start at a CR3-provided value. I'm also rather unsure that when running a VM, *any* host-format page tables get updated A/D bits. You need a host vaddr to do a host-format page table walk in the host page tables, and the EPT tables do direct guest physical to host physical translation. There's no host vaddr involved at all in those translations. > + if (!ept_pte_present(*pte) || > + !ept_pte_accessed(*pte)) > + idle = 1; Huh? So, !Present and idle are treated the same? If you had large swaths of !Present memory, you would see that in this interface and say, "gee, I've got a lot of idle memory to migrate" and then go do a bunch of calls to migrate it? That seems terminally wasteful. Who is going to use this? Do you have an example, at least dummy app? Can more than one app read this data at the same time? Who manages it? Who owns it? Are multiple reads destructive? This entire set is almost entirely comment-free except for the occasional C++ comments. That doesn't inspire confidence.