On 2022-02-11 at 11:15, David Hildenbrand wrote:
On 01.02.22 16:48, Alex Sierra wrote:
Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.
Signed-off-by: Alex Sierra <alex.sierra@xxxxxxx>
Acked-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>
Reviewed-by: Alistair Popple <apopple@xxxxxxxxxx>
So, I'm currently messing with PageAnon() pages and CoW semantics ...
all these PageAnon() ZONE_DEVICE variants don't necessarily make my life
easier but I'm not sure yet if they make my life harder. I hope you can
help me understand some of that stuff.
1) What are expected CoW semantics for DEVICE_COHERENT?
I assume we'll share them read-only during fork(), just like other
PageAnon() pages, and the first sharer writing to them receives an
"ordinary" !ZONE_DEVICE copy.
Yes.
So this would be just like DEVICE_EXCLUSIVE CoW handling I assume, just
that we don't have to go through the loop of restoring a device
exclusive entry?
I'm not sure how DEVICE_EXCLUSIVE pages are handled under CoW. As I
understand it, they're not really in a special memory zone like
DEVICE_COHERENT. Just a special way of mapping an ordinary page in order
to allow device-exclusive access for some time. I suspect there may even
be a possibility that a page can be both DEVICE_EXCLUSIVE and
DEVICE_COHERENT.
That said, your statement sounds correct. There is no requirement to do
anything with the new "ordinary" page after copying. What actually
happens to DEVICE_COHERENT pages on CoW is a bit convoluted:
When the page is marked as CoW, it is marked R/O in the CPU page table.
This causes an MMU notifier that invalidates the device PTE. The next
device access in the parent process causes a page fault. If that's a
write fault (which it usually is in our current driver), it will trigger CoW,
which means the parent process now gets a new system memory copy of the
page, while the child process keeps the DEVICE_COHERENT page. The driver
could decide to migrate the page back to a new DEVICE_COHERENT allocation.
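
To make the sequence above easier to follow, here is a minimal toy model in plain C. It is a sketch only, not kernel code: every type and function name (toy_page, toy_mapping, toy_fork, toy_device_write_fault) is made up for illustration, and the model only tracks which kind of memory each process ends up with.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or functions exist in the kernel. */
enum toy_mem_kind { TOY_MEM_SYSTEM, TOY_MEM_DEVICE_COHERENT };

struct toy_page {
	enum toy_mem_kind kind;
	int sharers;            /* number of processes mapping this page */
};

struct toy_mapping {
	struct toy_page *page;
	bool cpu_writable;      /* CPU PTE permits writes */
	bool device_mapped;     /* device PTE is currently valid */
};

/*
 * fork(): parent and child share the page, the CPU PTEs become R/O,
 * and the (modelled) MMU notifier invalidates the device PTE.
 */
static void toy_fork(struct toy_mapping *parent, struct toy_mapping *child)
{
	child->page = parent->page;
	parent->page->sharers++;
	parent->cpu_writable = false;
	child->cpu_writable = false;
	parent->device_mapped = false;
	child->device_mapped = false;
}

/*
 * Device write fault: if the page is still shared, the faulting process
 * gets a fresh system-memory copy (provided by the caller in new_page)
 * and the other sharer keeps the original page.
 */
static void toy_device_write_fault(struct toy_mapping *m,
				   struct toy_page *new_page)
{
	assert(!m->device_mapped);  /* a fault implies no valid device PTE */
	if (m->page->sharers > 1) {
		m->page->sharers--;
		new_page->kind = TOY_MEM_SYSTEM;
		new_page->sharers = 1;
		m->page = new_page;
	}
	m->cpu_writable = true;
	m->device_mapped = true;
}
```

In this model, a parent that forks and then takes a device write fault ends up on a TOY_MEM_SYSTEM page, while the child still maps the original TOY_MEM_DEVICE_COHERENT page. The optional migration back to a new DEVICE_COHERENT allocation is left out.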
In practice that means, "fork" basically causes all DEVICE_COHERENT
memory in the parent process to be migrated to ordinary system memory,
which is quite disruptive. What we have today results in correct
behaviour, but the performance is far from ideal.
We could probably mitigate it by making the driver better at mapping
pages R/O in the device on read faults, at the potential cost of having
to handle a second (write) fault later.
2) How are these pages freed to clear/invalidate PageAnon() ?
I assume for PageAnon() ZONE_DEVICE pages we'll always go via
free_devmap_managed_page(), correct?
Yes. The driver depends on the page->pgmap->ops->page_free callback
to free the device memory allocation backing the page.
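
As a rough illustration of that callback chain, here is a self-contained toy in plain C. It only mirrors the page->pgmap->ops->page_free shape mentioned above. The struct layouts, the refcount handling, and the names toy_zdev_page, toy_pagemap, toy_put_page and toy_driver_page_free are all assumptions made for the example, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real kernel structures look different. */
struct toy_zdev_page;

struct toy_pagemap_ops {
	/* called when the last reference to a page goes away */
	void (*page_free)(struct toy_zdev_page *page);
};

struct toy_pagemap {
	const struct toy_pagemap_ops *ops;
};

struct toy_zdev_page {
	struct toy_pagemap *pgmap;
	int refcount;
	int backing_allocated;  /* 1 while the driver's device memory is held */
};

/* Hypothetical driver callback: release the device allocation. */
static void toy_driver_page_free(struct toy_zdev_page *page)
{
	page->backing_allocated = 0;
}

/* Drop one reference; on the last one, hand the page back to the driver. */
static void toy_put_page(struct toy_zdev_page *page)
{
	assert(page->refcount > 0);
	if (--page->refcount == 0 &&
	    page->pgmap != NULL && page->pgmap->ops->page_free != NULL)
		page->pgmap->ops->page_free(page);
}
```

The point of the sketch is only that the core code does not know how to free device memory itself; it calls back into whatever the driver registered.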
3) FOLL_PIN
While you write "no one should be allowed to pin such memory", patch #2
only blocks FOLL_LONGTERM. So I assume we allow ordinary FOLL_PIN and
you might want to be a bit more precise?
I agree. I think the paragraph was written before we fully fleshed out
the interaction with GUP, and then forgotten.
... I'm pretty sure we cannot FOLL_PIN DEVICE_PRIVATE pages,
Right. Trying to GUP a DEVICE_PRIVATE page causes a page fault that
migrates the page back to normal system memory (using the
page->pgmap->ops->migrate_to_ram callback). Then you pin the system
memory page.
but can we
FOLL_PIN DEVICE_EXCLUSIVE pages? I strongly assume so?
I assume you mean DEVICE_COHERENT, not DEVICE_EXCLUSIVE? In that case
the answer is "Yes".
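
The pinning rules discussed in this thread can be summarised as a small decision function. This is a sketch under stated assumptions, not GUP code: the enum values, the TOY_FOLL_* flag values and toy_pin_outcome are invented for the example, and "blocked" only records that patch #2 does not allow the long-term pin in place, without modelling how.

```c
#include <assert.h>

/* Toy names and values; these are not the kernel's definitions. */
enum toy_page_type {
	TOY_SYSTEM,
	TOY_DEVICE_PRIVATE,
	TOY_DEVICE_COHERENT
};

enum toy_pin_result {
	TOY_PIN_IN_PLACE,              /* page is pinned where it is */
	TOY_PIN_AFTER_MIGRATE_TO_RAM,  /* fault migrates to RAM, then pin */
	TOY_PIN_BLOCKED                /* pin of the device page not allowed */
};

#define TOY_FOLL_PIN      0x1u
#define TOY_FOLL_LONGTERM 0x2u

static enum toy_pin_result toy_pin_outcome(enum toy_page_type type,
					   unsigned int flags)
{
	assert(flags & TOY_FOLL_PIN);  /* only pin requests are modelled */

	switch (type) {
	case TOY_DEVICE_PRIVATE:
		/* GUP faults; migrate_to_ram moves the page to system memory */
		return TOY_PIN_AFTER_MIGRATE_TO_RAM;
	case TOY_DEVICE_COHERENT:
		/* ordinary FOLL_PIN is fine, FOLL_LONGTERM is blocked */
		return (flags & TOY_FOLL_LONGTERM) ? TOY_PIN_BLOCKED
						   : TOY_PIN_IN_PLACE;
	case TOY_SYSTEM:
	default:
		return TOY_PIN_IN_PLACE;
	}
}
```

For example, toy_pin_outcome(TOY_DEVICE_COHERENT, TOY_FOLL_PIN) yields TOY_PIN_IN_PLACE, while adding TOY_FOLL_LONGTERM yields TOY_PIN_BLOCKED.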
Regards,
Felix
Thanks for any information.