On 11/15/19 6:06 AM, Jason Gunthorpe wrote:
On Thu, Nov 14, 2019 at 03:06:05PM -0800, Ralph Campbell wrote:
On 11/13/19 5:51 AM, Christoph Hellwig wrote:
On Tue, Nov 12, 2019 at 11:45:52PM +0000, Jason Gunthorpe wrote:
Well, it would mean registering for the whole process address space.
I'll give it a try.
I'm not sure it makes much sense that this testing is essentially
modeled after nouveau's usage which is very strange compared to the
other drivers.
Which means we really should make the test cases fit the proper usage.
Maybe defer the tests for 5.5 and just merge the first patch for now?
I think this a good point to discuss.
Some devices will want to register for all changes to the process address
space because there is no requirement to preregister regions that the
device can access verses devices like InfiniBand where a range of addresses
have to be registered before the device can access those addresses.
But this is a very bad idea to register and do HW actions for ranges
that can't possibly have any pages registered. It slows down the
entire application
I think the ODP approach might be saner, when it mirrors the entire
address space it chops it up into VA chunks, and once a page is
registered on the HW the VA chunk goes into the interval tree.
Presumably the GPU also has some kind of page table tree and you could
set one of the levels as the VA interval when there are populated children
Jason
I wasn't suggesting that HW invalidates happen in two places.
I'm suggesting the two styles of invalidates can work together.
For example, what if a driver calls mmu_notifier_register(mn, mm)
to register for address space wide invalidations, then some time
later there is a device page table fault and the driver calls
mmu_range_notifier_insert() but with a NULL ops.invalidate.
The fault handler follows the nouveau/test_hmm pattern to call
mmu_range_read_begin()
hmm_range_fault()
device lock
mmu_range_read_retry()
update device page tables
device unlock
mmu_range_notifier_remove()
The global invalidate() callback would get the device lock and
call into mm to update the sequence number of any affected ranges
instead of having a range invalidate callback, and then do the HW
invalidations.