On Thu, Aug 31, 2017 at 02:39:17PM -0400, Felix Kuehling wrote:
> On 2017-08-31 09:59 AM, Jerome Glisse wrote:
> > [Adding Intel folks as they might be interested in this discussion]
> >
> > On Wed, Aug 30, 2017 at 05:51:52PM -0400, Felix Kuehling wrote:
> >> Hi Jérôme,
> >>
> >> I have some questions about the potential range-start-end race you
> >> mentioned.
> >>
> >> On 2017-08-29 07:54 PM, Jérôme Glisse wrote:
> >>> Note that a lot of existing users are broken with respect to
> >>> range_start/range_end. Many users only have a range_start() callback,
> >>> but there is nothing preventing them from undoing what was invalidated
> >>> in their range_start() callback after it returns but before any CPU
> >>> page table update takes place.
> >>>
> >>> The code pattern used in kvm or umem odp is an example of how to
> >>> properly avoid such a race. In a nutshell, use some kind of sequence
> >>> number and an active range invalidation counter to block anything that
> >>> might undo what the range_start() callback did.
> >> What happens when we start monitoring an address range after
> >> invalidate_range_start was called? Sounds like we have to keep track of
> >> all active invalidations for just such a case, even in address ranges
> >> that we don't currently care about.
> >>
> >> What are the things we cannot do between invalidate_range_start and
> >> invalidate_range_end? amdgpu calls get_user_pages to re-validate our
> >> userptr mappings after the invalidate_range_start notifier invalidated
> >> them. Do we have to wait for invalidate_range_end before we can call
> >> get_user_pages safely?
> > Well the whole userptr bo object is somewhat broken from the start.
> > You never defined the semantics of it, ie what is expected.
> > I can think of 2 different semantics:
> >   A) a uptr buffer object is a snapshot of memory at the time of
> >      uptr buffer object creation
> >   B) a uptr buffer object allows the GPU to access a range of virtual
> >      addresses of a process and shares a coherent view of that range
> >      between CPU and GPU
> >
> > As it was implemented it is more in line with B, but it is not defined
> > anywhere AFAICT.
>
> Yes, we're trying to achieve B, that's why we have an MMU notifier in
> the first place. But it's been a struggle getting it to work properly,
> and we're still dealing with some locking issues and now this one.
>
> > Anyway, getting back to your questions, it kind of doesn't matter, as
> > you are using GUP, ie you are pinning pages, except for one scenario
> > (at least I can only think of one).
> >
> > The problematic case is a race between a CPU write to a zero page or
> > COW page and the GPU driver doing a read-only GUP:
> [...]
>
> Thanks, I was aware of COW but not of the zero-page case. I believe in
> most practical cases our userptr mappings are read-write, so this is
> probably not causing us any real trouble at the moment.
>
> > So I would first define the semantics of the uptr bo and then fix the
> > code accordingly. Semantic A is easier to implement and you could just
> > drop the whole mmu_notifier. Maybe it is better to create a uptr
> > buffer object every time you want to snapshot a range of addresses. I
> > don't think the overhead of buffer creation would matter.
>
> That doesn't work for KFD and our compute memory model where CPU and GPU
> expect to share the same address space.
>
> > If you want to close the race for COW and zero page in the case of
> > read-only GUP, there is no other way than what KVM or ODP is doing. I
> > had a patchset to simplify all this but I need to bring it back to
> > life.
>
> OK. I'll look at these to get an idea.
>
> > Note that other things might race, but as you pin the pages they do
> > not matter.
> > It just means that if you GUP after range_start() but before
> > range_end() and before the CPU page table update, then you pinned the
> > same old page again and nothing will happen (migrate will fail,
> > MADV_FREE will be a nop, ...). So you just did the range_start()
> > callback for nothing in those cases.
>
> We pin the memory because the GPU wants to access it. So this is
> probably OK if the migration fails. However, your statement that we
> "just did the range_start() callback for nothing" implies that we could
> as well have ignored the range_start callback. But I don't think that's
> true. That way we'd keep a page pinned that is no longer mapped in the
> CPU address space. So the GPU mapping would be out of sync with the CPU
> mapping.

Here is an example of "for nothing":

    CPU0                           CPU1
                                   migrate page at addr A
                                   invalidate_start addr A
    unbind_ttm(for addr A)
    use ttm object for addr A
      GUP addr A
                                   page table update
                                   invalidate_end addr A
                                   refcount check fails because of GUP
                                   restore page table to A

This is what I meant by range_start() for nothing: on CPU0 you
invalidated the ttm object for nothing, but this is because of a race.
Fixing the COW race would also fix this race and avoid the migration
failing because of the GPU GUP.

I was not saying you should not use mmu_notifier. For achieving B you
need mmu_notifier. Note that if you do like ODP/KVM then you do not need
to pin pages.

Cheers,
Jérôme