On Wed, May 07, 2014 at 05:14:52PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2014-05-06 at 12:18 -0400, Jerome Glisse wrote:
> >
> > I do understand that i was pointing out that if i move to, tlb which i
> > am fine with, i will still need to sleep there. That's all i wanted to
> > stress, i did not wanted force using mmu_notifier, i am fine with them
> > becoming atomic as long as i have a place where i can intercept cpu
> > page table update and propagate them to device mmu.
>
> Your MMU notifier can maintain a map of "dirty" PTEs and you do the
> actual synchronization in the subsequent flush_tlb_* , you need to add
> hooks there but it's much less painful than in the notifiers.

Well, getting the dirty info back from the GPU also requires sleeping.
Maybe I should explain how it is supposed to work. A GPU has several
command buffers and executes the instructions inside each command buffer
in sequential order. To update the GPU MMU you need to schedule a command
into one of those command buffers, but when you do so you do not know how
many commands are in front of yours or how long it will take the GPU to
get to your command.

Yes, the GPUs this patchset targets have preemption, but it is not as
flexible as CPU preemption: there is no kernel thread running and
scheduling, all the scheduling is done in hardware. So the preemption is
more limited than on the CPU.

That is why any update to, or information retrieval from, the GPU needs
to go through some command buffer, and no matter how high the priority of
the command buffer for MMU updates is, it can still take a long time
(think of flushing thousands of GPU threads and saving their context).

>
> *However* Linus, even then we can't sleep. We do things like
> ptep_clear_flush() that need the PTL and have the synchronous flush
> semantics.
>
> Sure, today we wait, possibly for a long time, with IPIs, but we do not
> sleep. Jerome would have to operate within a similar context. No sleep
> for you :)
>
> Cheers,
> Ben.

So for ptep_clear_flush, my idea is to have a special LRU for pages that
are in use by the GPU. This will keep page reclamation from calling
try_to_unmap on them, and thus ptep_clear_flush. I would block KSM, so
again that is another user that would not do ptep_clear_flush. I would
need to fix remap_file_pages, either by adding some callback there or by
refactoring the unmap and TLB flushing. Finally, for page migration I see
several solutions: forbid it (easy for me but likely not what we want),
have special code inside the migrate code to handle pages in use by a
device, or have special code inside try_to_unmap to handle it.

I think these are all the current users of ptep_clear_flush and its
derivatives that flush the TLB while holding a spinlock.

Note that for the special LRU, or even for special handling of pages in
use by a device, I need a new page flag. Would this be acceptable?

For the special LRU I was thinking of doing it per device, as each device
is unlikely to constantly address all the pages it has mapped anyway. A
simple LRU list would do, probably with some helper for device drivers to
mark pages accessed so that frequently used pages are not reclaimed. But a
global list is fine as well, and it simplifies the case where different
devices use the same pages.

Cheers,
Jérôme Glisse
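
For illustration only, here is a minimal user-space sketch of what a
"simple LRU list" with a mark-accessed helper could look like. It is not
kernel code and not taken from the patchset: struct dev_page, struct
dev_lru and all the dev_lru_* functions are hypothetical names, there is
no locking, and the new page flag is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a page tracked by a device (not struct page). */
struct dev_page {
	unsigned long pfn;
	struct dev_page *prev, *next;
};

/* Hypothetical per-device LRU: head is most recently used, tail is coldest. */
struct dev_lru {
	struct dev_page *head, *tail;
};

/* Remove p from the list. p must currently be on this list. */
static void dev_lru_unlink(struct dev_lru *lru, struct dev_page *p)
{
	if (p->prev)
		p->prev->next = p->next;
	else
		lru->head = p->next;
	if (p->next)
		p->next->prev = p->prev;
	else
		lru->tail = p->prev;
	p->prev = p->next = NULL;
}

/* Insert p at the hot end. p must not already be linked. */
static void dev_lru_add(struct dev_lru *lru, struct dev_page *p)
{
	assert(p->prev == NULL && p->next == NULL);
	p->next = lru->head;
	if (lru->head)
		lru->head->prev = p;
	else
		lru->tail = p;
	lru->head = p;
}

/* Helper a driver could call when the device touches p: move it to the hot end. */
static void dev_lru_mark_accessed(struct dev_lru *lru, struct dev_page *p)
{
	dev_lru_unlink(lru, p);
	dev_lru_add(lru, p);
}

/* Detach and return the coldest page, or NULL if the list is empty. */
static struct dev_page *dev_lru_pick_victim(struct dev_lru *lru)
{
	struct dev_page *p = lru->tail;

	if (p)
		dev_lru_unlink(lru, p);
	return p;
}
```

A driver would call dev_lru_add() when it maps a page for the device,
dev_lru_mark_accessed() when the device is known to have used it, and
dev_lru_pick_victim() when pages have to be given back. A global list
would have the same shape, just with one struct dev_lru shared by all
devices.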