Hi Thomas, > that the access() handler took a shortcut when the new locking order > was established There is no new locking order, the access handler is just for debugging and ignoring the correct locking order between mmap_sem and bo_reserve. That this is throwing a lockdep warning is perfectly possible. We should probably move that to a trylock instead. > bo_reserve() > copy_to_user() / copy_from_user() > bo_unreserve() That one is illegal for a completely different reason. The address accessed by copy_to_user()/copy_from_user() could be a BO itself, so to resolve this we could end up locking a BO twice. Adding a might_lock() to the beginning of ttm_bo_vm_fault as you suggested doesn't work either, because at this point the mmap_sem is still locked. So lockdep would complain about the incorrect bo_reserve and mmap_sem order. Christian. Am 13.10.2018 um 21:04 schrieb Thomas Hellstrom: > Hi, Christian, > > On 10/13/2018 07:36 PM, Christian König wrote: >> Hi Thomas, >> >>> bo_reserve() >>> copy_to_user() / copy_from_user() >>> bo_unreserve() >> >> That pattern is illegal for a number of reasons and the mmap_sem is >> only one of it. >> >> So the locking order must always be mmap_sem->bo_reservation. See the >> userptr implementation in amdgpu as well. >> >> Christian. > > I'm not arguing against that, and since vmwgfx doesn't use that > pattern, the locking order doesn't really matter to me since it's even > possible to make the TTM fault() handler more well-behaved if we were > to fix the locking order to mmap_sem->bo_reserve. > > My concern is, since the _opposite_ locking order is (admittedly > vaguely) documented in the fault handler, that the access() handler > took a shortcut when the new locking order was established possibly > without auditing of the other TTM drivers for locking inversion: For > example it looks from a quick glance like > nouveau_gem_pushbuf_reloc_apply() calls copy_from_user() with bo's > reserved (which IIRC was the typical use-case at the time this was > last lifted). And lockdep won't trip unless the access() callback is > actually called. > > My point is if AMD wants to enforce this locking order, then IMHO the > other drivers need to be audited and corrected if they are assuming > the locking order documented in fault(). A good way to catch such > drivers would be to add that might_lock(). > > Thanks, > Thomas > > >> >> Am 12.10.2018 um 16:52 schrieb Thomas Hellstrom: >>> Hi, Felix, >>> >>> It looks like there is a locking inversion in ttm_bo_vm_access() >>> where we take a sleeping bo_reserve() while holding mmap_sem(). >>> >>> Previously we've been assuming the other way around or at least >>> undefined allowing for drivers to do >>> >>> bo_reserve() >>> copy_to_user() / copy_from_user() >>> bo_unreserve() >>> >>> I'm not sure the latter pattern is used in any drivers, though, and >>> I guess there are ways around it. So it might make sense to fix the >>> locking order at this point. In that case, perhaps one should add a >>> >>> might_lock(&bo->resv->lock.base); >>> >>> at the start of the TTM fault handler to trip lockdep on locking >>> order violations in situations where the access() callback isn't >>> commonly used... >>> >>> /Thomas >>> >>> >>> >>> >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@xxxxxxxxxxxxxxxxxxxxx >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel