Re: vmwgfx leaking bo pins?

On 3/15/21 9:38 PM, Daniel Vetter wrote:
On Mon, Mar 15, 2021 at 6:57 PM Zack Rusin <zackr@xxxxxxxxxx> wrote:
On 3/12/21 5:06 AM, Thomas Hellström (Intel) wrote:
On 3/12/21 12:02 AM, Zack Rusin wrote:
On Mar 11, 2021, at 17:35, Thomas Hellström (Intel)
<thomas_os@xxxxxxxxxxxx> wrote:

Hi, Zack

On 3/11/21 10:07 PM, Zack Rusin wrote:
On Mar 11, 2021, at 05:46, Thomas Hellström (Intel)
<thomas_os@xxxxxxxxxxxx> wrote:

Hi,

I tried latest drm-fixes today and saw a lot of these: Fallout from
ttm rework?
Yes, I fixed this in d1a73c641afd2617bd80bce8b71a096fc5b74b7e. It was
in drm-misc-next in the drm-misc tree for a while but hasn’t been
merged for 5.12.

z

Hmm, yes but doesn't that fix trip the ttm_bo_unpin()
dma_resv_assert_held(bo->base.resv)?
No, doesn’t seem to. TBH I’m not sure why myself, but it seems to be
working fine.


With CONFIG_PROVE_LOCKING=y I see this:

[    7.117145] [drm] FIFO at 0x00000000fe000000 size is 8192 kiB
[    7.117284] [drm] VRAM at 0x00000000e8000000 size is 131072 kiB
[    7.117291] INFO: trying to register non-static key.
[    7.117295] the code is fine but needs lockdep annotation.
[    7.117298] turning off the locking correctness validator

Which will probably mask that dma_resv_assert_held(bo->base.resv)

Ah, yes, you're right. After fixing that I do see the
dma_resv_assert_held triggered. Technically it is trivially fixable with
ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin, but it's a little
ugly that some places (e.g. vmw_bo_unreference) will require
ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin while others (e.g.
vmw_mob_destroy) won't, because the lock is already held by
ttm_bo_delayed_delete, with no clear indication within the function of
which is which.
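
To make the two calling contexts concrete, here is a minimal userspace
sketch in C. It is not vmwgfx or TTM code: toy_bo, toy_reserve,
toy_unreserve and toy_unpin are hypothetical stand-ins for a buffer
object, ttm_bo_reserve/ttm_bo_unreserve and ttm_bo_unpin, and a plain
flag stands in for the reservation lock. It only shows why a caller has
to know whether the lock is already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a buffer object; NOT the real TTM structure. */
struct toy_bo {
    bool reserved;        /* models "the reservation lock is held" */
    int pin_count;
    int assert_failures;  /* counts unpins done without the lock */
};

/* Hypothetical tryreserve: fails if the lock is already taken. */
static bool toy_reserve(struct toy_bo *bo)
{
    if (bo->reserved)
        return false;
    bo->reserved = true;
    return true;
}

static void toy_unreserve(struct toy_bo *bo)
{
    bo->reserved = false;
}

/* Models an unpin that checks, like a debug assert, that the lock is held. */
static void toy_unpin(struct toy_bo *bo)
{
    if (!bo->reserved)
        bo->assert_failures++;
    assert(bo->pin_count > 0); /* unbalanced unpin would be a bug */
    bo->pin_count--;
}

/* Context A: caller does not hold the lock and must wrap the unpin. */
static void toy_unpin_unlocked_caller(struct toy_bo *bo)
{
    if (toy_reserve(bo)) {
        toy_unpin(bo);
        toy_unreserve(bo);
    }
}

/* Context B: an outer layer already holds the lock, so no wrapping. */
static void toy_unpin_locked_caller(struct toy_bo *bo)
{
    toy_unpin(bo);
}
```

Nothing in the body of the two callers says which context applies;
calling the second one without the lock held trips the check, and
calling the first one with the lock held silently skips the unpin.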

As Daniel hints below, it looks like the mob pagetable bos are pinned and hence not on an LRU list, so the parent bo is holding the only reference. vmw_mob_unbind() relies on this to make sure the tryreserve always succeeds. (If necessary, unpin could be called in vmw_mob_unbind for the pagetable bo just after fencing.) The same applies to the other vmwgfx_mob error paths, but in that case one should probably keep the bo pointers in stack variables until it is known that no unpin is needed. Then it should be OK to tryreserve around unpinning in the error paths.

But it's constructs like that which really make me think we shouldn't need to reserve to unpin.

Not sure it applies here, but if the refcount is down to 0 we know
we're in destruction code and can't race with anything anymore, so
maybe we can lift the debug check.

Otoh I think at that point we might still be on lru lists, so the
rules become rather tricky whether it's really always legal to skip
the dma_resv_lock. But we could perhaps figure out something if it's
too annoying to have a consistent calling context in drivers.

I'm not a huge fan of dropping the requirement from unpin and
switching to atomic_t for the pin count, since pin/unpin is an
extremely slow path, adding complexity in how we protect stuff for a
function that's maybe called 60/s (for page flipping we pin/unpin)
doesn't strike me as the right balance.

*If* we can protect the bo LRU state with the LRU lock instead of with the reservation, it should probably only be a matter of extending the LRU lock critical section over a couple of assignments. If we change the bo LRU state we'd need to grab the LRU lock sooner or later anyway, so I think the added complexity should be minimal.
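
A rough userspace sketch of that idea, under loud assumptions: toy_lru,
toy_lru_bo, toy_lru_pin and toy_lru_unpin are hypothetical names, the
lock is a C11 atomic_flag spinlock rather than the real TTM LRU lock,
and the "LRU list" is reduced to a flag plus a counter. The pin count
and the LRU state change together inside one critical section, so no
reservation is needed to unpin in this model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy LRU: a spinlock and a count of objects currently on the list. */
struct toy_lru {
    atomic_flag lock;
    int nr_on_lru;
};

/* Toy buffer object whose pin and LRU state are guarded by lru->lock. */
struct toy_lru_bo {
    struct toy_lru *lru;
    int pin_count;
    bool on_lru;
};

static void toy_lru_lock(struct toy_lru *lru)
{
    while (atomic_flag_test_and_set_explicit(&lru->lock,
                                             memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void toy_lru_unlock(struct toy_lru *lru)
{
    atomic_flag_clear_explicit(&lru->lock, memory_order_release);
}

/* First pin takes the object off the LRU, in the same critical section. */
static void toy_lru_pin(struct toy_lru_bo *bo)
{
    toy_lru_lock(bo->lru);
    if (bo->pin_count++ == 0 && bo->on_lru) {
        bo->on_lru = false;
        bo->lru->nr_on_lru--;
    }
    toy_lru_unlock(bo->lru);
}

/* Last unpin puts it back; the "couple of assignments" sit under the lock. */
static void toy_lru_unpin(struct toy_lru_bo *bo)
{
    toy_lru_lock(bo->lru);
    assert(bo->pin_count > 0); /* unbalanced unpin would be a bug */
    if (--bo->pin_count == 0 && !bo->on_lru) {
        bo->on_lru = true;
        bo->lru->nr_on_lru++;
    }
    toy_lru_unlock(bo->lru);
}
```

Whether the real LRU state can be protected this way is exactly the
open question above; the sketch only shows that the extra work in the
unpin path would be small if it can.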

/Thomas

