On Mon, Jan 11, 2021 at 01:31:00PM -0500, Andrey Grodzovsky wrote:
> On 1/11/21 12:41 PM, Andrey Grodzovsky wrote:
> > On 1/11/21 11:15 AM, Daniel Vetter wrote:
> > > On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
> > > > On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> > > > > Ok then, I guess I will proceed with the dummy pages list
> > > > > implementation then.
> > > > >
> > > > > Andrey
> > > > >
> > > > > From: Koenig, Christian <Christian.Koenig@xxxxxxx>
> > > > > Sent: 08 January 2021 09:52
> > > > > To: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>;
> > > > > Daniel Vetter <daniel@xxxxxxxx>
> > > > > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
> > > > > daniel.vetter@xxxxxxxx; robh@xxxxxxxxxx; l.stach@xxxxxxxxxxxxxx;
> > > > > yuq825@xxxxxxxxx; eric@xxxxxxxxxx;
> > > > > Deucher, Alexander <Alexander.Deucher@xxxxxxx>;
> > > > > gregkh@xxxxxxxxxxxxxxxxxxx; ppaalanen@xxxxxxxxx;
> > > > > Wentland, Harry <Harry.Wentland@xxxxxxx>
> > > > > Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> > > > >
> > > > > Mhm, I'm not aware of any leftover pointer between TTM and GEM and we
> > > > > worked quite hard on reducing the size of the amdgpu_bo, so another
> > > > > extra pointer just for that corner case would suck quite a bit.
> > > >
> > > > We have a ton of other pointers in struct amdgpu_bo (or any of its lower
> > > > things) which are fairly single-use, so I'm really not much seeing the
> > > > point in making this a special case. It also means the lifetime
> > > > management becomes a bit iffy, since we can't throw away the dummy page
> > > > when the last reference to the bo is released (since we don't track it
> > > > there), but only when the last pointer to the device is released.
> > > > Potentially this means a pile of dangling pages hanging around for too
> > > > long.
> > >
> > > Also if you really, really, really want to have this list, please don't
> > > reinvent it since we have it already. drmm_ is exactly meant for resources
> > > that should be freed when the final drm_device reference disappears.
> > > -Daniel
> >
> > Can you elaborate? We still need to actually implement the list, but you
> > want me to use drmm_add_action for its destruction instead of explicitly
> > doing it (like I'm already doing from ttm_bo_device_release)?
> >
> > Andrey
>
> Oh, I get it I think, you want me to allocate each page using drmm_kzalloc
> so when the drm_dev dies it will be freed on its own.
> Great idea and makes my implementation much less cumbersome.

That was my idea, but now after a night's worth of sleep I'm not so sure
it's a bright one: We don't just want 4k of memory, we want a page. And I'm
not sure kzalloc will give us that (plus using a slab page for mmap might
result in fireworks shows). So maybe just drmm_add_action_or_reset (since
I'm also not sure we can just use the lists in struct page itself for the
page we got when we use alloc_page).
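Roughly something like this (completely untested sketch; how the fault
handler gets at the drm_device is handwaved):

static void drm_release_dummy_page(struct drm_device *dev, void *res)
{
	__free_page((struct page *)res);
}

static vm_fault_t map_dummy_page(struct drm_device *dev, struct vm_fault *vmf)
{
	struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);

	if (!page)
		return VM_FAULT_OOM;

	/* freed when the last drm_device reference drops; on failure
	 * the _or_reset variant calls the release action right away */
	if (drmm_add_action_or_reset(dev, drm_release_dummy_page, page))
		return VM_FAULT_OOM;

	return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(page));
}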
-Daniel

>
> Andrey

> > > > If you need some ideas for redundant pointers:
> > > > - destroy callback (kinda not cool to not have this const anyway), we
> > > >   could refcount it all with the overall gem bo. Quite a bit of work.
> > > > - bdev pointer, if we move the device ttm stuff into struct drm_device,
> > > >   or create a common struct ttm_device, we can ditch that
> > > > - We could probably merge a few of the fields and find 8 bytes somewhere
> > > > - we still have 2 krefs, would probably need to fix that before we can
> > > >   merge the destroy callbacks
> > > >
> > > > So there's plenty of room still, if the size of a bo struct is really
> > > > that critical. Imo it's not.
> > > > >
> > > > > Christian.
> > > > >
> > > > > Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> > > > > > Daniel had some objections to this (see below) and so I guess I
> > > > > > need you both to agree on the approach before I proceed.
> > > > > >
> > > > > > Andrey
> > > > > >
> > > > > > On 1/8/21 9:33 AM, Christian König wrote:
> > > > > > > Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> > > > > > > > Hey Christian, just a ping.
> > > > > > > Was there any question for me here?
> > > > > > >
> > > > > > > As far as I can see the best approach would still be to fill the
> > > > > > > VMA with a single dummy page and avoid pointers in the GEM object.
> > > > > > >
> > > > > > > Christian.
> > > > > > >
> > > > > > > > Andrey
> > > > > > > >
> > > > > > > > On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> > > > > > > > > On 1/7/21 11:30 AM, Daniel Vetter wrote:
> > > > > > > > > > On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > > On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > > > > > > > > > > > On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > > > > On 11/23/20 3:01 AM, Christian König wrote:
> > > > > > > > > > > > > > Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > > On 11/21/20 9:15 AM, Christian König wrote:
> > > > > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > > > > Will be used to reroute CPU mapped BO's page
> > > > > > > > > > > > > > > > > faults once the device is removed.
> > > > > > > > > > > > > > > > Uff, one page for each exported DMA-buf? That's
> > > > > > > > > > > > > > > > not something we can do.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We need to find a different approach here.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can't we call alloc_page() on each fault and
> > > > > > > > > > > > > > > > link them together so they are freed when the
> > > > > > > > > > > > > > > > device is finally reaped?
> > > > > > > > > > > > > > > For sure better to optimize and allocate on demand
> > > > > > > > > > > > > > > when we reach this corner case, but why the
> > > > > > > > > > > > > > > linking? Shouldn't drm_prime_gem_destroy be a good
> > > > > > > > > > > > > > > enough place to free?
> > > > > > > > > > > > > > I want to avoid keeping the page in the GEM object.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > What we can do is to allocate a page on demand for
> > > > > > > > > > > > > > each fault and link them together in the bdev
> > > > > > > > > > > > > > instead.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And when the bdev is then finally destroyed after
> > > > > > > > > > > > > > the last application closed we can finally release
> > > > > > > > > > > > > > all of them.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Christian.
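Fwiw, that per-fault scheme could look roughly like this (untested sketch;
the dummy_pages list and dummy_lock on the bo device are invented names,
and error handling is simplified):

struct dummy_page {
	struct list_head node;
	struct page *page;
};

/* assumes a struct list_head dummy_pages plus a spinlock dummy_lock
 * were added to the bo device */
static struct page *bdev_alloc_dummy_page(struct ttm_bo_device *bdev)
{
	struct dummy_page *entry = kzalloc(sizeof(*entry), GFP_KERNEL);

	if (!entry)
		return NULL;

	entry->page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	if (!entry->page) {
		kfree(entry);
		return NULL;
	}

	spin_lock(&bdev->dummy_lock);
	list_add(&entry->node, &bdev->dummy_pages);
	spin_unlock(&bdev->dummy_lock);

	return entry->page;
}

/* called once from ttm_bo_device_release(), after the last user is gone */
static void bdev_free_dummy_pages(struct ttm_bo_device *bdev)
{
	struct dummy_page *entry, *tmp;

	list_for_each_entry_safe(entry, tmp, &bdev->dummy_pages, node) {
		list_del(&entry->node);
		__free_page(entry->page);
		kfree(entry);
	}
}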
> > > > > > > > > > > > > Hey, started to implement this and then realized that
> > > > > > > > > > > > > by allocating a page for each fault indiscriminately
> > > > > > > > > > > > > we will be allocating a new page for each faulting
> > > > > > > > > > > > > virtual address within a VA range belonging to the
> > > > > > > > > > > > > same BO, and this is obviously too much and not the
> > > > > > > > > > > > > intention. Should I instead use, let's say, a
> > > > > > > > > > > > > hashtable with the hash key being the faulting BO
> > > > > > > > > > > > > address to actually keep allocating and reusing the
> > > > > > > > > > > > > same dummy zero page per GEM BO (or for that matter
> > > > > > > > > > > > > DRM file object address for non-imported BOs)?
> > > > > > > > > > > > Why do we need a hashtable? All the sw structures to
> > > > > > > > > > > > track this should still be around:
> > > > > > > > > > > > - if gem_bo->dma_buf is set the buffer is currently
> > > > > > > > > > > >   exported as a dma-buf, so defensively allocate a
> > > > > > > > > > > >   per-bo page
> > > > > > > > > > > > - otherwise allocate a per-file page
> > > > > > > > > > > That's exactly what we have in the current implementation
> > > > > > > > > > > >
> > > > > > > > > > > > Or is the idea to save the struct page * pointer? That
> > > > > > > > > > > > feels a bit like over-optimizing stuff. Better to have
> > > > > > > > > > > > a simple implementation first and then tune it if (and
> > > > > > > > > > > > only if) any part of it becomes a problem for normal
> > > > > > > > > > > > usage.
> > > > > > > > > > > Exactly - the idea is to avoid adding an extra pointer to
> > > > > > > > > > > drm_gem_object. Christian suggested to instead keep a
> > > > > > > > > > > linked list of dummy pages to be allocated on demand once
> > > > > > > > > > > we hit a vm_fault. I will then also prefault the entire
> > > > > > > > > > > VA range from vma->vm_start to vma->vm_end and map it to
> > > > > > > > > > > that single dummy page.
> > > > > > > > > > This strongly feels like premature optimization. If you're
> > > > > > > > > > worried about the overhead on amdgpu, pay down the debt by
> > > > > > > > > > removing one of the redundant pointers between gem and ttm
> > > > > > > > > > bo structs (I think we still have some) :-)
> > > > > > > > > >
> > > > > > > > > > Until we've nuked these easy&obvious ones we shouldn't play
> > > > > > > > > > "avoid 1 pointer just because" games with hashtables.
> > > > > > > > > > -Daniel
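Fwiw, the prefault part could be as simple as this (untested sketch;
drm_gem_get_dummy_page() is a made-up stand-in for whatever looks up the
per-bo page when obj->dma_buf is set and the per-file page otherwise):

static vm_fault_t vm_dummy_page_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct drm_gem_object *obj = vma->vm_private_data;
	struct page *page;
	unsigned long addr;
	vm_fault_t ret = VM_FAULT_NOPAGE;

	page = drm_gem_get_dummy_page(obj);
	if (!page)
		return VM_FAULT_OOM;

	/* map every address in the VMA to the single dummy page so we
	 * don't fault again on this range */
	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
		ret = vmf_insert_pfn(vma, addr, page_to_pfn(page));
		if (ret & VM_FAULT_ERROR)
			break;
	}

	return ret;
}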
> > > > > > > > > Well, if you and Christian can agree on this approach and
> > > > > > > > > suggest maybe what pointer is redundant and can be removed
> > > > > > > > > from the GEM struct so we can use the 'credit' to add the
> > > > > > > > > dummy page to GEM, I will be happy to follow through.
> > > > > > > > >
> > > > > > > > > P.S. Hash table is off the table anyway and we are talking
> > > > > > > > > only about a linked list here, since by prefaulting the
> > > > > > > > > entire VA range for a vmf->vma I will be avoiding redundant
> > > > > > > > > page faults to the same VMA VA range and so don't need to
> > > > > > > > > search and reuse an existing dummy page but simply create a
> > > > > > > > > new one for each next fault.
> > > > > > > > >
> > > > > > > > > Andrey
> > > > -- 
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel