On Wed, Aug 1, 2018 at 2:29 PM, Christian König <christian.koenig at amd.com> wrote:
> On 01.08.2018 at 19:59, Marek Olšák wrote:
>>
>> On Wed, Aug 1, 2018 at 1:52 PM, Christian König
>> <christian.koenig at amd.com> wrote:
>>>
>>> On 01.08.2018 at 19:39, Marek Olšák wrote:
>>>>
>>>> On Wed, Aug 1, 2018 at 2:32 AM, Christian König
>>>> <christian.koenig at amd.com> wrote:
>>>>>
>>>>> On 01.08.2018 at 00:07, Marek Olšák wrote:
>>>>>>
>>>>>> Can this be implemented as a wrapper on top of libdrm, so that the
>>>>>> tree (or hash table) isn't created for UMDs that don't need it?
>>>>>
>>>>> No, the problem is that an application gets a CPU pointer from one
>>>>> API and tries to import that pointer into another one.
>>>>>
>>>>> In other words, we need to implement this independently of the UMD
>>>>> that mapped the BO.
>>>>
>>>> Yeah, it could be an optional feature of libdrm, and other components
>>>> should be able to disable it to remove the overhead.
>>>
>>> The overhead is negligible; the real problem is the memory footprint.
>>>
>>> A brief look at the hash implementation in libdrm showed that it is
>>> actually really inefficient.
>>>
>>> I think we have the choice of implementing an r/b tree to map the CPU
>>> pointer addresses or a quadratic tree to map the handles.
>>>
>>> The latter is easy to do and would also allow us to get rid of the
>>> hash table as well.
>>
>> We can also use the hash table from mesa/src/util.
>>
>> I don't think the overhead would be negligible. It would be a log(n)
>> insertion in bo_map and a log(n) deletion in bo_unmap. If you did
>> bo_map + bo_unmap 10000 times, would it be negligible?
>
> Compared to what the kernel needs to do to update the page tables, it
> is less than 1% of the total work.
>
> The real question is whether it wouldn't be simpler to use a tree for
> the handles. Since the handles are dense, you can just use an
> unbalanced tree, which is really easy.
>
> For a tree of the CPU mappings we would need an r/b interval tree,
> which is hard to implement and quite some overkill.
>
> Do you have any numbers on how many BOs really get a CPU mapping in a
> real-world application?

Without our suballocator, we sometimes exceeded the maximum mmap limit
(~64K mappings per process). It should be much less with the
suballocator using 128KB slabs, probably a few thousand.

Marek
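
For context, here is a minimal sketch of the kind of dense-handle lookup
structure Christian describes above: a two-level table indexed by the
handle bits, which stays cheap because GEM handles are small, densely
allocated integers and no balancing is needed. This is not libdrm code;
the names (handle_table, PAGE_BITS) and the fixed table sizes are
hypothetical, chosen only to illustrate the idea.

/* Hypothetical sketch: a two-level table keyed by dense GEM handles.
 * Handles are small, densely allocated integers, so indexing by the
 * handle bits gives O(1) lookup without any rebalancing. */
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BITS 10
#define PAGE_SIZE (1u << PAGE_BITS)   /* entries per leaf page */
#define TOP_SIZE  1024                /* leaf pages in the top level */

struct handle_table {
    void **pages[TOP_SIZE];           /* lazily allocated leaf pages */
};

/* Store "bo" under "handle"; returns 0 on success, -1 on failure. */
static int handle_table_set(struct handle_table *t, uint32_t handle, void *bo)
{
    uint32_t top = handle >> PAGE_BITS;

    if (top >= TOP_SIZE)
        return -1;                    /* handle out of range for this sketch */
    if (!t->pages[top]) {
        t->pages[top] = calloc(PAGE_SIZE, sizeof(void *));
        if (!t->pages[top])
            return -1;
    }
    t->pages[top][handle & (PAGE_SIZE - 1)] = bo;
    return 0;
}

/* Look up the BO stored under "handle", or NULL if none. */
static void *handle_table_get(struct handle_table *t, uint32_t handle)
{
    uint32_t top = handle >> PAGE_BITS;

    if (top >= TOP_SIZE || !t->pages[top])
        return NULL;
    return t->pages[top][handle & (PAGE_SIZE - 1)];
}

Because the leaf pages are allocated lazily, the memory footprint grows
with the highest handle actually used, which is what makes a dense
handle space attractive compared to hashing.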
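
And for the other side of the trade-off, a sketch of the CPU-pointer
lookup problem itself: given an arbitrary pointer inside a mapping, find
the BO whose [cpu_addr, cpu_addr + size) range contains it. A sorted
array with a binary search stands in here for the r/b interval tree
mentioned above; all names (mapped_bo, find_bo_from_ptr) are
hypothetical, not libdrm API.

/* Hypothetical sketch: CPU pointer -> BO lookup over non-overlapping
 * mappings kept sorted by start address. */
#include <stddef.h>
#include <stdint.h>

struct mapped_bo {
    uintptr_t cpu_addr;   /* start of the CPU mapping */
    size_t    size;       /* length of the mapping */
    uint32_t  handle;     /* GEM handle of the BO */
};

/* "maps" is sorted by cpu_addr; ranges do not overlap. */
static const struct mapped_bo *find_bo_from_ptr(const struct mapped_bo *maps,
                                                size_t count, const void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    size_t lo = 0, hi = count;

    /* Binary search for the last mapping that starts at or below addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (maps[mid].cpu_addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;      /* addr is below every mapping */

    const struct mapped_bo *m = &maps[lo - 1];
    return (addr < m->cpu_addr + m->size) ? m : NULL;
}

The lookup is the easy part; keeping such a structure updated on every
bo_map/bo_unmap is where the log(n) insertion and deletion cost debated
in the thread comes from.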