Re: [PATCH 0/9] make struct drm_mm_node embeddable

On 11/15/2010 08:45 PM, Daniel Vetter wrote:
Hi Thomas,

On Mon, Nov 15, 2010 at 08:58:13AM +0100, Thomas Hellstrom wrote:
Nice work, although I have some comments about general applicability
that we perhaps need to think about.

1) The space representation and space allocation algorithm is
something that is private to the aperture management system. For a
specialized implementation like i915 that is all fine, but Ben has
recently abstracted that part out of the core TTM bo implementation.
As an example, vmwgfx is now using the kernel's ida allocator to manage
aperture space, and drm_mm objects for traditional VRAM space. Hence,
embedding drm_mm objects into TTM bos will not really be worthwhile.
At least not for aperture space management, and TTM will need to
continue to "dance", both in the ida case and in the drm_mm case.
Yep, I've looked into this and noticed the recent addition of the ida
support. This is why I've added the "decent surgery" comment. Embedding
the drm_mm_node still looks possible, albeit perhaps not feasible (at
least I won't tackle this in the immediate future).

Indeed, it's possible but for drivers that don't use it, it will sit unused.

2) The algorithm used by drm_mm has been around for a while and has
seen a fair amount of changes, but nobody has yet attacked the
algorithm used to search for free space, which was just quickly put
together as an improvement on what was the old mesa range manager.
In moderate fragmentation situations, the performance will degrade,
particularly with "best match" searches. In the near future we'd
probably want to add something like a "hole rb tree" rather than a
"hole list", and a choice of algorithm for the user. With embeddable
objects, unless you want to waste space for unused members, you'd
need a separate drm_mm node subclass for each algorithm, whereas if
you don't embed, you only need to allocate what you need.
<rant>
First a small rant about "best match" (to get it out of the way ;-)
- "best match" is simply a misleading name: with alignment > size
   (at least on older hw) and mixes of unrestricted and range-restricted
   allocations (ironlake has 2G of gtt, just 256M of it mappable), which is
   all possible with the latest experimental i915 patches, "best match" can
   do worse than the simpler approach.
- doing a full linear scan for every tiny state buffer/pixmap cache is
   slow.
On top of this, it serves as an excuse to not implement proper eviction
support.
</rant>
[If you agree, I'll happily write the patch to rip it out. It just doesn't
bother me 'cause it's only a few lines in drm_mm.c and I can ignore the
actual users.]

Now to the more useful discussion: IMHO drm_mm.c should be an allocator
for vram/(g)tt, i.e. it needs to support:
- a mix of large/small sizes.
- fancy alignment constraints (new patches for drm/i915 are pushing things
   there).
- range-restricted allocations. I think current users only ever have one
   (start, end) set for restricted allocations, so this might actually be
   simplified.
If other users don't fit into this anymore, mea culpa, they need their
own allocator. You've already taken this path for vmwgfx by using the ida
allocator. And if the linear scan for the gem mmap offset allocator ever
shows up in profiles, I think it's better served with a buddy-style
power-of-two-sized, power-of-two-aligned allocator. After all,
fragmentation of virtual address space isn't that severe a problem.

I must admit I haven't done detailed benchmarking, particularly not for
cases with a _huge_ number of bos, but I'm prepared to accept that
"first match" gives good enough results. For the mmap offset
fragmentation it's less of a problem, since it should be straightforward
to move bos in the address space with an additional translation of the
user-space offset (if ever needed).

Hence I think that drivers with extremely specific needs should roll their
own allocator. So I don't think we should anticipate different allocator
algorithms. I see driver-specific stuff more in the area of clever
eviction algorithms - i915 is currently at five LRUs for gtt-mapped bos,
and we're still adding.


Yes, I agree. My point was merely that one should think twice before
embedding drm_mm objects in generic buffer objects intended also for
drivers with special needs.

To make a long story short, I've opted to make the current code faster by
avoiding kmalloc and spoiling fewer cache lines with useless data. And if
the linear scan ever shows up in profiles, we could always add some stats
to bail out early for large allocations. Or add a tree to heuristically
find a suitable hole (assuming worst-case waste due to alignment).


Thanks a lot for your input on this.

Yours, Daniel

Thanks,
Thomas

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel

