Re: Cache strategy

On Tue, Jun 16, 2009 at 11:11 AM, <jcupitt@xxxxxxxxx> wrote:
>>> Another thing worth mentioning is that caches on every node don't
>>> scale well to concurrent evaluation of the graph, since the
>>> evaluators would constantly need to synchronize their use of the
>>> caches, preventing performance from scaling nicely as you use more
>>> CPU cores/CPUs.
>>
>> In most instances, this would only incur synchronization on the few
>> tiles where the chunks/work regions overlap. Unless you are stupid
>> and compute with chunk-size ~= tile-size, the impact of this should
>> be mostly negligible.
>
> You would still need a lock on the cache, wouldn't you? For example,
> if the cache is held as a GHashTable of tiles, even if individual
> tiles are disjoint and not shared, you'll still need to lock the hash
> table before you can search it. A couple of lock/unlock operations on
> every tile of every node will hurt SMP scaling badly.

You do not need to make this a global hashtable. The way it is done
with GeglBuffers, it would end up being one hashtable per node that
has a cache, not one cache shared by all nodes; if I understood your
concern correctly, it was about a single global lock that every thread
contends on. Also note that GEGL does not use the tile size for the
computations: tile dimensions are a concept that belongs only on the
GeglBuffer level, and they should be ignored when using the GeglBuffer
API and at higher levels of abstraction.
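
To make that concrete: with one hashtable and one mutex per node, two
threads only contend when they touch the same node's cache at the same
moment; there is no process-wide lock. A rough sketch in GLib terms
(the struct and function names are invented for illustration, this is
not the actual GEGL code):

#include <glib.h>

/* Hypothetical per-node cache; names are illustrative, not GEGL's. */
typedef struct {
  GMutex      lock;   /* protects only this node's table            */
  GHashTable *tiles;  /* (tile index) -> (pixel data for that tile) */
} NodeCache;

static NodeCache *
node_cache_new (void)
{
  NodeCache *cache = g_new0 (NodeCache, 1);

  g_mutex_init (&cache->lock);
  cache->tiles = g_hash_table_new_full (g_direct_hash, g_direct_equal,
                                        NULL, g_free);
  return cache;
}

/* Pack a 2D tile coordinate into a hash key (sketch only; a real key
 * would have to cope with negative/large coordinates). */
static gpointer
tile_key (gint tx, gint ty)
{
  return GINT_TO_POINTER (ty * 0x10000 + tx);
}

static gpointer
node_cache_lookup (NodeCache *cache, gint tx, gint ty)
{
  gpointer data;

  /* Contention is limited to threads hitting this node at the same
   * time; other nodes' caches have their own locks. */
  g_mutex_lock (&cache->lock);
  data = g_hash_table_lookup (cache->tiles, tile_key (tx, ty));
  g_mutex_unlock (&cache->lock);

  return data;
}

static void
node_cache_insert (NodeCache *cache, gint tx, gint ty, gpointer data)
{
  g_mutex_lock (&cache->lock);
  g_hash_table_insert (cache->tiles, tile_key (tx, ty), data);
  g_mutex_unlock (&cache->lock);
}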

>
> For what it's worth, vips handles this by keeping caches
> thread-private. If threads are working in disjoint areas of the image,
> there's no benefit to a shared cache anyway. As you say, there will be
> a degree of recomputation for some operations (eg. convolution), but
> that's a small cost compared to lock/unlock.
>
> vips has two types of cache: a very small (just 1 or 2 tiles)
> thread-private cache on every image, and a large and complex shared
> cache operation that can be explicitly added to the graph. The GUI
> adds a cache operator automatically just after every operation that
> can output to the display, but you can use it elsewhere if you wish.
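
To make the contrast concrete before answering: a thread-private cache
along those lines needs no locking at all. A rough GLib sketch of the
idea (the names are invented for illustration, this is not the vips
code):

#include <glib.h>

/* Each worker thread lazily gets its own private tile table, so
 * lookups never take a lock; the cost is possible recomputation of
 * tiles another thread already produced. */
static GPrivate thread_cache_key =
  G_PRIVATE_INIT ((GDestroyNotify) g_hash_table_destroy);

static GHashTable *
get_thread_cache (void)
{
  GHashTable *cache = g_private_get (&thread_cache_key);

  if (cache == NULL)
    {
      /* First use in this thread: create its private table. */
      cache = g_hash_table_new_full (g_direct_hash, g_direct_equal,
                                     NULL, g_free);
      g_private_set (&thread_cache_key, cache);
    }

  return cache;
}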

The latter, shared kind of cache is the one currently in use in GEGL,
although it isn't an explicit member of the graph; it is contained
within the preceding node (it is the sparse tiled buffer written to by
the operation contained in the node, instead of allocating/recycling
temporary one-off buffers). The caches are GeglBuffers, and each
GeglBuffer (and thus each cache) in turn has a tile cache that keeps
frequently used data from being swapped out to disk.
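
Conceptually, each node lazily holds on to a GeglBuffer that plays the
cache role, roughly like this (a simplified sketch with made-up struct
and helper names; the real code also deals with invalidation, regions
of interest and so on):

#include <gegl.h>

/* Hypothetical node structure, for illustration only: the node keeps
 * the sparse tiled GeglBuffer its operation renders into, and that
 * buffer doubles as the cache for downstream consumers. */
typedef struct {
  GeglRectangle  extent;   /* defined region of this node's output */
  const Babl    *format;   /* pixel format of the output           */
  GeglBuffer    *cache;    /* created lazily, reused on re-reads   */
} Node;

static GeglBuffer *
node_get_cache (Node *node)
{
  if (node->cache == NULL)
    {
      /* A GeglBuffer is sparse and tiled: tiles that are never
       * touched cost nothing, and the buffer's own tile cache keeps
       * hot tiles from being swapped out to disk. */
      node->cache = gegl_buffer_new (&node->extent, node->format);
    }

  return node->cache;
}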

-- 
«The future is already here. It's just not very evenly distributed»
                                                 -- William Gibson
http://pippin.gimp.org/                            http://ffii.org/
_______________________________________________
Gegl-developer mailing list
Gegl-developer@xxxxxxxxxxxxxxxxxxxxxx
https://lists.XCF.Berkeley.EDU/mailman/listinfo/gegl-developer

