The speedup you see is probably caused mainly by better caching. There is a bug in the plug-in's tile cache size: under most circumstances the cache is too small, so every requested tile is missing from the cache and must be transmitted from the main gimp process (SLOW). Effectively, every tile of the drawable is read and written gimp_tile_height (normally 64) times.

Your modification is better: I expect that every tile is read and written gimp_tile_height/BLOCKS = 2 times (but normally 3 times when a selection is used). This could be improved somewhat by setting BLOCKS = gimp_tile_height. When the selection starts at a y-position that is a multiple of gimp_tile_height (e.g. when there is no selection), every tile will then probably be read/written once, but when the selection starts at a different position, every tile will still be read/written twice.

I used a different approach that gives speedups similar to yours but should read/write every tile only once. The solution is pretty simple: just enlarge the cache.

What is the problem with the cache size? The current code uses:

  gimp_tile_cache_ntiles (2 * (drawable->width + gimp_tile_width () - 1) / gimp_tile_width ());

The idea here is to cache one row of tiles for the source drawable and one row for the destination drawable. But this is not enough, because it leaves no room in the cache for the bitmap tiles! There is a smaller problem here too: when a selection is used, it is overkill to size the cache for the full width of the drawable.

I've attached a patch with my modifications. I hope that someone examines them critically and incorporates them into the distribution. I prefer my approach because it should give better performance and it keeps the code cleaner.

Other improvements are still possible. I expect that it should be possible to rewrite the algorithm such that the tile cache contains only 3 tiles.
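
To make the cache-size arithmetic concrete, here is a minimal sketch in plain C. It is not the attached patch. The helper names (tiles_spanned, cache_ntiles) are hypothetical, and the choice of three buffers (source, destination, bitmap), each as wide as the selection, is an assumption read off the description above.

```c
#include <assert.h>

/* Number of tile columns touched by the half-open pixel range [x1, x2).
 * Hypothetical helper, not part of libgimp. */
static int tiles_spanned(int x1, int x2, int tile_width)
{
    assert(tile_width > 0);
    assert(x1 >= 0);
    if (x2 <= x1)
        return 0;                      /* empty range touches no tiles */
    return (x2 - 1) / tile_width - x1 / tile_width + 1;
}

/* Cache size that holds one row of tiles for each of nbuffers buffers,
 * restricted to the columns covered by the selection [x1, x2).
 * ASSUMPTION: nbuffers = 3 (source, destination, bitmap) and every
 * buffer spans the same columns; the real patch may size it differently. */
static int cache_ntiles(int x1, int x2, int tile_width, int nbuffers)
{
    return nbuffers * tiles_spanned(x1, x2, tile_width);
}
```

In the plug-in, the result would presumably be passed to gimp_tile_cache_ntiles (), with x1 and x2 taken from the selection bounds and tile_width from gimp_tile_width (). For example, a 10-pixel-wide selection that straddles a tile boundary (x1 = 60, x2 = 70, 64-pixel tiles) touches 2 tile columns, so 3 buffers need 6 cached tiles.
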
From what I see, the algorithm is the same in the horizontal and vertical directions. The current implementation uses 3 extra buffer rows, so if we add 3 extra buffer columns it should be possible to rewrite the algorithm to process one tile at a time instead of a full row.

Thanks for pointing out a pretty big performance problem with the plug-in.

Greetings,
Ernst <ernstl@xxxxxxxxx>
Attachment:
patchfile
Description: Binary data