Thank you for taking a serious look at GEGL. I've trimmed away the bits relating to the VIPS backend; I will focus on the performance numbers you get and try to explain them.

On Sun, Apr 17, 2011 at 10:22 AM, <jcupitt@xxxxxxxxx> wrote:
> Linked against gegl-vips with the operations set to exactly match
> gegl's processing, the same thing runs in 27s real, 38s user. So it
> looks like some tuning of the disc cache, or maybe even turning it off
> for batch processing, where you seldom need pixels more than once,
> could give gegl a very useful speedup here. libvips has a threading
> system which is on by default and does double-buffered write-behind,
> which also help.

On my c2d 1.86GHz laptop I get 105s real, 41s user with default settings. Setting GEGL_SWAP=RAM in the environment to turn off the disk swapping of tiles makes it run in 43s real, 41s user. With the default settings GEGL will start swapping when using more than 128MB of memory for buffers; this limit can be increased by setting, for instance, GEGL_CACHE_SIZE=1024 to not start swapping until 1GB of memory is in use. This leads to similar behavior. The tile backend of GEGL uses reads and writes on the tiles; using mmap instead could increase the performance.

> If you use uncompressed tiff, you can save a further 15s off the
> runtime. libpng compression is slow, and even with compression off,
> file write is sluggish.

Loading a PNG into a tiled buffer, as used by GeglBuffer, is kind of bound to be slow. At the moment GEGL doesn't have a native TIFF loader; if resources were spent on writing a proper TIFF backend for GeglBuffer, GEGL would be able to lazily swap in the image data from TIFF files as needed.

> babl converts to linear float and back with exp() and log(). Using
> lookup tables instead saves 12s.

If the original PNG was 8bit, babl should have a valid fast path for using lookup tables converting it to 32bit linear.
For most other conversions involved in this process babl would likely fall back to reference conversions that go via 64bit floating point and process each pixel with lots of logic, permuting components etc. By adding/fixing the fast paths in babl to match the reference conversion, a lot of the time spent converting pixels in this test should vanish.

> The gegl unsharp operator is implemented as gblur/sub/mul/add. These
> are all linear operations, so you can fold the maths into a single
> convolution. Redoing unsharp as a separable convolution saves 1s.

For smaller radii this is fine, for larger ones it is not; ideally GEGL would be doing what is optimal behind the user's back.

> Finally, we don't really need 16-bit output here, 8 is fine. This
> saves only 0.5s for tiff, but 8s for PNG.

Making the test case you used save to 8bit PNG instead gives me 34s real and 33s user. I am not entirely sure if babl has a 32bit float -> 8bit nonlinear RGBA conversion; it might just be libpng's data throughput that makes this difference.

    save = gegl_node_new_child (gegl,
                                "operation", "gegl:png-save",
                                "bitdepth", 8,
                                "path", argv[2],
                                NULL);

> Putting all these together, you get the same program running in 2.3s
> real, 4s user. This is still using linear float light internally. If
> you switch to a full 8-bit path you get 1s real, 1.5s user. I realise
> gegl is committed to float, but it's interesting to put a number on
> the cost.

This type of benchmark really stress tests the file loading/saving parts of the code, where I am fully aware that GEGL is far from optimal. It is also something that doesn't in any way reflect GIMP's _current_ use of GEGL, which involves converting 8bit data to and from float with some very specific formats and then only doing raw processing. This will of course change in the future.

> Does this sound useful? I think it's maybe a way to weight the
> benefits of the various possible optimisations.
> I might try running the tests on a machine with a faster hard disk.

It is useful, but it would perhaps be even more useful to see similar results for a test where the loading/saving is taken out of the benchmark, so that it measures raw image data crunching.

Setting GEGL_SWAP=RAM and BABL_TOLERANCE=0.02 in the environment, to make babl lenient with the error introduced by its fast paths, I run the test in the times listed below. It should be possible to fix the fast paths in babl to be correct enough to pass the current stricter criteria for use, and thus get these results without lowering standards. Even adding slightly faster but guaranteed-correct 8bit/16bit <-> float conversions would likely improve this type of benchmarking.

16bit output: real: 28.3s user: 26.9s
 8bit output: real: 25.1s user: 23.6s

Thank you for looking at this, and I do hope my comments above help explain some of the reasons for the slower processing.

/Øyvind K.
-- 
«The future is already here. It's just not very evenly distributed»
                                                 -- William Gibson
http://pippin.gimp.org/ ; http://ffii.org/