A few lines of IRC chat from Freenode #darktable. Hanatos is the Darktable project founder.

[09:17] <hanatos_> ``requiring SSE3 is not really allowed ''
[09:17] <hanatos_> so much bundled cluelessness :/
[09:19] <hanatos_> Germano: re 32-bit
[09:19] <hanatos_> the sse thing is one thing
[09:20] <hanatos_> the other is the very limited virtual address space (2G really)
[09:20] <hanatos_> everybody coding anything half way serious will tell you the same story
[09:20] <hanatos_> (rawtherapee has the same issues iirc)
[09:20] <hanatos_> our old cache was allocing one big chunk of memory at startup and maintained it manually
[09:20] <hanatos_> essentially duplicating a poor man's malloc, specialised for our thumbnail caches
[09:21] <hanatos_> the new cache is much faster and easier to read
[09:21] <hanatos_> but based on malloc/free
[09:21] <hanatos_> which means your virtual address space (not the physical one mapping to your ram)
[09:21] <hanatos_> will get fragmented and you quickly start addressing blocks above the 10G range
[09:22] <hanatos_> which may not be a problem, even on systems with only 2G of physical ram, because blocks have been freed in between. it's just on 32-bit systems you can't address it any more and die
[09:22] <boucman> basically, at this point, DT makes no sense on x86, except maybe dt-cli
[09:22] <hanatos_> which is a similar argument as the sse3 is.
[09:22] <hanatos_> it's just not a worthwhile experience running this software on this kind of hardware
[09:22] <hanatos_> boucman: yes, that.
[09:23] <Artefact2> hanatos_: I think jmalloc is also more clever than glibc malloc wrt fragmentation. that's why blender uses it, afaik
[09:24] <Artefact2> *je
[09:24] <hanatos_> Germano: so i'd like to contradict the `upstream doesn't care' bit
[09:25] <hanatos_> upstream does care.
[09:25] <hanatos_> just not about random principles and guidelines
[09:25] <hanatos_> but about how well darktable runs
[09:25] <hanatos_> Artefact2: jemalloc you mean
[09:25] <Artefact2> hanatos_: yes
[09:25] <hanatos_> yes, it's mostly multithreaded/
[09:25] <hanatos_> block per thread
[09:25] <hanatos_> might be worthwhile when running many threads for thumbnail gen
[09:25] <hanatos_> but honestly i doubt it
[09:26] <hanatos_> it speeds up another piece of code i wrote
[09:26] <hanatos_> which uses many 10s of 1000s of malloc calls per second..
[09:26] <hanatos_> we don't do that in dt
[09:26] <hanatos_> (or tcmalloc for that matter)
[09:26] <hanatos_> simple enough to try with an LD_PRELOAD
[09:26] <Artefact2> oh yeah. calling malloc this many times is a bad idea anyway
[09:32] <hanatos_> the alternative would have been allocate ridiculous amounts of memory up front
[09:32] <hanatos_> bad idea, too
[09:32] <hanatos_> but if you have a better solution i'd sure like to hear it :)
[09:32] <hanatos_> the problem is to construct a binary search tree
[09:32] <Artefact2> i'm not a memory guru, sadly :|
[09:32] <hanatos_> in parallel
[09:32] <hanatos_> so you start at the root and push the children as new jobs (malloc job_t)
[09:32] <hanatos_> and so on
[09:33] <hanatos_> it's millions of nodes total, so you don't want to allocate them up front
[09:33] <Artefact2> maybe a compromise. allocate a pool that can store, say 10 jobs at a time
[09:34] <hanatos_> (and yes, i would agree.. calling malloc is almost always a bad idea, unless you can't avoid it)
[09:34] <hanatos_> but see.. that pool per thread.. that's exactly what jemalloc/tcmalloc do
[09:34] <Artefact2> this way you reduce the allocator load by a factor of 10, while still not allocating huge amounts of contiguous memory
[09:35] <Artefact2> maybe the issue is elsewhere. what are you doing millions of? is it possible to make "bigger" jobs and have less of them?
ie a smaller tree
[09:38] <hanatos_> nope, can't touch the tree
[09:38] <hanatos_> its some spatial acceleration structure for ray tracing
[09:39] <hanatos_> it's been optimised for fast ray tracing for many years
[09:39] <Artefact2> are we still talking about darktable? didn't know it needed a raytracer
[09:40] <hanatos_> no, different piece of code.. as i said, i don't think darktable needs thread-cached malloc
[10:11] <hanatos_> Germano: also feel free to refer those guys here to us if they have questions. seems to me that some direct contact may be better.
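
Note on the pool idea from 09:33: the compromise Artefact2 describes (grab room for about 10 jobs per malloc call instead of one) could look roughly like the C sketch below. This is not code from darktable or from the ray tracer hanatos mentions. Only the name job_t comes from the chat; its node_id field, chunk_t, job_pool_t, JOBS_PER_CHUNK and the job_pool_* functions are made up for illustration. The sketch is single-threaded and can only free the whole pool at once; a parallel tree build would presumably keep one such pool per thread, which is the part hanatos says jemalloc/tcmalloc already do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical job record: the chat only names the type job_t. */
typedef struct job_t {
  int node_id;
} job_t;

/* "say 10 jobs at a time" (assumed value taken from the chat) */
#define JOBS_PER_CHUNK 10

/* One malloc'ed block with room for JOBS_PER_CHUNK jobs. */
typedef struct chunk_t {
  struct chunk_t *next;
  job_t jobs[JOBS_PER_CHUNK];
} chunk_t;

/* Single-threaded pool: a linked list of chunks, newest first. */
typedef struct job_pool_t {
  chunk_t *chunks;    /* newest chunk, or NULL if none yet */
  size_t used;        /* slots already handed out from the newest chunk */
  size_t num_mallocs; /* how often malloc was actually called */
} job_pool_t;

static void job_pool_init(job_pool_t *p) {
  assert(p != NULL);
  p->chunks = NULL;
  p->used = JOBS_PER_CHUNK; /* "full", so the first alloc grabs a chunk */
  p->num_mallocs = 0;
}

/* Hand out one job slot; calls malloc only once per JOBS_PER_CHUNK jobs.
 * Returns NULL if malloc fails. */
static job_t *job_pool_alloc(job_pool_t *p) {
  assert(p != NULL);
  if (p->used == JOBS_PER_CHUNK) {
    chunk_t *c = malloc(sizeof(*c));
    if (c == NULL) return NULL;
    c->next = p->chunks;
    p->chunks = c;
    p->used = 0;
    p->num_mallocs++;
  }
  return &p->chunks->jobs[p->used++];
}

/* Free every chunk. Individual jobs cannot be returned to this pool. */
static void job_pool_destroy(job_pool_t *p) {
  assert(p != NULL);
  while (p->chunks != NULL) {
    chunk_t *next = p->chunks->next;
    free(p->chunks);
    p->chunks = next;
  }
  p->used = JOBS_PER_CHUNK;
}
```

With JOBS_PER_CHUNK set to 10, handing out 1000 jobs through job_pool_alloc calls malloc 100 times, which is the factor of 10 mentioned at 09:34. Each chunk is still small, so nothing large and contiguous is reserved up front.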