On Tue, Feb 25, 2020 at 10:57 PM William Roberts <bill.c.roberts@xxxxxxxxx> wrote:
> On Tue, Feb 25, 2020 at 3:33 PM Nicolas Iooss <nicolas.iooss@xxxxxxx> wrote:
> >
> > On Tue, Feb 18, 2020 at 5:01 PM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Feb 18, 2020 at 4:40 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> > > > On 2/18/20 10:22 AM, Ondrej Mosnacek wrote:
> > > > > On Thu, Feb 13, 2020 at 2:40 PM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > > > >> According to profiling of semodule -BN, ebitmap_cardinality() is called
> > > > >> quite often and contributes a lot to the total runtime. Cache its result
> > > > >> in the ebitmap struct to reduce this overhead. The cached value is
> > > > >> invalidated on most modifying operations, but ebitmap_cardinality() is
> > > > >> usually called once the ebitmap doesn't change any more.
> > > > >>
> > > > >> After this patch, the time to do 'semodule -BN' on Fedora Rawhide has
> > > > >> decreased from ~14.6s to ~12.4s (2.2s saved).
> > > > >
> > > > > I have no idea why, but I'm now getting completely different times
> > > > > (10.9s vs. 8.9s) with the same builds on the same setup... I can no
> > > > > longer reproduce the slower times anywhere (F31/locally/...), so I have
> > > > > to assume it was some kind of glitch. Since the numbers show a similar
> > > > > magnitude of speed-up (and they depend on a bunch of HW/SW factors
> > > > > anyway), I'm not going to do another respin. The applying person (most
> > > > > likely Stephen) is free to fix the numbers when applying if they wish
> > > > > to do so.
> > > >
> > > > Thanks, applied with fixed times (although I don't really think it
> > > > matters very much). Maybe you're also picking up the difference from
> > > > the "libsepol/cil: remove unnecessary hash tables" change.
> > >
> > > No, that was actually the reason for the first correction.
> >
> > Hello,
> >
> > About performance issues: the current implementation of
> > ebitmap_cardinality() is quadratic:
> >
> >     for (i = ebitmap_startbit(e1); i < ebitmap_length(e1); i++)
> >         if (ebitmap_get_bit(e1, i))
> >             count++;
> >
> > ... because ebitmap_get_bit() itself walks the bitmap:
> >
> >     while (n && (n->startbit <= bit)) {
> >         if ((n->startbit + MAPSIZE) > bit) {
> >             /* ... */

Hm... I didn't realize that the function is actually quadratic.

> > A few years ago, I tried modifying this function to make it linear in
> > the bitmap size:
> >
> >     unsigned int ebitmap_cardinality(ebitmap_t *e1)
> >     {
> >         unsigned int count = 0;
> >         ebitmap_node_t *n;
> >
> >         for (n = e1->node; n; n = n->next) {
> >             count += __builtin_popcountll(n->map);
> >         }
> >         return count;
> >     }
> >
> > ... but never actually sent a patch for this, because I wanted to
> > assess beforehand how well __builtin_popcountll() is supported by the
> > various compilers. Would this be helpful to gain even more performance?
>
> Every architecture I've used has an instruction it boils down to:
>
> x86 - POPCNT
> ARM (NEON) - VCNT

Note that the compiler will only emit these instructions if you compile
for the right target platform (-mpopcnt, or something that includes it,
on x86_64). Portable generic builds will usually not use them.

Still, even without the special instruction, __builtin_popcountll()
should generate better code than the naive add-each-bit-one-by-one
approach. For example, I came up with this pure C implementation of
64-bit popcount [1] that both GCC and Clang can compile down to ~36
instructions.
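For illustration, a typical portable bit-twiddling 64-bit popcount
along those lines could look like this (just a sketch; not necessarily
the exact code behind [1], and the helper name is made up here):

#include <stdint.h>

static unsigned int popcount64(uint64_t x)
{
	/* Fold adjacent bits pairwise; after these three steps each byte
	 * of x holds the number of bits that were set in that byte. */
	x = x - ((x >> 1) & 0x5555555555555555ULL);
	x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
	x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
	/* The multiplication adds up all the byte counts into the top
	 * byte, which the shift then extracts. */
	return (unsigned int)((x * 0x0101010101010101ULL) >> 56);
}

It is branch-free and works on a whole 64-bit word at a time, which is
why compilers can turn it into a short, straight-line instruction
sequence.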
The generic version of __builtin_popcountll() likely does something
similar. (Actually, here is what Clang seems to use [2], which is
pretty close.)

FWIW, I tested the __builtin_popcountll() version with the caching
patch reverted (built without popcnt support) and it actually performed
even better than the old code + caching (it went down to ~0.11% of
semodule -B running time). A naive popcount implementation without
caching didn't perform as well (it was slower than the old code +
caching).

So... we could just open-code some good generic C implementation
(cleanly written and properly commented, of course) and then we
wouldn't have to rely on the compiler builtin. OTOH, the SELinux
userspace already uses non-standard compiler extensions
(__attribute__(...)), so maybe sticking to pure C is not worth it...
Either way, I think we should revert the caching patch when switching
to an optimized implementation (the caching would no longer be worth
the added complexity IMO).

[1] https://gcc.godbolt.org/z/39W7qa
[2] https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/popcountdi2.c

>
> For other architectures (do they even matter at this point?), I would
> imagine GCC does something relatively sane.
>
> >
> > Cheers,
> > Nicolas
> >
>

--
Ondrej Mosnacek <omosnace at redhat dot com>
Software Engineer, Security Technologies
Red Hat, Inc.
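For reference, the linear ebitmap_cardinality() discussed above would
then look roughly like this with the builtin swapped for an open-coded
helper (again just a sketch: popcount64 is the hypothetical helper
sketched earlier, and the node layout is taken from the quoted code
rather than verified against the current libsepol headers):

unsigned int ebitmap_cardinality(ebitmap_t *e1)
{
	unsigned int count = 0;
	ebitmap_node_t *n;

	/* One popcount per 64-bit node instead of one ebitmap_get_bit()
	 * walk per bit, so the runtime is linear in the number of nodes. */
	for (n = e1->node; n; n = n->next)
		count += popcount64(n->map);

	return count;
}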