Re: [PATCH userspace v2] libsepol: cache ebitmap cardinality value

William Roberts <bill.c.roberts@xxxxxxxxx> · Tue, 25 Feb 2020 15:56:54 -0600

On Tue, Feb 25, 2020 at 3:33 PM Nicolas Iooss <nicolas.iooss@xxxxxxx> wrote:
>
> On Tue, Feb 18, 2020 at 5:01 PM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> >
> > On Tue, Feb 18, 2020 at 4:40 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> > > On 2/18/20 10:22 AM, Ondrej Mosnacek wrote:
> > > > On Thu, Feb 13, 2020 at 2:40 PM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > > >> According to profiling of semodule -BN, ebitmap_cardinality() is called
> > > >> quite often and contributes a lot to the total runtime. Cache its result
> > > >> in the ebitmap struct to reduce this overhead. The cached value is
> > > >> invalidated on most modifying operations, but ebitmap_cardinality() is
> > > >> usually called once the ebitmap doesn't change any more.
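A minimal sketch of the caching idea described above. It uses a hypothetical, simplified bitmap type: `bitmap_t`, `bitmap_set_bit()`, `bitmap_cardinality()` and the `cardinality_valid` flag are illustrative names, not libsepol's API, and the applied patch may store or invalidate the cached value differently. Any modification clears the cache. The count is recomputed only on the first query after a change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAPSIZE 64

/* Hypothetical, simplified ebitmap (not libsepol's real definition):
 * a sorted singly linked list of 64-bit chunks plus a cached count. */
typedef struct node {
    uint32_t startbit;          /* first bit covered, multiple of MAPSIZE */
    uint64_t map;               /* one bit per element */
    struct node *next;
} node_t;

typedef struct {
    node_t *node;               /* head of the list, ordered by startbit */
    uint32_t cardinality;       /* cached number of set bits */
    bool cardinality_valid;     /* cleared by every modification */
} bitmap_t;

static unsigned recomputations; /* instrumentation for this example only */

static int bitmap_set_bit(bitmap_t *b, uint32_t bit)
{
    uint32_t start = bit - bit % MAPSIZE;
    node_t **pp = &b->node;

    while (*pp && (*pp)->startbit < start)
        pp = &(*pp)->next;
    if (!*pp || (*pp)->startbit != start) {   /* insert a new chunk */
        node_t *n = calloc(1, sizeof(*n));
        if (!n)
            return -1;
        n->startbit = start;
        n->next = *pp;
        *pp = n;
    }
    (*pp)->map |= UINT64_C(1) << (bit - start);
    b->cardinality_valid = false;             /* invalidate the cache */
    return 0;
}

static uint32_t bitmap_cardinality(bitmap_t *b)
{
    if (!b->cardinality_valid) {              /* recompute only when stale */
        uint32_t count = 0;
        for (const node_t *n = b->node; n; n = n->next)
            for (uint64_t m = n->map; m; m &= m - 1)  /* clear lowest set bit */
                count++;
        b->cardinality = count;
        b->cardinality_valid = true;
        recomputations++;
    }
    return b->cardinality;
}

static void bitmap_destroy(bitmap_t *b)
{
    node_t *n = b->node;
    while (n) {
        node_t *next = n->next;
        free(n);
        n = next;
    }
    b->node = NULL;
    b->cardinality_valid = false;
}
```

Repeated queries on an unchanged bitmap cost a single flag check. That matches the access pattern described in the patch, where cardinality is mostly queried after the bitmap stops changing.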
> > > >>
> > > >> After this patch, the time to do 'semodule -BN' on Fedora Rawhide has
> > > >> decreased from ~14.6s to ~12.4s (2.2s saved).
> > > >
> > > > I have no idea why, but I'm now getting completely different times
> > > > (10.9s vs. 8.9s) with the same builds on the same setup... I can no
> > > > longer reproduce the slower times anywhere (F31/locally/...) so I have
> > > > to assume it was some kind of glitch. Since the numbers show a similar
> > > > magnitude of speed-up (and they depend on a bunch of HW/SW factors
> > > > anyway), I'm not going to do another respin. The applying person (most
> > > > likely Stephen) is free to fix the numbers when applying if they wish
> > > > to do so.
> > >
> > > Thanks, applied with fixed times (although I don't really think it
> > > matters very much).  Maybe you're also picking up the difference from
> > > the "libsepol/cil: remove unnecessary hash tables" change.
> >
> > No, that was actually the reason for the first correction.
>
> Hello,
> Regarding performance, the current implementation of
> ebitmap_cardinality() is quadratic:
>
> for (i = ebitmap_startbit(e1); i < ebitmap_length(e1); i++)
>     if (ebitmap_get_bit(e1, i))
>         count++;
>
> ... because each ebitmap_get_bit() call walks the node list from the head:
>
> while (n && (n->startbit <= bit)) {
>    if ((n->startbit + MAPSIZE) > bit) {
>       /*... */
>
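To make the quadratic cost concrete, here is a small sketch that mimics the two fragments above on a hypothetical, simplified node list (`node_t`, `get_bit()`, `naive_cardinality()` and the `node_visits` counter are illustrative, not libsepol code). Each per-bit lookup restarts at the head. A bit in the k-th node (counting from 0) therefore costs k + 1 node visits. With N fully spanned 64-bit nodes the total is 64 · N(N + 1) / 2 visits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAPSIZE 64

/* Hypothetical, simplified ebitmap node, loosely modeled on libsepol. */
typedef struct node {
    uint32_t startbit;
    uint64_t map;
    struct node *next;
} node_t;

static unsigned long node_visits;  /* instrumentation: nodes inspected */

/* Like ebitmap_get_bit(): walk from the head to the covering node. */
static int get_bit(const node_t *head, uint32_t bit)
{
    for (const node_t *n = head; n && n->startbit <= bit; n = n->next) {
        node_visits++;
        if (n->startbit + MAPSIZE > bit)
            return (int)((n->map >> (bit - n->startbit)) & 1);
    }
    return 0;
}

/* Naive cardinality: one full get_bit() walk per bit position
 * (this sketch assumes the bitmap starts at bit 0). */
static unsigned naive_cardinality(const node_t *head, uint32_t length)
{
    unsigned count = 0;
    for (uint32_t i = 0; i < length; i++)
        if (get_bit(head, i))
            count++;
    return count;
}
```

With four nodes, this example performs 64 · (1 + 2 + 3 + 4) = 640 node visits to count only eight set bits.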
> A few years ago, I tried modifying this function to make it linear in
> the bitmap size:
>
> unsigned int ebitmap_cardinality(ebitmap_t *e1)
> {
>     unsigned int count = 0;
>     ebitmap_node_t *n;
>
>     for (n = e1->node; n; n = n->next) {
>         count += __builtin_popcountll(n->map);
>     }
>     return count;
> }
>
> ... but never actually sent a patch for this, because I first wanted to
> check how well __builtin_popcountll() is supported by various
> compilers. Would this help gain even more performance?
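A self-contained sketch of the linear-time version proposed above, run against a hypothetical, simplified node type rather than libsepol's `ebitmap_node_t`. The builtin is guarded by `__GNUC__`, which GCC defines and Clang also defines. The fallback for other compilers is Kernighan's clear-lowest-bit loop, substituted here only for illustration. It is not something proposed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ebitmap node (not libsepol's definition). */
typedef struct node {
    uint32_t startbit;
    uint64_t map;
    struct node *next;
} node_t;

/* Set-bit count of one 64-bit word: compiler builtin where available,
 * otherwise a loop that runs once per set bit. */
static unsigned popcount64(uint64_t x)
{
#if defined(__GNUC__)
    return (unsigned)__builtin_popcountll(x);
#else
    unsigned count = 0;
    for (; x; x &= x - 1)   /* clear the lowest set bit */
        count++;
    return count;
#endif
}

/* Linear-time cardinality: visit each node once, one popcount per node. */
static unsigned cardinality(const node_t *head)
{
    unsigned count = 0;
    for (const node_t *n = head; n; n = n->next)
        count += popcount64(n->map);
    return count;
}
```

The work is now proportional to the number of nodes, independent of how many bit positions they span.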

Every architecture I've used has an instruction it boils down to:
x86: POPCNT
ARM (NEON): VCNT

For others (do they even matter at this point?), I would imagine GCC
does something relatively sane.
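On a target without a usable popcount instruction, GCC typically falls back to a library routine. For example, on x86-64 without `-mpopcnt` or a `-march` that implies it, GCC emits a call to libgcc's `__popcountdi2` instead of POPCNT. The sketch below is one classic portable alternative, the SWAR ("SIMD within a register") bit count, checked against a bit-by-bit reference. It is illustrative only and is not claimed to be what any particular compiler or libgcc version generates.

```c
#include <assert.h>
#include <stdint.h>

/* Portable SWAR population count: form 2-bit, then 4-bit, then 8-bit
 * partial sums, then add the eight byte counts with one multiply. */
static unsigned popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));          /* 2-bit sums */
    x = (x & UINT64_C(0x3333333333333333))
        + ((x >> 2) & UINT64_C(0x3333333333333333));             /* 4-bit sums */
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);           /* 8-bit sums */
    return (unsigned)((x * UINT64_C(0x0101010101010101)) >> 56); /* add bytes */
}

/* Reference implementation: test each bit in turn. */
static unsigned popcount64_naive(uint64_t x)
{
    unsigned count = 0;
    for (; x; x >>= 1)
        count += (unsigned)(x & 1);
    return count;
}
```

The SWAR version is branch-free and uses a fixed number of operations per word, which is why it is a common fallback when no hardware instruction is available.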

>
> Cheers,
> Nicolas
>