Re: [PATCH v5] selinux: add prefix/suffix matching to filename type transitions

Paul Moore <paul@xxxxxxxxxxxxxx> · Tue, 2 Apr 2024 16:21:30 -0400

On Tue, Apr 2, 2024 at 11:35 AM Juraj Marcin <juraj@xxxxxxxxxxxxxxx> wrote:
>
> Hi Paul
>
> Thank you for your feedback!

Thank you for the patch :)

> On 2024-03-27 21:22, Paul Moore wrote:
> > On Mar  5, 2024 Juraj Marcin <juraj@xxxxxxxxxxxxxxx> wrote:

...

> > >    | Mem   | Binary | Policy  | Create    | osbench [2]
> > >    | Usage | policy | load    | tty [1]   | create
> > >    |       | size   | time    |           | files
> > >    | (MiB) | (KiB)  | (ms)    | (ms/file) | (us/file)
> > > ---+-------+--------+---------+-----------+-----------
> > >  1 |   430 |   3682 |  46.930 |    2.4569 |   43.4431
> > >    | sd= 8 |        | sd=0.25 |  sd=0.043 |  sd=0.434
> > >  2 |   410 |   3682 |  46.829 |    2.4623 |   43.4112
> > >    | sd=20 |        | sd=0.27 |  sd=0.053 |  sd=0.442
> > >  3 |   419 |   3682 |  46.823 |    2.4257 |   43.0575
> > >    | sd=14 |        | sd=0.18 |  sd=0.057 |  sd=0.416
> > >  4 |   420 |   2585 |  42.044 |    2.5028 |   43.7907
> > >    | sd=10 |        | sd=0.19 |  sd=0.067 |  sd=0.382
> >
> > Thanks for the updated patchset :)
> >
> > I like the improvements in #2 and #3, with only one test being slightly
> > worse than #1, I think we're okay there.  However, it looks like there
> > has been a regression in #4 in terms of runtime performance ... hmmm.
>
> It might look like a regression but there is some variance between runs,
> even though I ran single benchmark many times and cases right after the
> previous one. There is a similar difference but in the other direction
> between #2 and #3 even though they both run with the same kernel and
> basically the same policy (the only difference being explicit zero
> counts of prefix and suffix rules in the policy file for #3).
>
> I ran the whole benchmark suite again 2 times with the same reference
> and patched kernels and got the following results:
>
>    | Mem   | Binary | Policy  | Create    | osbench [2]
>    | Usage | policy | load    | tty [1]   | create
>    |       | size   | time    |           | files
>    | (MiB) | (KiB)  | (ms)    | (ms/file) | (us/file)
> ---+-------+--------+---------+-----------+-----------
>  1 |   436 |   3682 |  48.955 |    2.5037 |   43.7484
>    | sd=20 |        | sd=0.25 |  sd=0.048 |  sd=0.445
>  2 |   434 |   3682 |  48.331 |    2.4314 |   43.6714
>    | sd=16 |        | sd=0.25 |  sd=0.044 |  sd=0.422
>  3 |   421 |   3682 |  47.592 |    2.4633 |   43.5746
>    | sd=19 |        | sd=0.28 |  sd=0.056 |  sd=0.375
>  4 |   413 |   2585 |  41.895 |    2.4608 |   43.7485
>    | sd=13 |        | sd=0.19 |  sd=0.063 |  sd=0.378
>
>    | Mem   | Binary | Policy  | Create    | osbench [2]
>    | Usage | policy | load    | tty [1]   | create
>    |       | size   | time    |           | files
>    | (MiB) | (KiB)  | (ms)    | (ms/file) | (us/file)
> ---+-------+--------+---------+-----------+-----------
>  1 |   442 |   3682 |  48.639 |    2.4741 |   43.4764
>    | sd=23 |        | sd=0.21 |  sd=0.059 |  sd=0.487
>  2 |   422 |   3682 |  48.305 |    2.4191 |   43.6313
>    | sd=10 |        | sd=0.21 |  sd=0.064 |  sd=0.460
>  3 |   427 |   3682 |  47.524 |    2.4798 |   43.7239
>    | sd=21 |        | sd=0.17 |  sd=0.055 |  sd=0.590
>  4 |   416 |   2585 |  41.764 |    2.4272 |   43.6712
>    | sd= 8 |        | sd=0.20 |  sd=0.052 |  sd=0.376
>
> As can be seen, sometimes one case is the fastest one for the given test
> by a few tenths of a millisecond/microsecond, sometimes it is the other
> way around. Regression in one test case would result in it being slower
> consistently between runs.

In all three reported results, test #4 is slower than test #1 in every
case, although it is within 0.001 in one result.  While I agree there
does appear to be variation in the tests, the fact that none of the
results show #4 as being faster than #1 does seem to indicate a
performance regression in test #4.

> > > diff --git a/security/selinux/ss/policydb.c b/security/selinux/ss/policydb.c
> > > index 3d22d5baa829b..5de22ba840673 100644
> > > --- a/security/selinux/ss/policydb.c
> > > +++ b/security/selinux/ss/policydb.c
> > > @@ -429,7 +434,11 @@ static int filenametr_cmp(const void *k1, const void *k2)
> > >     if (v)
> > >             return v;
> > >
> > > -   return strcmp(ft1->name, ft2->name);
> > > +   v = strncmp(ft1->name, ft2->name, min(ft1->name_len, ft2->name_len));
> > > +   if (ft1->name_len == ft2->name_len || v)
> > > +           return v;
> > > +   /* if one name is prefix of the other, the longer is greater */
> > > +   return (int)ft1->name_len - (int)ft2->name_len;
> > >  }
> >
> > I'm probably missing some subtlety in the code above, but I think we
> > might be able to optimize this a bit more:
> >
> >   if (ft1->name_len != ft2->name_len)
> >     return (int)ft1->name_len - (int)ft2->name_len;
> >   return strcmp(ft1->name, ft2->name);
> >
> > I expect being able to avoid a string comparison in some cases should
> > result in a small performance bump.
>
> That optimization would lead to a different ordering than before with
> the plain `strcmp()` - it would sort the strings by length first and
> only then alphabetically. In fact, the `name_len` comparison could be
> removed from the condition entirely without changing the ordering. This
> could make it more clear.
>
> However, this change in ordering might not matter as it is only a key
> comparator function for the hash table and items from the policy are
> inserted one by one. I will also ask Ondrej about this.

With hashtab_search() exiting early on a negative result from
filenametr_cmp() it may look like the ordering within the bucket
matters, but so long as we use the same sorting order when
accessing/inserting entries it shouldn't matter.  The important thing
is that we can search through the bucket entries as fast as possible,
and performing integer comparisons/math is going to be a lot faster
than performing a string comparison.

> > > diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h
> > > index 4bba386264a3d..78abb959e7205 100644
> > > --- a/security/selinux/ss/policydb.h
> > > +++ b/security/selinux/ss/policydb.h
> > > @@ -232,6 +233,11 @@ struct genfs {
> > >  #define OCON_IBENDPORT 8 /* Infiniband end ports */
> > >  #define OCON_NUM       9
> > >
> > > +#define FILENAME_TRANS_MATCH_EXACT 0
> > > +#define FILENAME_TRANS_MATCH_PREFIX 1
> > > +#define FILENAME_TRANS_MATCH_SUFFIX 2
> > > +#define FILENAME_TRANS_MATCH_NUM 3
> >
> > I think we should probably tweak things to help indicate that the last
> > macro/define above is not a usable value, how about this renaming it to
> > FILENAME_TRANS_MATCH_MAX?  I'm open to other suggestions too ...
>
> I thought being consistent with other array indices (`SYM_*`, `OCON_*`)
> would be better, but I am also open to change the suffix to `_MAX` if
> there are no objections.

Ugh, yes, you're right.  We should change those other #defines, but
let's leave that for another time.  Stick with _NUM.

> Then possibly update also `SYM_NUM` and `OCON_NUM` defines in another
> patch.

Yes, that's the right approach as far as I'm concerned, but that's
something for later.

-- 
paul-moore.com