Re: [PATCH v5 2/3] selinux: fix conditional avtab slot hint

Jacob Satterfield <jsatterfield.linux@xxxxxxxxx> · Wed, 29 Nov 2023 17:39:30 +0000

On Wed, Nov 22, 2023 at 6:15 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield
> <jsatterfield.linux@xxxxxxxxx> wrote:
> > On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > >
> > > I'm a concerned about all the work we have to do just to count the
> > > conditional rules.  Other than not working with existing binary
> > > policies, have you looked at bumping the policy version and introducing
> > > a binary format change that would include the number of conditional
> > > rules?
> >
> > Thanks for raising the issue. I had considered adding the total size
> > of the conditional table to the binary policy, but I wasn't sure if it
> > would be substantive enough to warrant bumping the policy version.
>
> I appreciate the thoughtfulness, but the version field in the policy
> is 32-bits and we are currently up to policy format v33 so we've still
> got a few version numbers to play with :)
>

Sounds good, I will implement the policy version bump.

> > As
> > you point out, the counting work will be needed for existing binary
> > policies making this patch necessary for the default case, but if you
> > are concerned about the performance penalty this patch brings (which
> > is less than the gain provided by the avtab array patch), then there
> > are two threads to possibly be worked on.
>
> To be clear, I'm not thinking about supporting this for existing
> policies, just the new policy format; the existing policy versions
> would behave as they do now ... although if we do the array conversion
> we will likely need to do some type of realloc()/retry or something, I
> dunno, I'll leave that to you to brainstorm ;)
>

I still believe the performance gained by converting to arrays
outweighs the cost of supporting older versions. See the discussion
below.

> > One is to rework this patch to include more invasive changes to count
> > rules without actually reading and destroying nodes thus saving cycles
> > but requiring more lines of code. Because policy parsing is not
> > handled separately from the construction of the policydb structure
> > (they are deeply intertwined), I was reluctant to add more complexity
> > just to have a parse-only code path. Would you prefer speed or simpler
> > logic for older policies?
>
> That's the problem we have right now.  We have to do a lot of work
> (allocations, etc.) that we throw away in the case where we are
> counting, not to mention that bolting on the count-only functionality
> is kinda hacky/ugly (not your fault, that is just the way the code is
> right now).  As you mention, the alternative is to significantly
> rework how we parse/load the policy, and that isn't a very exciting
> prospect as far as I'm concerned.
>
> I'm not sure if moving over to flex array is a win, I suspect that
> whatever we gain in memory savings we lose in not having direct
> access.  I dunno, maybe it wouldn't be too bad.
>

Unless I misunderstand, I believe I addressed your concerns from v1
over using indices vs direct access to the arrays. From v2 --> v5, the
array conversion patch only changes how avtab_node elements are
allocated, access is the same as it is now.

> I'm open to ideas here ... I'm looking for something that would
> support the improvements for a new policy with an explicit count,
> while still falling back to something that works "reasonably well" in
> the current case where we have to guess.  In this case "reasonably
> well" means no worse than current in terms of performance and memory
> use while not over complicating the code.
>

For existing policies, I have given this a bit of thought and did some
performance testing. There are memory vs. runtime tradeoffs to
consider between counting the rules (like v5 does) and doing a realloc
(like v1 does). I'm of the opinion that the counting approach is
better overall, but I'll (r)enumerate the tradeoffs here for the
benefit of discussion.

For the realloc approach, a decent chunk of memory (currently 256 KB)
is wasted over-allocating the hash table slots because of the estimate
derived from the regular rules table. Extra memory for avtab_nodes are
allocated (either from overestimation or the dynamic growth factor)
during parsing, sometimes up to 1 MB, requiring a final realloc to
remove them. The growth factor chosen could have additional runtime
implications with regard to multiple reallocs. Regardless, this
approach has runtime costs and memory lost to overestimating the
hashtable slots needed (which in this case, can't be recovered except
through a rehashing of the whole table). 86 lines of code are added,
but it is fairly clean code.

The counting approach incurs a runtime penalty processing the
conditional rules twice. Memory is wasted constructing the struct
cond_node lists and corresponding struct cond_av_list elements twice,
but nowhere close to 1 MB. 23 lines of code are added, but it is as
you said hacky/ugly. The runtime penalty could be further optimized at
the cost of parsing code complexity and additional lines of code as
discussed earlier.

Some performance numbers from Fedora 38 running "perf stat -r 1000 -d
load_policy" (realloc is the v5 array conversion + v1 realloc
patches):

dev:       0.261174 seconds time elapsed ( +- 0.31% ) baseline
v5:         0.233972 seconds time elapsed ( +- 0.31% ) -10%
realloc:  0.226636 seconds time elapsed ( +- 0.34% ) -13.3%

Therefore, to your criteria stated above, both approaches are
empirically no worse than current in terms of performance or memory
used after load. But I believe the current counting approach is the
most reasonable given the lines of code added and the least amount of
memory used during parsing. For what it's worth, I have been
interested in designing a set of patches that would ultimately
refactor policy parsing from policy loading and would, as a side
effect, enable a simpler and more optimized counting approach for
older policies. If this is something you would entertain (even if it's
not exciting), then I will consider it for a future patch.

If you agree, I will introduce the policy version bump in the next
spin and also submit a patch to sepol. Otherwise, I can go back to the
drawing board.

Thanks for reviewing as always!

> --
> paul-moore.com