On Wed, Nov 22, 2023 at 6:15 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > > On Wed, Nov 22, 2023 at 12:35 PM Jacob Satterfield > <jsatterfield.linux@xxxxxxxxx> wrote: > > On Mon, Nov 20, 2023 at 8:29 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote: > > > > > > I'm a concerned about all the work we have to do just to count the > > > conditional rules. Other than not working with existing binary > > > policies, have you looked at bumping the policy version and introducing > > > a binary format change that would include the number of conditional > > > rules? > > > > Thanks for raising the issue. I had considered adding the total size > > of the conditional table to the binary policy, but I wasn't sure if it > > would be substantive enough to warrant bumping the policy version. > > I appreciate the thoughtfulness, but the version field in the policy > is 32-bits and we are currently up to policy format v33 so we've still > got a few version numbers to play with :) > Sounds good, I will implement the policy version bump. > > As > > you point out, the counting work will be needed for existing binary > > policies making this patch necessary for the default case, but if you > > are concerned about the performance penalty this patch brings (which > > is less than the gain provided by the avtab array patch), then there > > are two threads to possibly be worked on. > > To be clear, I'm not thinking about supporting this for existing > policies, just the new policy format; the existing policy versions > would behave as they do now ... although if we do the array conversion > we will likely need to do some type of realloc()/retry or something, I > dunno, I'll leave that to you to brainstorm ;) > I still believe the performance gained by converting to arrays outweighs the cost of supporting older versions. See the discussion below. > > One is to rework this patch to include more invasive changes to count > > rules without actually reading and destroying nodes thus saving cycles > > but requiring more lines of code. Because policy parsing is not > > handled separately from the construction of the policydb structure > > (they are deeply intertwined), I was reluctant to add more complexity > > just to have a parse-only code path. Would you prefer speed or simpler > > logic for older policies? > > That's the problem we have right now. We have to do a lot of work > (allocations, etc.) that we throw away in the case where we are > counting, not to mention that bolting on the count-only functionality > is kinda hacky/ugly (not your fault, that is just the way the code is > right now). As you mention, the alternative is to significantly > rework how we parse/load the policy, and that isn't a very exciting > prospect as far as I'm concerned. > > I'm not sure if moving over to flex array is a win, I suspect that > whatever we gain in memory savings we lose in not having direct > access. I dunno, maybe it wouldn't be too bad. > Unless I misunderstand, I believe I addressed your concerns from v1 over using indices vs direct access to the arrays. From v2 --> v5, the array conversion patch only changes how avtab_node elements are allocated, access is the same as it is now. > I'm open to ideas here ... I'm looking for something that would > support the improvements for a new policy with an explicit count, > while still falling back to something that works "reasonably well" in > the current case where we have to guess. In this case "reasonably > well" means no worse than current in terms of performance and memory > use while not over complicating the code. > For existing policies, I have given this a bit of thought and did some performance testing. There are memory vs. runtime tradeoffs to consider between counting the rules (like v5 does) and doing a realloc (like v1 does). I'm of the opinion that the counting approach is better overall, but I'll (r)enumerate the tradeoffs here for the benefit of discussion. For the realloc approach, a decent chunk of memory (currently 256 KB) is wasted over-allocating the hash table slots because of the estimate derived from the regular rules table. Extra memory for avtab_nodes are allocated (either from overestimation or the dynamic growth factor) during parsing, sometimes up to 1 MB, requiring a final realloc to remove them. The growth factor chosen could have additional runtime implications with regard to multiple reallocs. Regardless, this approach has runtime costs and memory lost to overestimating the hashtable slots needed (which in this case, can't be recovered except through a rehashing of the whole table). 86 lines of code are added, but it is fairly clean code. The counting approach incurs a runtime penalty processing the conditional rules twice. Memory is wasted constructing the struct cond_node lists and corresponding struct cond_av_list elements twice, but nowhere close to 1 MB. 23 lines of code are added, but it is as you said hacky/ugly. The runtime penalty could be further optimized at the cost of parsing code complexity and additional lines of code as discussed earlier. Some performance numbers from Fedora 38 running "perf stat -r 1000 -d load_policy" (realloc is the v5 array conversion + v1 realloc patches): dev: 0.261174 seconds time elapsed ( +- 0.31% ) baseline v5: 0.233972 seconds time elapsed ( +- 0.31% ) -10% realloc: 0.226636 seconds time elapsed ( +- 0.34% ) -13.3% Therefore, to your criteria stated above, both approaches are empirically no worse than current in terms of performance or memory used after load. But I believe the current counting approach is the most reasonable given the lines of code added and the least amount of memory used during parsing. For what it's worth, I have been interested in designing a set of patches that would ultimately refactor policy parsing from policy loading and would, as a side effect, enable a simpler and more optimized counting approach for older policies. If this is something you would entertain (even if it's not exciting), then I will consider it for a future patch. If you agree, I will introduce the policy version bump in the next spin and also submit a patch to sepol. Otherwise, I can go back to the drawing board. Thanks for reviewing as always! > -- > paul-moore.com