On Tue, Aug 14, 2012 at 7:18 AM, Russell Coker <russell@xxxxxxxxxxxx> wrote:
> On Tue, 14 Aug 2012, Colin Walters <walters@xxxxxxxxxx> wrote:
>> Really though in the big picture, while the file context regexps were
>> probably an OK solution way back when SELinux was a "proof of concept"
>> prototype, the current policy generating 5000 of them is just crazy...
>
> Actually the situation is way better than it was in the early days.
>
> When I first started working on SE Linux the software wasn't as
> optimised and the hardware was way slower. A restorecon-type operation
> would be 99% user CPU time, and spending more than 20 minutes of CPU
> time relabelling a relatively small filesystem was common.
>
> Having 5000 on a modern system, for argument's sake (it's 1923 on my
> system, but that depends on whether you load a policy with everything
> or just the modules you need), is a lot easier than the situation in
> the early days with fewer regular expressions.
>
>> One other possibility - I bet one could get a huge speedup in some cases
>> by splitting up the regexp set based on common prefixes. For example,
>> if you're trying to match /tmp/krb5cc, there's no reason to run over all
>> 2000 regexps which start with /usr. This solution is kind of an
>> intermediate step between "run 5000 regexps serially" and "write custom
>> code to compile 5000 regexps into a DFA that returns a context".
>
> Yes, I wrote code to do that many years ago. Any regex which had a
> fixed string for the first subdirectory from root would only be called
> for a filename in that same subdirectory. The prefixes were indexed so
> that an integer compare could determine whether a regex needed to be
> called at all. Regexes which applied to multiple prefixes (e.g. "/.*")
> were applied to all files.
>
> But I believe that the kerberos performance problem is not calling the
> regexes but loading them. The current code (unless it's changed
> recently) will compile all regexes, so when kerberos loads the file
> contexts for a check on /tmp, it will compile all the regexes under
> /usr, /var, and other common prefixes even though they won't be used.
> I don't know how much time can be saved by skipping the compilation of
> those.
>
> Another thing that could be done is to provide an interface for
> loading a file_contexts file for a specific prefix. Then the code
> which generates the file_contexts file could generate files such as
> file_contexts_tmp containing only the entries that match /tmp (10 for
> the policy I use, maybe 50 or so for the one you use) plus the entries
> that match everything (e.g. "/.*"). On my system there are 9
> file_contexts entries which are not prefix-specific; one of them is
> required ("/.*"), /vmlinux.* and /initrd\.img.* are obsolete, and the
> other 6 could easily be split to be prefix-specific.
>
> So with a minor change to the library interface (adding a new entry
> point so the new library could still work with old apps) we could have
> a program which knows it will only label files under /tmp check just
> 11 regexes on my system, or maybe 50 on yours.

I have code that does just that; Dan and I both wrote a version, and
I'll attach them. I didn't find the speedups we were hoping for, and it
didn't work correctly/completely in the face of file context
equivalencies, although that is likely fixable.

I was just looking at all of the stem code (and wondered who wrote it,
but it predates git). I'm surprised it made a big difference; wouldn't
the regex code be able to return extremely quickly if it didn't match?
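For anyone following along, my reading of the stem scheme, as a rough
sketch (toy spec table and names, not the actual libselinux data
structures): regexes with a fixed first-level directory get bucketed
under an interned stem id, and matching a path under /tmp then skips
every /usr regex with a single integer compare.

#define _POSIX_C_SOURCE 200809L	/* for strdup/strndup */
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct spec {
	const char *pattern;	/* file_contexts regex, e.g. "/tmp/krb5cc.*" */
	const char *context;	/* label assigned on a match */
	regex_t re;
	int stem_id;		/* index into stems[], or -1 for "any stem" */
};

static char *stems[256];
static int nstems;

/* Fixed first-level directory of a pattern or path ("/tmp/krb5cc.*"
 * -> "tmp"), or NULL if that component contains regex metacharacters
 * (so "/.*" has no fixed stem and must be tried against everything). */
static char *first_component(const char *s)
{
	const char *p = s + 1;			/* skip the leading '/' */
	size_t len = strcspn(p, "/");
	if (len == 0 || strcspn(p, ".^$?*+|[({\\") < len)
		return NULL;
	return strndup(p, len);
}

static int intern_stem(const char *stem)	/* stem string -> small int */
{
	for (int i = 0; i < nstems; i++)
		if (strcmp(stems[i], stem) == 0)
			return i;
	stems[nstems] = strdup(stem);
	return nstems++;
}

int main(void)
{
	struct spec specs[] = {		/* toy data, not a real policy */
		{ .pattern = "/.*",           .context = "default_t" },
		{ .pattern = "/usr/bin/.*",   .context = "bin_t" },
		{ .pattern = "/tmp/krb5cc.*", .context = "krb5_t" },
	};
	int n = sizeof(specs) / sizeof(specs[0]);

	for (int i = 0; i < n; i++) {
		char *stem = first_component(specs[i].pattern);
		specs[i].stem_id = stem ? intern_stem(stem) : -1;
		free(stem);
		/* patterns above are known-valid, so the return value of
		 * regcomp() is not checked in this sketch */
		regcomp(&specs[i].re, specs[i].pattern,
			REG_EXTENDED | REG_NOSUB);
	}

	const char *path = "/tmp/krb5cc_1000";
	char *pstem = first_component(path);
	int id = pstem ? intern_stem(pstem) : -1;
	free(pstem);

	/* Last matching entry wins, as with file_contexts ordering; the
	 * integer compare skips every regex bound to a different stem. */
	const char *context = NULL;
	for (int i = 0; i < n; i++) {
		if (specs[i].stem_id != -1 && specs[i].stem_id != id)
			continue;
		if (regexec(&specs[i].re, path, 0, NULL, 0) == 0)
			context = specs[i].context;
	}
	printf("%s -> %s\n", path, context ? context : "(no match)");
	return 0;
}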
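And the lazy-compile idea Russell raises would look something like this
(again only a sketch with a hypothetical struct, not the real layout):
keep the raw pattern text and only regcomp() it the first time a spec
survives the stem filter, so loading file_contexts for a single /tmp
lookup never compiles the /usr and /var entries.

#include <regex.h>
#include <stdio.h>

struct lazy_spec {
	const char *pattern;	/* regex text straight from file_contexts */
	const char *context;
	regex_t re;
	int compiled;		/* 0 until the first time we need it */
};

/* Compile on first use; loading file_contexts then becomes a cheap
 * parse, and specs filtered out by the stem check are never compiled. */
static int lazy_match(struct lazy_spec *sp, const char *path)
{
	if (!sp->compiled) {
		if (regcomp(&sp->re, sp->pattern, REG_EXTENDED | REG_NOSUB))
			return 0;	/* treat an unparsable pattern as no match */
		sp->compiled = 1;
	}
	return regexec(&sp->re, path, 0, NULL, 0) == 0;
}

int main(void)
{
	struct lazy_spec sp = { .pattern = "/tmp/krb5cc.*",
				.context = "krb5_t" };
	printf("%d\n", lazy_match(&sp, "/tmp/krb5cc_1000"));	/* prints 1 */
	return 0;
}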
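On the prefix-restricted entry point: libselinux already exports
matchpathcon_init_prefix(3), which loads only the specs relevant to a
given prefix; whether it actually avoids the compile cost for
everything else is the part worth measuring. A minimal caller for the
/tmp case discussed above (the path is made up; link with -lselinux):

#include <stdio.h>
#include <sys/stat.h>
#include <selinux/selinux.h>

int main(void)
{
	char *con = NULL;

	/* Load only the specs relevant to /tmp (plus prefix-free ones
	 * such as "/.*"); NULL means the default file_contexts path. */
	if (matchpathcon_init_prefix(NULL, "/tmp") < 0) {
		perror("matchpathcon_init_prefix");
		return 1;
	}
	if (matchpathcon("/tmp/krb5cc_1000", S_IFREG, &con) == 0) {
		printf("context: %s\n", con);
		freecon(con);
	}
	matchpathcon_fini();
	return 0;
}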
Anyway, I'm writing some test programs to look at all of the
possibilities.

-Eric
Attachment:
1.patch
Description: Binary data
Attachment:
2.patch
Description: Binary data
Attachment:
3.patch
Description: Binary data