On Tue, Aug 14, 2012 at 7:18 AM, Russell Coker <russell@xxxxxxxxxxxx> wrote:
> On Tue, 14 Aug 2012, Colin Walters <walters@xxxxxxxxxx> wrote:
>> Really though in the big picture, while the file context regexps were
>> probably an OK solution way back when SELinux was a "proof of concept"
>> prototype, the current policy generating 5000 of them is just crazy...
>
> Actually the situation is way better than it was in the early days.
>
> When I first started working on SE Linux the software wasn't as
> optimised and the hardware was way slower. A restorecon-type operation
> would be 99% user CPU time, and spending more than 20 minutes of CPU
> time relabelling a relatively small filesystem was common.
>
> Having 5000 on a modern system, for argument's sake (it's 1923 on my
> system, but that depends on whether you load a policy with everything
> or just the modules you need), is a lot easier than the situation in
> the early days with fewer regular expressions.
>
>> One other possibility - I bet one could get a huge speedup in some cases
>> by splitting up the regexp set based on common prefixes. For example,
>> if you're trying to match /tmp/krb5cc, there's no reason to run over all
>> 2000 regexps which start with /usr. This solution is kind of an
>> intermediate step between "run 5000 regexps serially" and "write custom
>> code to compile 5000 regexps into a DFA that returns a context".
>
> Yes, I wrote code to do that many years ago. Any regex which had a
> fixed string for the first subdirectory from root would only be called
> for a filename in that same subdirectory. The prefixes were indexed so
> that an integer compare could determine whether a regex needed to be
> called at all. Regexes which applied to multiple prefixes (e.g. "/.*")
> were applied to all files.
>
> But I believe that the kerberos performance problem is not calling the
> regexes but loading them. The current code (unless it's changed
> recently) will compile all regexes, so when kerberos loads the file
> contexts for a check on /tmp, it will compile all the regexes under
> /usr, /var, and other common prefixes even though they won't be used.
> I don't know how much time can be saved by skipping the compilation of
> those.
>
> Another thing that could be done is to provide an interface for
> loading a file_contexts file for a specific prefix. Then the code
> which generates the file_contexts file could generate files such as
> file_contexts_tmp containing only the entries that match /tmp (10 for
> the policy I use, maybe 50 or so for the one you use) plus the entries
> that match everything (e.g. "/.*"). On my system there are 9
> file_contexts entries which are not prefix-specific; one of them is
> required ("/.*"), /vmlinux.* and /initrd\.img.* are obsolete, and the
> other 6 could easily be split to be prefix-specific.
>
> So with a minor change to the library interface (adding a new entry
> point so the new library could still work with old apps) we could have
> a program which knows it will only label files under /tmp check just
> 11 regexes on my system, or maybe 50 on yours.

I have code that does just that; Dan and I both wrote a version, and
I'll attach them. I didn't find the speedups we were hoping for, and it
didn't work correctly/completely in the face of file context
equivalencies, although that is likely fixable.

I was just looking at all of the stem code (and wondered who wrote it,
but it predates git). I'm surprised it made a big difference; wouldn't
the regex code be able to return extremely quickly if it didn't match?
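For anyone following along, my reading of the stem scheme, as a rough
sketch (toy spec table and names, not the actual libselinux data
structures): regexes with a fixed first-level directory get bucketed
under an interned stem id, and matching a path under /tmp then skips
every /usr regex with a single integer compare.

#define _POSIX_C_SOURCE 200809L	/* for strdup/strndup */
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct spec {
	const char *pattern;	/* file_contexts regex, e.g. "/tmp/krb5cc.*" */
	const char *context;	/* label assigned on a match */
	regex_t re;
	int stem_id;		/* index into stems[], or -1 for "any stem" */
};

static char *stems[256];
static int nstems;

/* Fixed first-level directory of a pattern or path ("/tmp/krb5cc.*"
 * -> "tmp"), or NULL if that component contains regex metacharacters
 * (so "/.*" has no fixed stem and must be tried against everything). */
static char *first_component(const char *s)
{
	const char *p = s + 1;			/* skip the leading '/' */
	size_t len = strcspn(p, "/");
	if (len == 0 || strcspn(p, ".^$?*+|[({\\") < len)
		return NULL;
	return strndup(p, len);
}

static int intern_stem(const char *stem)	/* stem string -> small int */
{
	for (int i = 0; i < nstems; i++)
		if (strcmp(stems[i], stem) == 0)
			return i;
	stems[nstems] = strdup(stem);
	return nstems++;
}

int main(void)
{
	struct spec specs[] = {		/* toy data, not a real policy */
		{ .pattern = "/.*",           .context = "default_t" },
		{ .pattern = "/usr/bin/.*",   .context = "bin_t" },
		{ .pattern = "/tmp/krb5cc.*", .context = "krb5_t" },
	};
	int n = sizeof(specs) / sizeof(specs[0]);

	for (int i = 0; i < n; i++) {
		char *stem = first_component(specs[i].pattern);
		specs[i].stem_id = stem ? intern_stem(stem) : -1;
		free(stem);
		/* patterns above are known-valid, so the return value of
		 * regcomp() is not checked in this sketch */
		regcomp(&specs[i].re, specs[i].pattern,
			REG_EXTENDED | REG_NOSUB);
	}

	const char *path = "/tmp/krb5cc_1000";
	char *pstem = first_component(path);
	int id = pstem ? intern_stem(pstem) : -1;
	free(pstem);

	/* Last matching entry wins, as with file_contexts ordering; the
	 * integer compare skips every regex bound to a different stem. */
	const char *context = NULL;
	for (int i = 0; i < n; i++) {
		if (specs[i].stem_id != -1 && specs[i].stem_id != id)
			continue;
		if (regexec(&specs[i].re, path, 0, NULL, 0) == 0)
			context = specs[i].context;
	}
	printf("%s -> %s\n", path, context ? context : "(no match)");
	return 0;
}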
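And the lazy-compile idea Russell raises would look something like this
(again only a sketch with a hypothetical struct, not the real layout):
keep the raw pattern text and only regcomp() it the first time a spec
survives the stem filter, so loading file_contexts for a single /tmp
lookup never compiles the /usr and /var entries.

#include <regex.h>
#include <stdio.h>

struct lazy_spec {
	const char *pattern;	/* regex text straight from file_contexts */
	const char *context;
	regex_t re;
	int compiled;		/* 0 until the first time we need it */
};

/* Compile on first use; loading file_contexts then becomes a cheap
 * parse, and specs filtered out by the stem check are never compiled. */
static int lazy_match(struct lazy_spec *sp, const char *path)
{
	if (!sp->compiled) {
		if (regcomp(&sp->re, sp->pattern, REG_EXTENDED | REG_NOSUB))
			return 0;	/* treat an unparsable pattern as no match */
		sp->compiled = 1;
	}
	return regexec(&sp->re, path, 0, NULL, 0) == 0;
}

int main(void)
{
	struct lazy_spec sp = { .pattern = "/tmp/krb5cc.*",
				.context = "krb5_t" };
	printf("%d\n", lazy_match(&sp, "/tmp/krb5cc_1000"));	/* prints 1 */
	return 0;
}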
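On the prefix-restricted entry point: libselinux already exports
matchpathcon_init_prefix(3), which loads only the specs relevant to a
given prefix; whether it actually avoids the compile cost for
everything else is the part worth measuring. A minimal caller for the
/tmp case discussed above (the path is made up; link with -lselinux):

#include <stdio.h>
#include <sys/stat.h>
#include <selinux/selinux.h>

int main(void)
{
	char *con = NULL;

	/* Load only the specs relevant to /tmp (plus prefix-free ones
	 * such as "/.*"); NULL means the default file_contexts path. */
	if (matchpathcon_init_prefix(NULL, "/tmp") < 0) {
		perror("matchpathcon_init_prefix");
		return 1;
	}
	if (matchpathcon("/tmp/krb5cc_1000", S_IFREG, &con) == 0) {
		printf("context: %s\n", con);
		freecon(con);
	}
	matchpathcon_fini();
	return 0;
}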
Anyway, I'm writing some test programs to look at all of the
possibilities.

-Eric
Attachment:
1.patch
Description: Binary data
Attachment:
2.patch
Description: Binary data
Attachment:
3.patch
Description: Binary data