Re: [PATCH] libselinux: add support for pcre2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On Fri, Sep 16, 2016 at 11:23 AM William Roberts <bill.c.roberts@xxxxxxxxx> wrote:
On Fri, Sep 16, 2016 at 3:13 AM, Janis Danisevskis <jdanis@xxxxxxxxxx> wrote:
> First of all, I would like to thank you, Stephen and William, for your
> patience and support.
>
> On Thu, Sep 15, 2016 at 8:34 PM William Roberts <bill.c.roberts@xxxxxxxxx>
> wrote:
>>
>> On Thu, Sep 15, 2016 at 7:57 AM, Stephen Smalley <sds@xxxxxxxxxxxxx>
>> wrote:
>> > On 09/15/2016 10:04 AM, Janis Danisevskis wrote:
>> >> From: Janis Danisevskis <jdanis@xxxxxxxxxx>
>> >>
>> >> This patch moves all pcre1/2 dependencies into the new files regex.h
>> >> and regex.c implementing the common denominator of features needed
>> >> by libselinux. The compiler flag -DUSE_PCRE2 toggles between the
>> >> used implementations.
>> >>
>> >> As of this patch libselinux supports either pcre or pcre2 but not
>> >> both at the same time. The persistently stored file contexts
>> >> information differs. This means libselinux can only load file
>> >> context files generated by sefcontext_compile build with the
>> >> same pcre variant.
>> >>
>> >> Also, for pcre2 the persistent format is architecture dependent.
>> >> Stored precompiled regular expressions can only be used on the
>> >> same architecture they were generated on. If pcre2 is used,
>> >> sefcontext_compile now respects the "-r". This flag makes
>> >> sefcontext_compile include the precompiled regular expressions
>> >> in the output file. The default is to omit them, so that the
>> >> output remains portable at the cost of having to recompile
>> >> the regular expressions at load time, or rather on first use.
>> >
>> > Is that really the default behavior you want?
>>
>> I would make the default include the pre-compiled regex's, if there
>> is an arch mismatch then ditch the pre-compiled ones and reload
>> them. I would make it an option to NOT generate pre-compiled
>> regexes. For the upstream consumers, who compile and deploy
>> on the same system, this means they don't need to update to add
>> -r. For android, we just need to update a makefile to include -r to
>> omit the precompiled regex's.
>
>
> The arch is currently not encoded in the binary, so a mismatch cannot be
> detected. I would sleep better if we left the -r option as is for now,
> because it is kind of "use only if you know what you are doing". Also, this
> way it does not break the android build process.

We probably would want to encode the architecture since this matters.This way
we  can detect a mismatch and compile them. This,prevents the
why isn't my device booting issue for cross compilations that don't have -r.

That's just the thing. Without -r the phone _will_ boot because the regexes are compiled on first use. With -r and an arch mismatch we have an undefined behavior, which is bad.

See, I don't currently know what part of the architecture is important. Will word size and endianess be enough, e.g., will regexes generated on x86_64el work on armv8el (apparently this is what I observed but it could change at the whim of the PCRE2 maintainers)? So what will be the encoding? If we encode the complete architecture then the typical case for android (build on x86_64 for armv7/armv8) will yield the same result as without the -r option, namely, we rebuild the regexes on the device. If we just encode word size and endianess it may work for a while... I just don't feel comfortable enough to make a decision at this point.

My limited understanding is that the automata built by PCRE2 use pointers to link states or other structures. These pointers are symbolized when serialized, but they keep their width, which is why the regexes are at least sensitive to the word size.

Just a wild Idea:
PCRE2 also supports JIT compiling. I did not use this here because I did not feel comfortable for libselinux to depend on the execmem permission. But this could be developed into pre cross compiling the regexes into native code, which can than be linked into a device specific library. But I guess this would require quite some cooperation with the PCRE2 developers.
 
The other thing is, this is only an issue on binary formated file_context files,
this is the ideal reason to move back to text files, since their is no speedup
gained if we have to compile them.

Right, or you could see it as an intermediate solution that does not require changes to the build system until we can properly cross compile the regexes.
 

>
>>
>>
>> In the case of Android, where we cross compile the file_contexts.bin,
>> does using binary fc's even make sense now? The major reason is that
>> the binary format loads faster by skipping the regex compilation, we
>> can no longer skip that with PCRE2 and cross compilation. We should
>> likely move Android back to text based if this is the case. I have
>> speedups in the neighborhood of 30% for text file processing in
>> the pipeline.
>
>
> I realize that the binary PCRE2 variant is currently much slower than the
> PCRE stored regexes. But it still a little faster than the text file
> processing. If you can make text file processing 30% faster, than this may
> well be an option because serializing pcre2 regexes will probably never beat
> pcre, even if we did stunts like compiling for the correct architecture
> using qemu.
> I'd be happy to help perform benchmarks and choose the best option.

Once you dump the pre-compiled regex's from the binary format,
your back to the text files performance. Also, with PCRE, the binary format was
3x the size on disk.

>
> Also I am exploring another option, which may sound a bit exotic. So I would
> like to have a running POC first before I come forward. Maybe nothing comes
> of it.
>
>>
>>
>> >
>> >> Signed-off-by: Janis Danisevskis <jdanis@xxxxxxxxxx>
>> >> ---
>> >
>> >> diff --git a/libselinux/src/label_file.h b/libselinux/src/label_file.h
>> >> index 6d1e890..3df7972 100644
>> >> --- a/libselinux/src/label_file.h
>> >> +++ b/libselinux/src/label_file.h
>> >> @@ -453,12 +429,14 @@ static inline int process_line(struct
>> >> selabel_handle *rec,
>> >>        */
>> >>       data->nspec++;
>> >>
>> >> -     if (rec->validating &&
>> >> -                         compile_regex(data, &spec_arr[nspec],
>> >> &errbuf)) {
>> >> +     if (rec->validating
>> >> +                     && compile_regex(data, &spec_arr[nspec],
>> >> &error_data)) {
>> >> +             regex_format_error(&error_data,
>> >> regex_error_format_buffer,
>> >> +                             sizeof(regex_error_format_buffer));
>> >>               COMPAT_LOG(SELINUX_ERROR,
>> >>                          "%s:  line %u has invalid regex %s:  %s\n",
>> >>                          path, lineno, regex,
>> >> -                        (errbuf ? errbuf : "out of memory"));
>> >> +                        regex_error_format_buffer);
>> >
>> > compile_regex() may fail on an out of memory condition before
>> > regex_error_format_buffer is initialized, which is why we previously
>> > passed errbuf ?: "out of memory" above.  I suppose you could initialize
>> > regex_error_format_buffer with "out of memory" prior to calling
>> > compile_regex().
>> >
>> >> diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
>> >> new file mode 100644
>> >> index 0000000..1c4a84d
>> >> --- /dev/null
>> >> +++ b/libselinux/src/regex.c
>> > <snip>
>> >> +int regex_writef(struct regex_data *regex, FILE *fp)
>> >
>> > This needs to be updated with the new do_write_precompregex argument,
>> > and either use the argument or mark it unused to permit compilation for
>> > the USE_PCRE2=n.
>> >
>> > _______________________________________________
>> > Seandroid-list mailing list
>> > Seandroid-list@xxxxxxxxxxxxx
>> > To unsubscribe, send email to Seandroid-list-leave@xxxxxxxxxxxxx.
>> > To get help, send an email containing "help" to
>> > Seandroid-list-request@xxxxxxxxxxxxx.
>>
>>
>>
>> --
>> Respectfully,
>>
>> William C Roberts
>> _______________________________________________
>> Selinux mailing list
>> Selinux@xxxxxxxxxxxxx
>> To unsubscribe, send email to Selinux-leave@xxxxxxxxxxxxx.
>> To get help, send an email containing "help" to
>> Selinux-request@xxxxxxxxxxxxx.



--
Respectfully,

William C Roberts
_______________________________________________
Selinux mailing list
Selinux@xxxxxxxxxxxxx
To unsubscribe, send email to Selinux-leave@xxxxxxxxxxxxx.
To get help, send an email containing "help" to Selinux-request@xxxxxxxxxxxxx.

[Index of Archives]     [Selinux Refpolicy]     [Linux SGX]     [Fedora Users]     [Fedora Desktop]     [Yosemite Photos]     [Yosemite Camping]     [Yosemite Campsites]     [KDE Users]     [Gnome Users]

  Powered by Linux