Re: [PATCH] libselinux: Workaround for heap overhead of pcre

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 21, 2022 at 12:02 AM Christian Göttsche
<cgzones@xxxxxxxxxxxxxx> wrote:
>
> On Mon, 19 Dec 2022 at 09:53, Inseob Kim <inseob@xxxxxxxxxx> wrote:
> >
> > pcre's behavior is changed so that pcre2_match always allocates heap for
> > match_data, rather than stack, regardless of size. The heap isn't freed
> > until explicitly calling pcre2_match_data_free. This new behavior may
> > result in heap overhead, which may increase the peak memory usage about
> > a few megabytes. It's because regex_match is first called for regex_data
> > objects, and then regex_data objects are freed at once.
>
> This approach trades peak memory usage for temporary allocations,
> which effects runtime performance.  On modern systems memory is most
> of the time not a scarce resource.
>
> Some examples:
>
> # selabel_lookup -b file -k /etc/shadow -t file [heaptrack]
>
> ## current
>
> total runtime: 0.07s.
> calls to allocation functions: 28420 (406000/s)
> temporary memory allocations: 16 (228/s)
> peak heap memory consumption: 10.09M
> peak RSS (including heaptrack overhead): 21.27M
> total memory leaked: 1.02K
>
> ## proposed
>
> total runtime: 0.06s.
> calls to allocation functions: 23430 (366093/s)
> temporary memory allocations: 675 (10546/s)
> peak heap memory consumption: 9.48M
> peak RSS (including heaptrack overhead): 18.59M
> total memory leaked: 1.02K
>
> # restorecon -vRn /etc [heaptrack]
>
> ## current
>
> total runtime: 0.14s.
> calls to allocation functions: 33873 (236874/s)
> temporary memory allocations: 1877 (13125/s)
> peak heap memory consumption: 10.09M
> peak RSS (including heaptrack overhead): 21.58M
> total memory leaked: 1.90K
>
> ## proposed
>
> total runtime: 0.27s.
> calls to allocation functions: 378762 (1423917/s)
> temporary memory allocations: 351487 (1321379/s)
> peak heap memory consumption: 9.48M
> peak RSS (including heaptrack overhead): 20.99M
> total memory leaked: 1.90K
>
>
> # restorecon -vRn /usr [hyperfine]
>
> ## current
>
> restorecon -vRn /usr
> Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
>   Time (mean ± σ):     24.419 s ±  0.661 s    [User: 23.480 s, System: 0.922 s]
>   Range (min … max):   23.399 s … 25.495 s    10 runs
>
> ## proposed
>
> restorecon -vRn /usr
> Benchmark 1: ~/destdir/sbin/restorecon -vRn /usr
>   Time (mean ± σ):     28.628 s ±  0.968 s    [User: 27.688 s, System: 0.927 s]
>   Range (min … max):   27.674 s … 30.798 s    10 runs
>
>
> So I would argue the performance impact for applications (like
> setfiles, restorecon) or daemon (like systemd, udev) is more critical
> than the 500K per application.

I observed about 3~4MB increase on Android device. Which pcre2 version
are you using? Does it include
https://github.com/PCRE2Project/pcre2/commit/d90fb238#diff-15ec3f4ed916f52c810daf305702985dda6d8d45e7ce22e2f309c95bd6ef32b7R74
?

And if this is difficult to apply, how about adding a new flag e.g.
AGGRESSIVE_FREE_AFTER_REGEX_MATCH ?

>
> > To workaround it, free and reallocate match_data whenever we call
> > regex_match. It's fine because libselinux currently doesn't use
> > match_data, but use only the return value.
> >
> > Signed-off-by: Inseob Kim <inseob@xxxxxxxxxx>
> > ---
> >  libselinux/src/regex.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/libselinux/src/regex.c b/libselinux/src/regex.c
> > index 149a7973..2df282f1 100644
> > --- a/libselinux/src/regex.c
> > +++ b/libselinux/src/regex.c
> > @@ -213,10 +213,20 @@ void regex_data_free(struct regex_data *regex)
> >  int regex_match(struct regex_data *regex, char const *subject, int partial)
> >  {
> >         int rc;
> > +       pcre2_match_data *new_match_data;
> >         __pthread_mutex_lock(&regex->match_mutex);
> > +       new_match_data = pcre2_match_data_create_from_pattern(
> > +           regex->regex, NULL);
>
> Should be checked for failure (cause pcre2_match() expects a non-NULL
> match_data, which would be passed the second time).
>
> Also with this change the member match_data of the struct regex_data
> becomes obsolete and should be removed.

Thanks, this makes sense.

>
> >         rc = pcre2_match(
> >             regex->regex, (PCRE2_SPTR)subject, PCRE2_ZERO_TERMINATED, 0,
> >             partial ? PCRE2_PARTIAL_SOFT : 0, regex->match_data, NULL);
> > +       // pcre2_match allocates heap and it won't be freed until
> > +       // pcre2_match_data_free, resulting in heap overhead.
> > +       // Reallocate match_data to prevent such overhead, whenever possible.
> > +       if (new_match_data) {
> > +               pcre2_match_data_free(regex->match_data);
> > +               regex->match_data = new_match_data;
> > +       }
> >         __pthread_mutex_unlock(&regex->match_mutex);
> >         if (rc > 0)
> >                 return REGEX_MATCH;
> > --
> > 2.39.0.314.g84b9a713c41-goog
> >



-- 
Inseob Kim | Software Engineer | inseob@xxxxxxxxxx




[Index of Archives]     [Selinux Refpolicy]     [Linux SGX]     [Fedora Users]     [Fedora Desktop]     [Yosemite Photos]     [Yosemite Camping]     [Yosemite Campsites]     [KDE Users]     [Gnome Users]

  Powered by Linux