Re: [PATCH v10 0/5] Introduce mseal

Jeff Xu <jeffxu@xxxxxxxxxxxx> · Thu, 18 Apr 2024 18:22:16 -0700

On Thu, Apr 18, 2024 at 1:19 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Tue, Apr 16, 2024 at 12:40 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> >
> > On Tue, Apr 16, 2024 at 8:13 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> > >
> > > * jeffxu@xxxxxxxxxxxx <jeffxu@xxxxxxxxxxxx> [240415 12:35]:
> > > > From: Jeff Xu <jeffxu@xxxxxxxxxxxx>
> > > >
> > > > This is v10; it rebases the v9 patch onto 6.9-rc3.
> > > > We also applied and tested mseal() in chrome and chromebook.
> > > >
> > > > ------------------------------------------------------------------
> > > ...
> > >
> > > > MM perf benchmarks
> > > > ==================
> > > > This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
> > > > check the VMAs’ sealing flag, so that no partial update can be made,
> > > > when any segment within the given memory range is sealed.
> > > >
> > > > To measure the performance impact of this loop, two tests are developed.
> > > > [8]
> > > >
> > > > The first is measuring the time taken for a particular system call,
> > > > by using clock_gettime(CLOCK_MONOTONIC). The second is using
> > > > PERF_COUNT_HW_REF_CPU_CYCLES (exclude user space). Both tests have
> > > > similar results.
> > > >
> > > > The tests have roughly below sequence:
> > > > for (i = 0; i < 1000; i++)
> > > >     create 1000 mappings (1 page per VMA)
> > > >     start the sampling
> > > >     for (j = 0; j < 1000; j++)
> > > >         mprotect one mapping
> > > >     stop and save the sample
> > > >     delete 1000 mappings
> > > > calculate over all samples.
> > >
> > >
> > > Thank you for doing this performance testing.
> > >
> > > >
> > > > Below tests are performed on Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
> > > > 4G memory, Chromebook.
> > > >
> > > > Based on the latest upstream code:
> > > > The first test (measuring time)
> > > > syscall__     vmas    t       t_mseal delta_ns        per_vma %
> > > > munmap__      1       909     944     35      35      104%
> > > > munmap__      2       1398    1502    104     52      107%
> > > > munmap__      4       2444    2594    149     37      106%
> > > > munmap__      8       4029    4323    293     37      107%
> > > > munmap__      16      6647    6935    288     18      104%
> > > > munmap__      32      11811   12398   587     18      105%
> > > > mprotect      1       439     465     26      26      106%
> > > > mprotect      2       1659    1745    86      43      105%
> > > > mprotect      4       3747    3889    142     36      104%
> > > > mprotect      8       6755    6969    215     27      103%
> > > > mprotect      16      13748   14144   396     25      103%
> > > > mprotect      32      27827   28969   1142    36      104%
> > > > madvise_      1       240     262     22      22      109%
> > > > madvise_      2       366     442     76      38      121%
> > > > madvise_      4       623     751     128     32      121%
> > > > madvise_      8       1110    1324    215     27      119%
> > > > madvise_      16      2127    2451    324     20      115%
> > > > madvise_      32      4109    4642    534     17      113%
> > > >
> > > > The second test (measuring cpu cycle)
> > > > syscall__     vmas    cpu     cmseal  delta_cpu       per_vma %
> > > > munmap__      1       1790    1890    100     100     106%
> > > > munmap__      2       2819    3033    214     107     108%
> > > > munmap__      4       4959    5271    312     78      106%
> > > > munmap__      8       8262    8745    483     60      106%
> > > > munmap__      16      13099   14116   1017    64      108%
> > > > munmap__      32      23221   24785   1565    49      107%
> > > > mprotect      1       906     967     62      62      107%
> > > > mprotect      2       3019    3203    184     92      106%
> > > > mprotect      4       6149    6569    420     105     107%
> > > > mprotect      8       9978    10524   545     68      105%
> > > > mprotect      16      20448   21427   979     61      105%
> > > > mprotect      32      40972   42935   1963    61      105%
> > > > madvise_      1       434     497     63      63      115%
> > > > madvise_      2       752     899     147     74      120%
> > > > madvise_      4       1313    1513    200     50      115%
> > > > madvise_      8       2271    2627    356     44      116%
> > > > madvise_      16      4312    4883    571     36      113%
> > > > madvise_      32      8376    9319    943     29      111%
> > > >
> > >
> > > If I am reading this right, madvise() is affected more than the other
> > > calls?  Is that expected or do we need to have a closer look?
> > >
> > The madvise() has a bigger percentage (per_vma %), but it also has a
> > smaller base value (cpu).
>
> Sorry, it's unclear to me what the "vmas" column denotes. Is that how
> many VMAs were created before timing the syscall? If so, then 32 is
> the max that you show here while you seem to have tested with 1000
> VMAs. What is the overhead with 1000 VMAs?

The vmas column is the number of VMAs covered by a single call.

For example, in the mprotect row with vmas = 32, the memory range
passed to mprotect(ptr, size) spans 32 VMAs.

It also matters how many memory ranges are in use at the time of the
test. This is where 1000 comes in: the test creates 1000 memory
ranges, each with 32 VMAs, then calls mprotect once on each of the
1000 ranges. (The pseudocode was included in the original email.)

> My worry is that if the overhead grows linearly with the number of
> VMAs then the effects will be quite noticeable on Android where an
> application with a few thousand VMAs is not so unusual.
>
The overhead is likely to grow linearly with the number of VMAs, since
it takes time to retrieve each VMA's metadata.

Let's use one data sample to look at impact:

Test: munmap 1000 memory ranges, each with 1 VMA

syscall__       vmas    t       t_mseal delta_ns        per_vma %
munmap__        1       909     944     35      35      104%

For those 1000 munmap calls, sealing adds 35000 ns in total, or 35 ns per call.

The delta seems to be insignificant. For example, it would take about
28571 munmap calls to add up to a 1 ms difference (1000000/35 = 28571).
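
The arithmetic above can be written out as a small sketch. The
function names are made up for illustration, and it assumes the
per-call overhead stays constant at the measured 35 ns.

```c
#include <stdint.h>

/* Total added time for `calls` calls at a fixed per-call overhead. */
static uint64_t total_overhead_ns(uint64_t calls, uint64_t per_call_ns)
{
    return calls * per_call_ns;
}

/* Number of whole calls whose combined overhead fits in `budget_ns`
 * (integer division, rounds down). Returns 0 if per_call_ns is 0. */
static uint64_t calls_within_budget(uint64_t budget_ns, uint64_t per_call_ns)
{
    return per_call_ns ? budget_ns / per_call_ns : 0;
}
```

With the numbers from the table, total_overhead_ns(1000, 35) gives
35000 ns and calls_within_budget(1000000, 35) gives 28571.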

When I compare the data from 5.10 with 6.8 for the same munmap call,
6.8 adds 552 ns per call, which is more than 15 times the sealing delta.

syscall__       vmas    t_5_10  t_6_8   delta_ns        per_vma %
munmap__        1       357     909     552     552     254%
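
For reference, the comparison between the two deltas can be checked
with a few lines of integer arithmetic. The helper names are
hypothetical; the inputs are the t, t_mseal, t_5_10 and t_6_8 values
from the two munmap rows above.

```c
#include <stdint.h>

/* Per-call difference between two measurements, in ns. */
static int64_t delta_ns(int64_t before_ns, int64_t after_ns)
{
    return after_ns - before_ns;
}

/* Ratio a / b scaled by 100 to stay in integer arithmetic
 * (1577 means about 15.77x). Returns 0 if b is 0. */
static int64_t ratio_x100(int64_t a, int64_t b)
{
    return b ? (a * 100) / b : 0;
}
```

Here delta_ns(357, 909) is 552, delta_ns(909, 944) is 35, and
ratio_x100(552, 35) is 1577, i.e. the 5.10 to 6.8 change is roughly
15.8 times the sealing delta.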

> >
> > -Jeff