On Fri, Apr 19, 2024 at 3:15 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
>
> On Fri, Apr 19, 2024 at 7:57 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 18, 2024 at 6:22 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Apr 18, 2024 at 1:19 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Apr 16, 2024 at 12:40 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Apr 16, 2024 at 8:13 AM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > * jeffxu@xxxxxxxxxxxx <jeffxu@xxxxxxxxxxxx> [240415 12:35]:
> > > > > > > From: Jeff Xu <jeffxu@xxxxxxxxxxxx>
> > > > > > >
> > > > > > > This is the V10 version; it rebases the v9 patch to 6.9.rc3.
> > > > > > > We also applied and tested mseal() in chrome and chromebook.
> > > > > > >
> > > > > > > ------------------------------------------------------------------
> > > > > > ...
> > > > > >
> > > > > > > MM perf benchmarks
> > > > > > > ==================
> > > > > > > This patch adds a loop in the mprotect/munmap/madvise(DONTNEED) to
> > > > > > > check the VMAs’ sealing flag, so that no partial update can be made
> > > > > > > when any segment within the given memory range is sealed.
> > > > > > >
> > > > > > > To measure the performance impact of this loop, two tests were developed.
> > > > > > > [8]
> > > > > > >
> > > > > > > The first measures the time taken for a particular system call,
> > > > > > > using clock_gettime(CLOCK_MONOTONIC). The second uses
> > > > > > > PERF_COUNT_HW_REF_CPU_CYCLES (excluding user space). Both tests have
> > > > > > > similar results.
> > > > > > >
> > > > > > > The tests have roughly the sequence below:
> > > > > > > for (i = 0; i < 1000; i++)
> > > > > > >     create 1000 mappings (1 page per VMA)
> > > > > > >     start the sampling
> > > > > > >     for (j = 0; j < 1000; j++)
> > > > > > >         mprotect one mapping
> > > > > > >     stop and save the sample
> > > > > > >     delete 1000 mappings
> > > > > > > calculates all samples.
> > > > > >
> > > > > > Thank you for doing this performance testing.
> > > > > >
> > > > > > >
> > > > > > > The tests below were performed on an Intel(R) Pentium(R) Gold 7505 @ 2.00GHz,
> > > > > > > 4G memory, Chromebook.
> > > > > > >
> > > > > > > Based on the latest upstream code:
> > > > > > > The first test (measuring time)
> > > > > > > syscall__  vmas      t  t_mseal  delta_ns  per_vma     %
> > > > > > > munmap__      1    909      944        35       35  104%
> > > > > > > munmap__      2   1398     1502       104       52  107%
> > > > > > > munmap__      4   2444     2594       149       37  106%
> > > > > > > munmap__      8   4029     4323       293       37  107%
> > > > > > > munmap__     16   6647     6935       288       18  104%
> > > > > > > munmap__     32  11811    12398       587       18  105%
> > > > > > > mprotect      1    439      465        26       26  106%
> > > > > > > mprotect      2   1659     1745        86       43  105%
> > > > > > > mprotect      4   3747     3889       142       36  104%
> > > > > > > mprotect      8   6755     6969       215       27  103%
> > > > > > > mprotect     16  13748    14144       396       25  103%
> > > > > > > mprotect     32  27827    28969      1142       36  104%
> > > > > > > madvise_      1    240      262        22       22  109%
> > > > > > > madvise_      2    366      442        76       38  121%
> > > > > > > madvise_      4    623      751       128       32  121%
> > > > > > > madvise_      8   1110     1324       215       27  119%
> > > > > > > madvise_     16   2127     2451       324       20  115%
> > > > > > > madvise_     32   4109     4642       534       17  113%
> > > > > > >
> > > > > > > The second test (measuring cpu cycles)
> > > > > > > syscall__  vmas    cpu  cmseal  delta_cpu  per_vma     %
> > > > > > > munmap__      1   1790    1890        100      100  106%
> > > > > > > munmap__      2   2819    3033        214      107  108%
> > > > > > > munmap__      4   4959    5271        312       78  106%
> > > > > > > munmap__      8   8262    8745        483       60  106%
> > > > > > > munmap__     16  13099   14116       1017       64  108%
> > > > > > > munmap__     32  23221   24785       1565       49  107%
> > > > > > > mprotect      1    906     967         62       62  107%
> > > > > > > mprotect      2   3019    3203        184       92  106%
> > > > > > > mprotect      4   6149    6569        420      105  107%
> > > > > > > mprotect      8   9978   10524        545       68  105%
> > > > > > > mprotect     16  20448   21427        979       61  105%
> > > > > > > mprotect     32  40972   42935       1963       61  105%
> > > > > > > madvise_      1    434     497         63       63  115%
> > > > > > > madvise_      2    752     899        147       74  120%
> > > > > > > madvise_      4   1313    1513        200       50  115%
> > > > > > > madvise_      8   2271    2627        356       44  116%
> > > > > > > madvise_     16   4312    4883        571       36  113%
> > > > > > > madvise_     32   8376    9319        943       29  111%
> > > > > > >
> > > > > >
> > > > > > If I am reading this right, madvise() is affected more than the other
> > > > > > calls? Is that expected or do we need to have a closer look?
> > > > > >
> > > > > The madvise() has a bigger percentage (per_vma %), but it also has a
> > > > > smaller base value (cpu).
> > > >
> > > > Sorry, it's unclear to me what the "vmas" column denotes. Is that how
> > > > many VMAs were created before timing the syscall? If so, then 32 is
> > > > the max that you show here while you seem to have tested with 1000
> > > > VMAs. What is the overhead with 1000 VMAs?
> > >
> > > The vmas column is the number of VMAs used in one call.
> > >
> > > For example: for 32 and mprotect(ptr,size), the memory range used in
> > > mprotect has 32 VMAs.
> >
> > Ok, so the 32 here denotes how many VMAs one mprotect() call spans?
> >
> Yes.
>
> > >
> > > It also matters how many memory ranges are in use at the time of the
> > > test. This is where 1000 comes in. The test creates 1000 memory
> > > ranges, each memory range has 32 vmas, then calls mprotect on the 1000
> > > memory ranges. (the pseudocode was included in the original email)
> >
> > So, if each range has 32 vmas and you have 1000 ranges then you are
> > creating 32000 vmas? Sorry, your pseudocode does not clarify that. My
> > current understanding is this:
> >
> > for (i = 0; i < 1000; i++)
> >     mmap N*1000 areas (N=[1-32])
> >     start the sampling
> >     for (j = 0; j < 1000; j++)
> >         mprotect N areas with one syscall
> >     stop and save the sample
> >     munmap N*1000 areas
> > calculates all samples.
> >
> > Is that correct?
> >
> Yes, there will be 32000 VMAs in the system.
>
> The pseudocode is correct in concept.
> The test implementation is slightly different: it uses mprotect to
> split the memory and make sure the VMAs don't merge. For details,
> reference [8] of the original email links to the test code.

Ok, thanks for the clarifications. I don't think the overhead is high
enough to worry about.
Thanks,
Suren.

>
> -Jeff
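
For readers following the thread, here is a minimal sketch of one timed sample as described above: a range is split into N single-page VMAs by alternating page protections with mprotect(), then a single mprotect() call spanning all N VMAs is timed with clock_gettime(CLOCK_MONOTONIC). This is not the test code from reference [8]; the helper names make_split_range() and time_one_mprotect() are hypothetical, the sketch assumes Linux, and it omits the 1000-range setup and the repeated sampling loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: map n_vmas pages, then flip every other page to
 * read-only so neighbouring pages differ in protection and cannot be
 * merged into one VMA. Returns the base address, or NULL on failure. */
static void *make_split_range(size_t n_vmas, size_t page)
{
    assert(n_vmas > 0);
    char *base = mmap(NULL, n_vmas * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    for (size_t i = 1; i < n_vmas; i += 2) {
        if (mprotect(base + i * page, page, PROT_READ) != 0) {
            munmap(base, n_vmas * page);
            return NULL;
        }
    }
    return base;
}

/* Hypothetical helper: time one mprotect() call that spans n_vmas VMAs.
 * Returns the elapsed time in nanoseconds, or -1 on error. */
static int64_t time_one_mprotect(size_t n_vmas)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *base = make_split_range(n_vmas, page);
    if (base == NULL)
        return -1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = mprotect(base, n_vmas * page, PROT_NONE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(base, n_vmas * page);
    if (rc != 0)
        return -1;
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (int64_t)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_one_mprotect(32) would correspond to one sample of the "mprotect 32" row. A single sample is noisy, so a real run would repeat the call many times and aggregate the results, as the pseudocode in the thread does.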