On Tue, Jun 20, 2023 at 10:56 PM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
> On Tue, Jun 20, 2023 at 03:29:34PM -0700, Jeff Xu wrote:
> > On Wed, Jun 14, 2023 at 5:58 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 13, 2023 at 09:18:14PM -0400, Liam R. Howlett wrote:
> > > > * Jeff Xu <jeffxu@xxxxxxxxxxxx> [230613 17:29]:
> > > > > Hello Peter,
> > > > >
> > > > > Thanks for responding.
> > > > >
> > > > > On Tue, Jun 13, 2023 at 1:16 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Hi, Jeff,
> > > > > >
> > > > > > On Tue, Jun 13, 2023 at 08:26:26AM -0700, Jeff Xu wrote:
> > > > > > > + more ppl to the list.
> > > > > > >
> > > > > > > On Mon, Jun 12, 2023 at 6:04 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > There seems to be inconsistency in different VMA fixup
> > > > > > > > implementations, for example:
> > > > > > > > mlock_fixup will skip VMA that is hugetlb, etc, but those checks do
> > > > > > > > not exist in mprotect_fixup and madvise_update_vma. Wouldn't this be a
> > > > > > > > problem? the merge/split skipped by mlock_fixup, might get acted on in
> > > > > > > > the madvise/mprotect case.
> > > > > > > >
> > > > > > > > mlock_fixup currently checks for
> > > > > > > > if (newflags == oldflags ||
> > > >
> > > > newflags == oldflags, then we don't need to do anything here, it's
> > > > already at the desired mlock. mprotect does this, madvise does this..
> > > > probably.. it's ugly.
> > > >
> > > > > > > > (oldflags & VM_SPECIAL) ||
> > > >
> > > > It's special, merging will fail always. I don't know about splitting,
> > > > but I guess we don't want to alter the mlock state on special mappings.
> > > >
> > > > > > > > is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm) ||
> > > > > > > > vma_is_dax(vma) || vma_is_secretmem(vma))
> > > > > >
> > > > > > The special handling you mentioned in mlock_fixup mostly makes sense to me.
> > > > > >
> > > > > > E.g., I think we can just ignore mlock a hugetlb page if it won't be
> > > > > > swapped anyway.
> > > > > >
> > > > > > Do you encounter any issue with above?
> > > > > >
> > > > > > > > Should there be a common function to handle VMA merge/split ?
> > > > > >
> > > > > > IMHO vma_merge() and split_vma() are the "common functions". Copy Lorenzo
> > > > > > as I think he has a plan to look into the interface to make it even easier to
> > > > > > use.
> > > > > >
> > > > > The mprotect_fixup doesn't have the same check as mlock_fixup. When
> > > > > userspace calls mlock(), two VMAs might not merge or split because of
> > > > > vma_is_secretmem check, however, when user space calls mprotect() with
> > > > > the same address range, it will merge/split. If mlock() is doing the
> > > > > right thing to merge/split the VMAs, then mprotect() is not ?
> > > >
> > > > It looks like secretmem is mlock'ed to begin with so they don't want it
> > > > to be touched. So, I think they will be treated differently and I think
> > > > it is correct.
> > >
> > > Right, they don't :)
> > >
> > > secretmem VMAs are always mlocked, they cannot be munlocked and there is no
> > > point trying to mlock them again.
> > >
> > > The mprotect for secretmem is Ok though, so e.g. if we (unlikely) have two
> > > adjacent secretmem VMAs in a range passed to mprotect, it's fine to merge
> > > them.
> > >
> > I'm thinking/brainstorming below, assuming:
> > Address range 1: 0x5000 to 0x6000 (regular mmap)
> > Address range 2: 0x6000 to 0x7000 (allocated to secretmem)
> > Address range 3: 0x7000 to 0x8000 (regular mmap)
> >
> > User space call: mlock(0x5000,0x3000)
> > range 1 and 2 won't merge.
> > range 2 and 3 could merge, when mlock_fixup checks current vma
> > (range 3), it is not secretmem, so it will merge with prev vma.
>
> But 2 and 3 have different vm_file, they won't merge.
>
> > user space call: mprotect(0x5000,0x3000)
> > range 1 2 3 could merge, all three can have the same flags.
> > Note: vma_is_secretmem() isn't checked in mprotect_fixup, same for
> > vma_is_dax and get_gate_vma, those aren't included in
> > vma->vm_flags
> >
> > Once 1 and 2 are merged, maybe user space is able to use
> > munlock(0x5000,0x3000)
> > to unlock range 1 to 3, this will include 2, right ? (haven't used the
> > code to prove it)
>
> But 1 and 2 won't merge because their vm_file's are different.
>
Is that possible to be staged the same ?

> > I'm using secretmem as an example here, having 3 different _fixup
> > implementations seems to be error prone to me.
>
> The actual decision whether to merge VMAs is taken in vma_merge rather than
> by the _fixup functions. So while the checks around vma_merge might be
> different in these functions, it does not mean it's possible to wrongly
> merge VMA unless there is a bug in vma_merge. So in the end it boils down
> to a single core implementation, don't you agree?
>
I agree that vma_merge should also check, but it doesn't seem to be the
case ? I looked for secretmem, get_gate_vma(current->mm), vma_is_dax()
Ideally, the skip/go decisions should be inside vma_merge/vma_split()
function, not in the _fixup(), I think.

> > Thanks
> > -Jeff
>
> --
> Sincerely yours,
> Mike.
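
To make the vm_file argument above concrete, here is a rough userspace sketch of the three-range example. It is only a toy model under stated assumptions: struct toy_vma, TOY_LOCKED, toy_can_merge() and toy_scenario() are hypothetical names made up for illustration and are not kernel code. The real vma_merge() looks at more than this (the sketch only models adjacency, flags and the backing file).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA; NOT the kernel's struct vm_area_struct. */
struct toy_vma {
    unsigned long start, end;   /* covers [start, end) */
    unsigned long flags;        /* stand-in for vm_flags */
    const void *file;           /* stand-in for vm_file; NULL = anonymous */
};

#define TOY_LOCKED 0x1UL        /* hypothetical flag bit, not VM_LOCKED */

/*
 * Hypothetical merge predicate: two neighbours are mergeable only if
 * they are adjacent, carry identical flags and share the same backing
 * file. This models only the subset of checks discussed in the thread.
 */
static bool toy_can_merge(const struct toy_vma *prev,
                          const struct toy_vma *next)
{
    return prev->end == next->start &&
           prev->flags == next->flags &&
           prev->file == next->file;
}

/*
 * Ranges 1..3 from the mail, all given the same flags (as after an
 * mprotect/mlock over 0x5000..0x8000). Returns a bitmask:
 * bit 0 set if 1+2 are mergeable, bit 1 set if 2+3 are mergeable.
 */
static int toy_scenario(void)
{
    static const char secretmem_file;   /* its address acts as a file identity */
    struct toy_vma r1 = { 0x5000, 0x6000, TOY_LOCKED, NULL };
    struct toy_vma r2 = { 0x6000, 0x7000, TOY_LOCKED, &secretmem_file };
    struct toy_vma r3 = { 0x7000, 0x8000, TOY_LOCKED, NULL };

    return (toy_can_merge(&r1, &r2) ? 1 : 0) |
           (toy_can_merge(&r2, &r3) ? 2 : 0);
}
```

In this model toy_scenario() returns 0: even with identical flags on all three ranges, neither pair merges, because the middle range has a different backing file from its anonymous neighbours. Whether the real code has other paths where the per-_fixup checks matter is exactly the open question in this thread, and the sketch does not answer it.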