On Fri 19-02-21 11:43:48, David Hildenbrand wrote: > On 19.02.21 11:35, Michal Hocko wrote: > > On Wed 17-02-21 16:48:44, David Hildenbrand wrote: > > [...] > > > > I only got to the implementation now. > > > > > +static long madvise_populate(struct vm_area_struct *vma, > > > + struct vm_area_struct **prev, > > > + unsigned long start, unsigned long end) > > > +{ > > > + struct mm_struct *mm = vma->vm_mm; > > > + unsigned long tmp_end; > > > + int locked = 1; > > > + long pages; > > > + > > > + *prev = vma; > > > + > > > + while (start < end) { > > > + /* > > > + * We might have temporarily dropped the lock. For example, > > > + * our VMA might have been split. > > > + */ > > > + if (!vma || start >= vma->vm_end) { > > > + vma = find_vma(mm, start); > > > + if (!vma) > > > + return -ENOMEM; > > > + } > > > > Why do you need to find a vma when you already have one. do_madvise will > > give you your vma already. I do understand that you want to finish the > > vma for some errors but that shouldn't require handling vmas. You should > > be in the shope of one here unless I miss anything. > > See below, we might temporary drop the lock while not having processed all > pages > > > > > > + > > > + /* Bail out on incompatible VMA types. */ > > > + if (vma->vm_flags & (VM_IO | VM_PFNMAP) || > > > + !vma_is_accessible(vma)) { > > > + return -EINVAL; > > > + } > > > + > > > + /* > > > + * Populate pages and take care of VM_LOCKED: simulate user > > > + * space access. > > > + * > > > + * For private, writable mappings, trigger a write fault to > > > + * break COW (i.e., shared zeropage). For other mappings (i.e., > > > + * read-only, shared), trigger a read fault. > > > + */ > > > + tmp_end = min_t(unsigned long, end, vma->vm_end); > > > + pages = populate_vma_page_range(vma, start, tmp_end, &locked); > > > + if (!locked) { > > > + mmap_read_lock(mm); > > > + *prev = NULL; > > > + vma = NULL; > > ^ here > > so, the VMA might have been replaced/split/... in the meantime. > > So to make forward progress, I have to lookup again. (similar. but different > to madvise_dontneed_free()). Right. Missed that. -- Michal Hocko SUSE Labs