Re: [PATCH RFC] mm/madvise: introduce MADV_POPULATE to prefault/prealloc memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 19.02.21 12:04, Michal Hocko wrote:
On Fri 19-02-21 11:43:48, David Hildenbrand wrote:
On 19.02.21 11:35, Michal Hocko wrote:
On Wed 17-02-21 16:48:44, David Hildenbrand wrote:
[...]

I only got  to the implementation now.

+static long madvise_populate(struct vm_area_struct *vma,
+			     struct vm_area_struct **prev,
+			     unsigned long start, unsigned long end)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	unsigned long tmp_end;
+	int locked = 1;
+	long pages;
+
+	*prev = vma;
+
+	while (start < end) {
+		/*
+		 * We might have temporarily dropped the lock. For example,
+		 * our VMA might have been split.
+		 */
+		if (!vma || start >= vma->vm_end) {
+			vma = find_vma(mm, start);
+			if (!vma)
+				return -ENOMEM;
+		}

Why do you need to find a vma when you already have one. do_madvise will
give you your vma already. I do understand that you want to finish the
vma for some errors but that shouldn't require handling vmas. You should
be in the shope of one here unless I miss anything.

See below, we might temporary drop the lock while not having processed all
pages


+
+		/* Bail out on incompatible VMA types. */
+		if (vma->vm_flags & (VM_IO | VM_PFNMAP) ||
+		    !vma_is_accessible(vma)) {
+			return -EINVAL;
+		}
+
+		/*
+		 * Populate pages and take care of VM_LOCKED: simulate user
+		 * space access.
+		 *
+		 * For private, writable mappings, trigger a write fault to
+		 * break COW (i.e., shared zeropage). For other mappings (i.e.,
+		 * read-only, shared), trigger a read fault.
+		 */
+		tmp_end = min_t(unsigned long, end, vma->vm_end);
+		pages = populate_vma_page_range(vma, start, tmp_end, &locked);
+		if (!locked) {
+			mmap_read_lock(mm);
+			*prev = NULL;
+			vma = NULL;

^ here

so, the VMA might have been replaced/split/... in the meantime.

So to make forward progress, I have to lookup again. (similar. but different
to madvise_dontneed_free()).

Right. Missed that.

It would look more natural if we'd just be processing the whole range - but then it would not fit into the generic infrastructure and would result in even more code.

I decided to go with "process the passed range and treat the given VMA as an initial VMA that is invalidated as soon as we drop the lock".


--
Thanks,

David / dhildenb




[Index of Archives]     [Linux Kernel]     [Kernel Newbies]     [x86 Platform Driver]     [Netdev]     [Linux Wireless]     [Netfilter]     [Bugtraq]     [Linux Filesystems]     [Yosemite Discussion]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]

  Powered by Linux