On Mon, Jan 22, 2024 at 10:07 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Mon, Jan 22, 2024 at 9:36 PM SeongJae Park <sj@xxxxxxxxxx> wrote:
> >
> > Hi Suren,
> >
> > On Sun, 21 Jan 2024 23:13:24 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > > With maple_tree supporting vma tree traversal under RCU and per-vma locks
> > > making vma access RCU-safe, /proc/pid/maps can be read under RCU and
> > > without the need to read-lock mmap_lock. However vma content can change
> > > from under us, therefore we make a copy of the vma and we pin pointer
> > > fields used when generating the output (currently only vm_file and
> > > anon_name). Afterwards we check for concurrent address space
> > > modifications, wait for them to end and retry. That last check is needed
> > > to avoid possibility of missing a vma during concurrent maple_tree
> > > node replacement, which might report a NULL when a vma is replaced
> > > with another one. While we take the mmap_lock for reading during such
> > > contention, we do that momentarily only to record new mm_wr_seq counter.
> > > This change is designed to reduce mmap_lock contention and prevent a
> > > process reading /proc/pid/maps files (often a low priority task, such as
> > > monitoring/data collection services) from blocking address space updates.
> > >
> > > Note that this change has a userspace visible disadvantage: it allows for
> > > sub-page data tearing as opposed to the previous mechanism where data
> > > tearing could happen only between pages of generated output data.
> > > Since current userspace considers data tearing between pages to be
> > > acceptable, we assume it will be able to handle sub-page data tearing
> > > as well.
> > >
> > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > ---
> > >  fs/proc/internal.h |   2 +
> > >  fs/proc/task_mmu.c | 114 ++++++++++++++++++++++++++++++++++++++++++---
> > >  2 files changed, 109 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/fs/proc/internal.h b/fs/proc/internal.h
> > > index a71ac5379584..e0247225bb68 100644
> > > --- a/fs/proc/internal.h
> > > +++ b/fs/proc/internal.h
> > > @@ -290,6 +290,8 @@ struct proc_maps_private {
> > >       struct task_struct *task;
> > >       struct mm_struct *mm;
> > >       struct vma_iterator iter;
> > > +     unsigned long mm_wr_seq;
> > > +     struct vm_area_struct vma_copy;
> > >  #ifdef CONFIG_NUMA
> > >       struct mempolicy *task_mempolicy;
> > >  #endif
> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > > index 3f78ebbb795f..3886d04afc01 100644
> > > --- a/fs/proc/task_mmu.c
> > > +++ b/fs/proc/task_mmu.c
> > > @@ -126,11 +126,96 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
> > >  }
> > >  #endif
> > >
> > > -static struct vm_area_struct *proc_get_vma(struct proc_maps_private *priv,
> > > -                                             loff_t *ppos)
> > > +#ifdef CONFIG_PER_VMA_LOCK
> > > +
> > > +static const struct seq_operations proc_pid_maps_op;
> > > +/*
> > > + * Take VMA snapshot and pin vm_file and anon_name as they are used by
> > > + * show_map_vma.
> > > + */
> > > +static int get_vma_snapshow(struct proc_maps_private *priv, struct vm_area_struct *vma)
> > >  {
> > > +     struct vm_area_struct *copy = &priv->vma_copy;
> > > +     int ret = -EAGAIN;
> > > +
> > > +     memcpy(copy, vma, sizeof(*vma));
> > > +     if (copy->vm_file && !get_file_rcu(&copy->vm_file))
> > > +             goto out;
> > > +
> > > +     if (copy->anon_name && !anon_vma_name_get_rcu(copy))
> > > +             goto put_file;
> >
> > From today's updated mm-unstable, which contains this patch, I'm getting the
> > build error below when CONFIG_ANON_VMA_NAME is not set. Seems this patch needs
> > to handle that case?
>
> Hi SeongJae,
> Thanks for reporting! I'll post an updated version fixing this config.

Fix is posted at
https://lore.kernel.org/all/20240123231014.3801041-3-surenb@xxxxxxxxxx/
as part of v2 of this patchset.
Thanks,
Suren.

> Suren.
>
> >
> > .../linux/fs/proc/task_mmu.c: In function ‘get_vma_snapshow’:
> > .../linux/fs/proc/task_mmu.c:145:19: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> >   145 |         if (copy->anon_name && !anon_vma_name_get_rcu(copy))
> >       |                   ^~~~~~~~~
> >       |                   anon_vma
> > .../linux/fs/proc/task_mmu.c:161:19: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> >   161 |         if (copy->anon_name)
> >       |                   ^~~~~~~~~
> >       |                   anon_vma
> > .../linux/fs/proc/task_mmu.c:162:41: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> >   162 |                 anon_vma_name_put(copy->anon_name);
> >       |                                         ^~~~~~~~~
> >       |                                         anon_vma
> > .../linux/fs/proc/task_mmu.c: In function ‘put_vma_snapshot’:
> > .../linux/fs/proc/task_mmu.c:174:18: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> >   174 |         if (vma->anon_name)
> >       |                  ^~~~~~~~~
> >       |                  anon_vma
> > .../linux/fs/proc/task_mmu.c:175:40: error: ‘struct vm_area_struct’ has no member named ‘anon_name’; did you mean ‘anon_vma’?
> >   175 |                 anon_vma_name_put(vma->anon_name);
> >       |                                        ^~~~~~~~~
> >       |                                        anon_vma
> >
> > [...]
> >
> >
> > Thanks,
> > SJ