On Thu, Jan 25, 2024 at 2:44 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > On Thu, Jan 25, 2024 at 01:27:34PM -0800, Suren Baghdasaryan wrote: > > On Thu, Jan 25, 2024 at 2:23 AM kernel test robot <lkp@xxxxxxxxx> wrote: > > > > > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master > > > head: 01af33cc9894b4489fb68fa35c40e9fe85df63dc > > > commit: 0c30c4cd953025979b7689e49844837f762303ec [1589/1892] mm/maps: read proc/pid/maps under RCU > > > config: x86_64-randconfig-121-20240125 (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@xxxxxxxxx/config) > > > compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240125/202401251829.0m6Eo4LI-lkp@xxxxxxxxx/reproduce) > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > > the same patch/commit), kindly add following tags > > > | Reported-by: kernel test robot <lkp@xxxxxxxxx> > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202401251829.0m6Eo4LI-lkp@xxxxxxxxx/ > > > > > > sparse warnings: (new ones prefixed by >>) > > > >> fs/proc/task_mmu.c:143:45: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct file [noderef] __rcu **f @@ got struct file ** @@ > > > > Uh, this is a problem. > > I missed that get_file_rcu() is used only with mm->exe_file and > > vma->vm_file is not really RCU-safe. It's freed via a call to fput() > > which schedules its freeing using schedule_delayed_work(..., 1) but I > > don't think that constitutes RCU grace period. Paul, Matthew, could > > you please confirm? In the meantime I'm going to ask Andrew to remove > > my patchset from mm-unstable to be safe. > > Sadly, no, schedule_delayed_work() does not imply an RCU grace period. > > There is a queue_rcu_work() that schedules work after a grace period, > which could be combined with a timer to get the delay. > > Another approach would be to use get_state_synchronize_rcu() before > the schedule_delayed_work() in fput(), then do cond_synchronize_rcu() > in delayed_fput(). This would require adding an unsigned long to > struct file to keep track of which grace period a given struct file > needed to wait for. > > Perhaps something like this: > > ------------------------------------------------------------------------ > > void fput(struct file *file) > { > if (atomic_long_dec_and_test(&file->f_count)) { > struct task_struct *task = current; > > if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) { > init_task_work(&file->f_rcuhead, ____fput); > if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME)) > return; > /* > * After this task has run exit_task_work(), > * task_work_add() will fail. Fall through to delayed > * fput to avoid leaking *file. > */ > } > > file->f_rcu_seq = get_state_synchronize_rcu(); > if (llist_add(&file->f_llist, &delayed_fput_list)) > schedule_delayed_work(&delayed_fput_work, 1); > } > } > > ------------------------------------------------------------------------ > > And this: > > ------------------------------------------------------------------------ > > static void delayed_fput(struct work_struct *unused) > { > struct llist_node *node = llist_del_all(&delayed_fput_list); > struct file *f, *t; > > llist_for_each_entry_safe(f, t, node, f_llist) { > cond_synchronize_rcu(f->f_rcu_seq); > __fput(f); > } > } > > ------------------------------------------------------------------------ > > Note that if you called fput() on a long sequence of struct file > structures, the cond_synchronize_rcu() would be a near-noop almost all the > time, actually blocking at most about every once per every few jiffies. > After all, once a grace period has been waited for, it covers all of > the struct file structures that were passed to fput() during a given > RCU grace period. > > Still, it would add the occasional delay. And it would increase the > size of struct file, though there are workarounds for that, if size > is an issue. > > Thoughts? Thanks for the suggestion, Paul. I'm worried about this occasional delay but otherwise this seems like a nice and simple approach. Do you guys think that making *all* files RCU-safe with this approach is warranted? For my particular case I need only vma->vm_file to be RCU-safe but maybe there are other cases which would benefit from this? > > Thanx, Paul > > > > fs/proc/task_mmu.c:143:45: sparse: expected struct file [noderef] __rcu **f > > > fs/proc/task_mmu.c:143:45: sparse: got struct file ** > > > fs/proc/task_mmu.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...): > > > include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'get_vma_snapshot' - unexpected unlock > > > fs/proc/task_mmu.c:264:22: sparse: sparse: context imbalance in 'm_start' - different lock contexts for basic block > > > include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'm_stop' - unexpected unlock > > > include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'smaps_pte_range' - unexpected unlock > > > include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'clear_refs_pte_range' - unexpected unlock > > > include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_pmd_range' - unexpected unlock > > > include/linux/rcupdate.h:781:9: sparse: sparse: context imbalance in 'pagemap_scan_pmd_entry' - unexpected unlock > > > fs/proc/task_mmu.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...): > > > arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression > > > arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression > > > > > > vim +143 fs/proc/task_mmu.c > > > > > > 132 > > > 133 /* > > > 134 * Take VMA snapshot and pin vm_file and anon_name as they are used by > > > 135 * show_map_vma. > > > 136 */ > > > 137 static int get_vma_snapshot(struct proc_maps_private *priv, struct vm_area_struct *vma) > > > 138 { > > > 139 struct vm_area_struct *copy = &priv->vma_copy; > > > 140 int ret = -EAGAIN; > > > 141 > > > 142 memcpy(copy, vma, sizeof(*vma)); > > > > 143 if (copy->vm_file && !get_file_rcu(©->vm_file)) > > > 144 goto out; > > > 145 > > > 146 if (!anon_vma_name_get_if_valid(copy)) > > > 147 goto put_file; > > > 148 > > > 149 if (priv->mm_wr_seq == mmap_write_seq_read(priv->mm)) > > > 150 return 0; > > > 151 > > > 152 /* Address space got modified, vma might be stale. Wait and retry. */ > > > 153 rcu_read_unlock(); > > > 154 ret = mmap_read_lock_killable(priv->mm); > > > 155 mmap_write_seq_record(priv->mm, &priv->mm_wr_seq); > > > 156 mmap_read_unlock(priv->mm); > > > 157 rcu_read_lock(); > > > 158 > > > 159 if (!ret) > > > 160 ret = -EAGAIN; /* no other errors, ok to retry */ > > > 161 > > > 162 anon_vma_name_put_if_valid(copy); > > > 163 put_file: > > > 164 if (copy->vm_file) > > > 165 fput(copy->vm_file); > > > 166 out: > > > 167 return ret; > > > 168 } > > > 169 > > > > > > -- > > > 0-DAY CI Kernel Test Service > > > https://github.com/intel/lkp-tests/wiki