On Tue, Aug 08 2017, David R wrote: > Quoting Shaohua Li <shli@xxxxxxxxxx>: > >> Spent some time to check this one, unfortunately I can't find how that patch >> makes rcu stall. the percpu part looks good to me too. Can you >> double check if >> reverting 4ad23a976413aa57 makes the issue go away? When the rcu >> stall happens, >> what the /sys/block/md/md0/array_state? please also attach /proc/mdstat. When >> you say the mdx_raid1 threads are in 'R' state, can you double check if the >> /proc/pid/stack always 0xffffffffff? >> >> Thanks, >> Shaohua > > I confess to knowing absolutely nothing about the md code, so please > don't be too hard on me. However > :- > > static bool set_in_sync(struct mddev *mddev) > { > WARN_ON_ONCE(!spin_is_locked(&mddev->lock)); > if (!mddev->in_sync) { > mddev->sync_checkers++; > spin_unlock(&mddev->lock); > percpu_ref_switch_to_atomic_sync(&mddev->writes_pending); > spin_lock(&mddev->lock); > if (!mddev->in_sync && > percpu_ref_is_zero(&mddev->writes_pending)) { > mddev->in_sync = 1; > /* > * Ensure ->in_sync is visible before we clear > * ->sync_checkers. > */ > smp_mb(); > set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags); > sysfs_notify_dirent_safe(mddev->sysfs_state); > } > if (--mddev->sync_checkers == 0) > percpu_ref_switch_to_percpu(&mddev->writes_pending); > > > The switch_to_percpu() takes place under mddev->lock however > switch_to_atomic_sync() does not. A thread can be in the middle of (or > about to execute) switch_to_atomic_sync() at the same time as another > is calling switch_to_percpu(). This can't be correct surely? No, that wouldn't be correct. When switch_to_atomic_sync is called, ->sync_checkers > 0. When switch_to_percpu is called ->sync_checkers == 0. So they cannot happen at the same time. Thanks for looking at the code! NeilBrown > > Cheers > David
Attachment:
signature.asc
Description: PGP signature