Ignore me. The increment and decrement of sync_checkers should protect
switch_to_percpu(). Sigh.
Quoting David R <david@xxxxxxxxxxxxxxx>:
Quoting Shaohua Li <shli@xxxxxxxxxx>:
Spent some time to check this one, unfortunately I can't find how that patch
makes rcu stall. the percpu part looks good to me too. Can you
double check if
reverting 4ad23a976413aa57 makes the issue go away? When the rcu
stall happens,
what the /sys/block/md/md0/array_state? please also attach
/proc/mdstat. When
you say the mdx_raid1 threads are in 'R' state, can you double check if the
/proc/pid/stack always 0xffffffffff?
Thanks,
Shaohua
I confess to knowing absolutely nothing about the md code, so please
don't be too hard on me. However
:-
static bool set_in_sync(struct mddev *mddev)
{
WARN_ON_ONCE(!spin_is_locked(&mddev->lock));
if (!mddev->in_sync) {
mddev->sync_checkers++;
spin_unlock(&mddev->lock);
percpu_ref_switch_to_atomic_sync(&mddev->writes_pending);
spin_lock(&mddev->lock);
if (!mddev->in_sync &&
percpu_ref_is_zero(&mddev->writes_pending)) {
mddev->in_sync = 1;
/*
* Ensure ->in_sync is visible before we clear
* ->sync_checkers.
*/
smp_mb();
set_bit(MD_SB_CHANGE_CLEAN, &mddev->sb_flags);
sysfs_notify_dirent_safe(mddev->sysfs_state);
}
if (--mddev->sync_checkers == 0)
percpu_ref_switch_to_percpu(&mddev->writes_pending);
The switch_to_percpu() takes place under mddev->lock however
switch_to_atomic_sync() does not. A thread can be in the middle of
(or about to execute) switch_to_atomic_sync() at the same time as
another is calling switch_to_percpu(). This can't be correct surely?
Cheers
David
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html