Re: raid6check extremely slow ?


 



On 5/11/20 6:14 PM, Piergiorgio Sartor wrote:
On Mon, May 11, 2020 at 10:58:07AM +0200, Guoqing Jiang wrote:
Hi Wolfgang,


On 5/11/20 8:40 AM, Wolfgang Denk wrote:
Dear Guoqing Jiang,

In message<2cf55e5f-bdfb-9fef-6255-151e049ac0a1@xxxxxxxxxxxxxxx>  you wrote:
It seems raid6check is in the 'D' state. What is the output of 'cat
/proc/19719/stack' and of /proc/mdstat?
# for i in 1 2 3 4 ; do  cat /proc/19719/stack; sleep 2; echo ; done
[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_lo_store+0x50/0xa0
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[<0>] __wait_rcu_gp+0x10d/0x110
[<0>] synchronize_rcu+0x47/0x50
[<0>] mddev_suspend+0x4a/0x140
[<0>] suspend_hi_store+0x44/0x90
[<0>] md_attr_store+0x86/0xe0
[<0>] kernfs_fop_write+0xce/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x4f/0xc0
[<0>] do_syscall_64+0x5b/0xf0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
It looks like raid6check keeps writing the suspend_lo/hi nodes, which causes
mddev_suspend to be called. That means synchronize_rcu and the other
synchronization mechanisms are triggered on that path ...

Interesting, why is it in ksys_write / vfs_write / kernfs_fop_write
all the time?  I thought it was _reading_ the disks only?
I hadn't read raid6check before; I just found that check_stripes has


     while (length > 0) {
             lock_stripe -> write suspend_lo/hi node
             ...
             unlock_all_stripes -> write suspend_lo/hi node
     }

I think that explains the raid6check stack. Maybe this is simply how
raid6check works: lock the stripe, check the stripe, then unlock the
stripe. That is just my guess ...
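
The lock / check / unlock cycle guessed at above can be sketched roughly as
follows. This is a minimal illustration in Python, not raid6check's actual
code: the helper names (write_attr, check_stripes_sketch), the byte ranges,
and the order and values written on unlock are all assumptions. The only
things taken from this thread are that every iteration writes the suspend_lo
and suspend_hi attributes, and that such a write can end up in mddev_suspend.

```python
import os
import tempfile


def write_attr(md_dir, name, value):
    """Write one number to an md attribute file (hypothetical helper)."""
    with open(os.path.join(md_dir, name), "w") as f:
        f.write(str(value))
    return 1  # one write issued


def check_stripes_sketch(md_dir, stripes, stripe_bytes):
    """Count attribute writes for a lock/check/unlock loop over `stripes` stripes."""
    writes = 0
    for stripe in range(stripes):
        start = stripe * stripe_bytes
        # "lock": suspend I/O for the byte range of this stripe
        writes += write_attr(md_dir, "suspend_lo", start)
        writes += write_attr(md_dir, "suspend_hi", start + stripe_bytes)
        # ... read the stripe and verify the P/Q parity here ...
        # "unlock": collapse the suspended range again (assumed order)
        writes += write_attr(md_dir, "suspend_hi", 0)
        writes += write_attr(md_dir, "suspend_lo", 0)
    return writes


# Demo against a scratch directory, so no real array is touched.
with tempfile.TemporaryDirectory() as scratch:
    total_writes = check_stripes_sketch(scratch, stripes=1000, stripe_bytes=512 * 1024)
    print(total_writes)
```

If each of these writes costs one suspend/resume, the number of
synchronize_rcu waits grows linearly with the number of stripes, which would
fit the stack samples above.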
Hi again!

I made a quick test.
I disabled the lock / unlock in raid6check.

With lock / unlock, I get around 1.2MB/sec
per device component, with ~13% CPU load.
Without lock / unlock, I get around 15.5MB/sec
per device component, with ~30% CPU load.

So, it seems the lock / unlock mechanism is
quite expensive.

Yes, since mddev_suspend/resume are triggered by the lock/unlock stripe.
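
To put the two measurements side by side, here is a quick back-of-the-envelope
calculation. The throughput figures are the ones from the test above; the
4 TB component size is only an assumed example and is not stated anywhere in
this thread.

```python
# Measured throughput per device component, from the test above (MB/s).
with_lock = 1.2
without_lock = 15.5

# Hypothetical component size, chosen only for illustration.
component_mb = 4_000_000  # 4 TB expressed in MB

speedup = without_lock / with_lock
hours_with_lock = component_mb / with_lock / 3600
hours_without_lock = component_mb / without_lock / 3600

print(f"speedup: {speedup:.1f}x")
print(f"with lock: {hours_with_lock:.0f} h, without lock: {hours_without_lock:.0f} h")
```

The ratio itself does not depend on the assumed size; only the hour figures do.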

I'm not sure what's the best solution, since
we still need to avoid race conditions.

I guess there are two possible ways:

1. Per your previous reply, only run raid6check when the array is RO; then
we don't need the lock.

2. Investigate whether it is possible to acquire stripe_lock in suspend_lo/hi_store
to avoid the race between raid6check and a write to the same stripe. IOW,
try fine-grained protection instead of calling the expensive suspend/resume
in suspend_lo/hi_store. But I am not sure right now whether that is doable.
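
For option 1, a wrapper could refuse to run an unlocked check unless the array
is read-only. Below is a minimal sketch, assuming the md array_state sysfs
attribute reports "readonly" for such an array. The function name and the
idea of passing the attribute path in are made up for illustration.
"read-auto" is deliberately rejected, because such an array switches to
writable on the first write.

```python
import os
import tempfile


def array_is_readonly(array_state_path):
    """Return True only if the attribute file contains exactly 'readonly'."""
    with open(array_state_path) as f:
        return f.read().strip() == "readonly"


# Demo with a scratch file standing in for /sys/block/mdX/md/array_state.
with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "array_state")
    results = {}
    for state in ("readonly", "read-auto", "clean", "active"):
        with open(path, "w") as f:
            f.write(state + "\n")
        results[state] = array_is_readonly(path)
    print(results)
```

Note that such a check is only a snapshot: nothing in this sketch prevents the
array from being switched back to read-write while the check is running.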


BTW, there seem to be build problems with raid6check ...

mdadm$ make raid6check
gcc -Wall -Werror -Wstrict-prototypes -Wextra -Wno-unused-parameter -Wimplicit-fallthrough=0 -O2 -DSendmail=\""/usr/sbin/sendmail -t"\" -DCONFFILE=\"/etc/mdadm.conf\" -DCONFFILE2=\"/etc/mdadm/mdadm.conf\" -DMAP_DIR=\"/run/mdadm\" -DMAP_FILE=\"map\" -DMDMON_DIR=\"/run/mdadm\" -DFAILED_SLOTS_DIR=\"/run/mdadm/failed-slots\" -DNO_COROSYNC -DNO_DLM -DVERSION=\"4.1-74-g5cfb79d\" -DVERS_DATE="\"2020-04-27\"" -DUSE_PTHREADS -DBINDIR=\"/sbin\"  -o sysfs.o -c sysfs.c
gcc -O2  -o raid6check raid6check.o restripe.o sysfs.o maps.o lib.o xmalloc.o dlink.o
sysfs.o: In function `sysfsline':
sysfs.c:(.text+0x2adb): undefined reference to `parse_uuid'
sysfs.c:(.text+0x2aee): undefined reference to `uuid_zero'
sysfs.c:(.text+0x2af5): undefined reference to `uuid_zero'
collect2: error: ld returned 1 exit status
Makefile:220: recipe for target 'raid6check' failed
make: *** [raid6check] Error 1


Thanks,
Guoqing


