On Thu, Jan 28, 2016 at 02:10:38PM +1100, Neil Brown wrote:
> On Wed, Jan 27 2016, Chien Lee wrote:
>
> > 2016-01-27 6:12 GMT+08:00 NeilBrown <neilb@xxxxxxxx>:
> >> On Tue, Jan 26 2016, Chien Lee wrote:
> >>
> >>> Hello,
> >>>
> >>> Recently we find a bug about this patch (commit No. is
> >>> ac8fa4196d205ac8fff3f8932bddbad4f16e4110 ).
> >>>
> >>> We know that this patch committed after Linux kernel 4.1.x is intended
> >>> to allowing resync to go faster when there is competing IO. However,
> >>> we find the performance of random read on syncing Raid6 will come up
> >>> with a huge drop in this case. The following is our testing detail.
> >>>
> >>> The OS what we choose in our test is CentOS Linux release 7.1.1503
> >>> (Core) and the kernel image will be replaced for testing. In our
> >>> testing result, the 4K random read performance on syncing raid6 in
> >>> Kernel 4.2.8 is much lower than in Kernel 3.19.8. In order to find out
> >>> the root cause, we try to rollback this patch in Kernel 4.2.8, and we
> >>> find the 4K random read performance on syncing Raid6 will be improved
> >>> and go back to as what it should be in Kernel 3.19.8.
> >>>
> >>> Nevertheless, it seems that it will not affect some other read/write
> >>> patterns. In our testing result, the 1M sequential read/write, 4K
> >>> random write performance in Kernel 4.2.8 is performed almost the same
> >>> as in Kernel 3.19.8.
> >>>
> >>> It seems that although this patch increases the resync speed, the
> >>> logic of !is_mddev_idle() cause the sync request wait too short and
> >>> reduce the chance for raid5d to handle the random read I/O.
> >>
> >> This has been raised before.
> >> Can you please try the patch at the end of
> >>
> >>   http://permalink.gmane.org/gmane.linux.raid/51002
> >>
> >> and let me know if it makes any difference. If it isn't sufficient I
> >> will explore further.
> >>
> >> Thanks,
> >> NeilBrown
> >
> >
> > Hello Neil,
> >
> > I try the patch (http://permalink.gmane.org/gmane.linux.raid/51002) in
> > Kernel 4.2.8. Here are the test results:
> >
> >
> > Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
> >
> > a. 4K Random Read, numjobs=64
> >
> >                                Average Throughput    Average IOPS
> > Kernel 4.2.8 Patch             601249KB/s            150312
> >
> > b. 4K Random Read, numjobs=1
> >
> >                                Average Throughput    Average IOPS
> > Kernel 4.2.8 Patch             1166.4KB/s            291
> >
> >
> > Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
> >
> > a. 4K Random Read, numjobs=64
> >
> >                                Average Throughput    Average IOPS
> > Kernel 4.2.8 Patch             2946.4KB/s            736
> >
> > b. 4K Random Read, numjobs=1
> >
> >                                Average Throughput    Average IOPS
> > Kernel 4.2.8 Patch             119199 B/s            28
> >
> >
> > Although the performance that compare to the original Kernel 4.2.8
> > test results is increased, the patch
> > (http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ac8fa4196d205ac8fff3f8932bddbad4f16e4110)
> > rollback still has the best performance. I also observe the sync speed
> > at numjobs=64 almost drop to the sync_speed_min, but sync speed at
> > numjobs=1 almost keep in the original speed.
> >
> > From my test results, I think this patch isn't sufficient that maybe
> > Neil can explore further and give me some advice.
> >
> >
> > Thanks,
> > Chien Lee
> >
> >
> >>>
> >>> Following is our test environment and some testing results:
> >>>
> >>> OS: CentOS Linux release 7.1.1503 (Core)
> >>>
> >>> CPU: Intel(R) Xeon(R) CPU E3-1245 v3 @ 3.40GHz
> >>>
> >>> Processor number: 8
> >>>
> >>> Memory: 12GB
> >>>
> >>> fio command:
> >>>
> >>> 1. (for numjobs=64):
> >>>
> >>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> >>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> >>> --numjobs=64 --iodepth=1 --group_reporting
> >>>
> >>> 2. (for numjobs=1):
> >>>
> >>> fio --filename=/dev/md2 --sync=0 --direct=0 --rw=randread --bs=4K
> >>> --runtime=180 --size=50G --name=test-read --ioengine=libaio
> >>> --numjobs=1 --iodepth=1 --group_reporting
> >>>
> >>>
> >>> Here are test results:
> >>>
> >>>
> >>> Part I. SSD (4 x 240GB Intel SSD create Raid6(syncing))
> >>>
> >>> a. 4K Random Read, numjobs=64
> >>>
> >>>                                Average Throughput    Average IOPS
> >>> Kernel 3.19.8                  715937KB/s            178984
> >>> Kernel 4.2.8                   489874KB/s            122462
> >>> Kernel 4.2.8 Patch Rollback    717377KB/s            179344
> >>>
> >>> b. 4K Random Read, numjobs=1
> >>>
> >>>                                Average Throughput    Average IOPS
> >>> Kernel 3.19.8                  32203KB/s             8051
> >>> Kernel 4.2.8                   2535.7KB/s            633
> >>> Kernel 4.2.8 Patch Rollback    31861KB/s             7965
> >>>
> >>>
> >>> Part II. HDD (4 x 1TB TOSHIBA HDD create Raid6(syncing))
> >>>
> >>> a. 4K Random Read, numjobs=64
> >>>
> >>>                                Average Throughput    Average IOPS
> >>> Kernel 3.19.8                  2976.6KB/s            744
> >>> Kernel 4.2.8                   2915.8KB/s            728
> >>> Kernel 4.2.8 Patch Rollback    2973.3KB/s            743
> >>>
> >>> b. 4K Random Read, numjobs=1
> >>>
> >>>                                Average Throughput    Average IOPS
> >>> Kernel 3.19.8                  481844 B/s            117
> >>> Kernel 4.2.8                   24718 B/s             5
> >>> Kernel 4.2.8 Patch Rollback    460090 B/s            112
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>>
> >>> Chien Lee
>
> Thanks for testing.
>
> I'd like to suggest that these results are fairly reasonable for the
> numjobs=64 case. Certainly read-speed is reduced by presumably resync
> speed is increased.
> The numbers for numjob=1 are appalling though. That would generally
> affect any synchronous load. As the synchronous load doesn't interfere
> much with the resync load, the delays that are inserted won't be very
> long.
>
> I feel there must be an answer here - I just cannot find it.
> I'd like to be able to dynamically estimate the bandwidth of the array
> and use (say) 10% of that, but I cannot think of a way to do that at all
> reliably.

Had a hack, something like this?

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e55e6cf..7fee8e6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8060,12 +8060,34 @@ void md_do_sync(struct md_thread *thread)
 				goto repeat;
 			}
 			if (!is_mddev_idle(mddev, 0)) {
+				unsigned long start = jiffies;
+				int recov = atomic_read(&mddev->recovery_active);
+				int last_sect, new_sect;
+				int sleep_time = 0;
+
+				last_sect = (int)part_stat_read(&mddev->gendisk->part0, sectors[0]) +
+					(int)part_stat_read(&mddev->gendisk->part0, sectors[1]);
+
 				/*
 				 * Give other IO more of a chance.
 				 * The faster the devices, the less we wait.
 				 */
 				wait_event(mddev->recovery_wait,
 					   !atomic_read(&mddev->recovery_active));
+
+				new_sect = (int)part_stat_read(&mddev->gendisk->part0, sectors[0]) +
+					(int)part_stat_read(&mddev->gendisk->part0, sectors[1]);
+
+				if (recov * 10 > new_sect - last_sect)
+					sleep_time = 9 * (jiffies - start) /
+						((new_sect - last_sect) /
+						(recov + 1) + 1);
+
+				sleep_time = jiffies_to_msecs(sleep_time);
+				if (sleep_time > 500)
+					sleep_time = 500;
+
+				msleep(sleep_time);
 			}
 		}
 	}