Jes, Roman, et al., sorry for the wait. I have been able to test this further, and the patch mentioned was indeed causing the issue. The easiest way for me to reproduce was by installing to an md device while it syncs. The interesting/telling part is during partitioning/formatting. Below is the log prior to reverting the patch.

19:58:39,249 INFO anaconda: Creating disklabel on /dev/sdb
19:58:39,312 INFO anaconda: Created disklabel on /dev/sdb
19:58:39,553 INFO anaconda: Creating mdmember on /dev/sdb1
19:58:39,616 INFO anaconda: Created mdmember on /dev/sdb1
19:58:40,529 INFO anaconda: Creating mdmember on /dev/sdb5
19:58:40,619 INFO anaconda: Created mdmember on /dev/sdb5
19:58:40,642 INFO anaconda: Creating mdmember on /dev/sdb3
19:58:40,708 INFO anaconda: Created mdmember on /dev/sdb3
19:58:40,731 INFO anaconda: Creating mdmember on /dev/sdb2
19:58:40,801 INFO anaconda: Created mdmember on /dev/sdb2
19:58:40,822 INFO anaconda: Creating disklabel on /dev/sda
19:58:40,890 INFO anaconda: Created disklabel on /dev/sda
19:58:41,118 INFO anaconda: Creating mdmember on /dev/sda1
19:58:41,179 INFO anaconda: Created mdmember on /dev/sda1
19:58:42,499 INFO anaconda: Creating lvmpv on /dev/md/2
19:59:45,248 INFO anaconda: Created lvmpv on /dev/md/2
20:00:44,268 INFO anaconda: Creating ext4 on /dev/mapper/vg00-mgtservices
20:01:15,815 INFO anaconda: Created ext4 on /dev/mapper/vg00-mgtservices
20:01:31,088 INFO anaconda: Creating ext4 on /dev/mapper/vg00-home
20:02:02,074 INFO anaconda: Created ext4 on /dev/mapper/vg00-home
20:02:27,638 INFO anaconda: Creating ext4 on /dev/mapper/vg00-var
20:03:04,709 INFO anaconda: Created ext4 on /dev/mapper/vg00-var
20:03:33,545 INFO anaconda: Creating ext4 on /dev/mapper/vg00-opt
20:04:17,431 INFO anaconda: Created ext4 on /dev/mapper/vg00-opt
20:04:47,419 INFO anaconda: Creating ext4 on /dev/mapper/vg00-tmp
20:05:29,563 INFO anaconda: Created ext4 on /dev/mapper/vg00-tmp
20:06:05,318 INFO anaconda: Creating ext4 on /dev/mapper/vg00-usr
20:06:51,398 INFO anaconda: Created ext4 on /dev/mapper/vg00-usr
20:07:25,466 INFO anaconda: Creating ext4 on /dev/mapper/vg00-log
20:08:10,052 INFO anaconda: Created ext4 on /dev/mapper/vg00-log
20:15:02,513 INFO anaconda: Creating mdmember on /dev/sda2
20:15:32,119 INFO anaconda: Created mdmember on /dev/sda2
20:15:57,297 INFO anaconda: Creating ext4 on /dev/md/1
20:16:37,913 INFO anaconda: Created ext4 on /dev/md/1
20:22:51,738 INFO anaconda: Creating mdmember on /dev/sda5
20:22:51,826 INFO anaconda: Created mdmember on /dev/sda5
20:22:55,472 INFO anaconda: Creating ext4 on /dev/md/0
20:22:56,837 INFO anaconda: Created ext4 on /dev/md/0
20:22:56,859 INFO anaconda: Creating mdmember on /dev/sda3
20:22:57,151 INFO anaconda: Created mdmember on /dev/sda3
20:23:04,052 INFO anaconda: Creating swap on /dev/md/3
20:23:30,488 INFO anaconda: Created swap on /dev/md/3

You can see the whole process takes over 20 minutes. Now the results after the patch revert, snipped for brevity.

16:45:17,990 INFO anaconda: Creating disklabel on /dev/sdb
16:45:18,040 INFO anaconda: Created disklabel on /dev/sdb
.....snip
16:45:36,837 INFO anaconda: Creating swap on /dev/md/3
16:45:36,935 INFO anaconda: Created swap on /dev/md/3

Way down to about 20 seconds... a pretty severe impact. And as mentioned before, the issue extends into package installation, application installation, and normal application load. Is there a fix here that you can see, or is this enough to decide on rolling back that patch? Please let me know if I can be of help.
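For anyone hitting this in the meantime, the mitigation we keep coming back to is simply lowering the resync ceiling at runtime. A rough sketch (both knobs are the standard md tunables; the 50000 value is illustrative, per the 50000-100000 range discussed below):

# System-wide ceiling in KiB/s; applies to all md devices.
echo 50000 > /proc/sys/dev/raid/speed_limit_max

# Or per array: a written number becomes the "local" value for that
# device, and writing "system" reverts to the system-wide setting.
echo 50000 > /sys/block/md0/md/sync_speed_max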
Best Regards,

John Pittman
Customer Engagement and Experience
Red Hat Inc.

On Fri, Mar 16, 2018 at 10:44 AM, John Pittman <jpittman@xxxxxxxxxx> wrote:
> Jes, Roman, thanks for responding.
>
>>> Could you provide an actual example of how this is shown?
>
> Jes, the most recent high-profile example we had was a customer executing
> large-scale kickstart installations to md devices. They found that the
> install was taking an exceptionally long time on these systems, putting
> them behind schedule. We noted fairly quickly that the issue could be
> worked around with a kickstart script that drops max_sync_speed during
> install. We finally got them to accept the solution, but they were not
> happy about it. Even with the kickstart workaround, application installs
> crawled after reboot because the resync would continue at max speed.
> Another recent case is one where the system would slow to the point that
> it was difficult or impossible to interact with the terminal; it was
> also resolved by decreasing max_sync_speed. A great many more such cases
> come through support that I'm not involved with. Xiao actually helped me
> with the install environment case... it was rough.
>
>>> I also don't think this belongs in userland. It makes a lot more sense
>>> to me to do this in the kernel setup of defaults for the device, which
>>> will also allow an admin to change the sysctl setting.
>
> I actually thought the same initially but decided on the opposite. :)
> I'm perfectly fine with adding it to the kernel instead if that would be
> accepted. From what I saw, detecting rotational and setting the default
> based on that should be a small change.
>
>>> I am also curious of the impact of reverting the patch Roman Mamedov
>>> pointed out.
>
> I will do my best to provide results.
>
> On Fri, Mar 16, 2018 at 10:21 AM, Jes Sorensen <jes.sorensen@xxxxxxxxx> wrote:
>>
>> On 03/16/2018 09:52 AM, John Pittman wrote:
>> > Through numerous and consistent customer cases, it has been noted
>> > that on systems with above-average or high load, md devices backed
>> > by rotational media cause a significant, system-wide I/O performance
>> > impact during resync. This includes, but is not limited to, the
>> > installation environment when root is on an md device. For all
>> > intents and purposes, due to drastically increased seek operations,
>> > this behavior is completely warranted and expected. However, it does
>> > mean that resync operations are only truly feasible on low-load
>> > systems or during downtime. As devices grow larger, resyncs take
>> > longer, so they need to remain feasible during production uptime.
>> > Taking this into account, for rotational devices, 200000 is no
>> > longer a reasonable limit. It's been found that when these
>> > performance issues are encountered, in virtually all cases the issue
>> > can be completely resolved by setting a sync_speed_max value
>> > somewhere between 50000 and 100000; the lower it's set, the better
>> > the performance gets, as expected.
>>
>> So I am not necessarily opposed to a change like this; however, I find
>> the "It's been found ....." wording here rather unconvincing. Could you
>> provide an actual example of how this is shown?
>>
>> > Avoid these performance hits by persistently setting rotational
>> > devices, via a udev rule, to a more reasonable value of 100000. The
>> > rule will check whether the rotational sysfs value equals 1, then
>> > check whether a local value has already been set. This local check
>> > should afford us some form of backward compatibility, preventing an
>> > override of already-set, per-md-device values. If both criteria are
>> > matched, the rule will echo the desired value into sysfs. One note
>> > is that this rule will override the system-wide sysctl values, so to
>> > override it in turn, the end user will have to create a new rule in
>> > /etc/udev/rules.d/ or echo a new value in manually.
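For concreteness, a rule along the lines described might look like this (a sketch reconstructed from the description above, not the actual submitted rule; it keys off the fact that sync_speed_max reads back as, e.g., "200000 (system)" until a per-device value is written, after which it reads "... (local)"):

SUBSYSTEM=="block", KERNEL=="md*", ATTR{queue/rotational}=="1", ATTR{md/sync_speed_max}!="*(local)*", ATTR{md/sync_speed_max}="100000"

An override would then just be a later-sorting file in /etc/udev/rules.d/ that matches the same devices and assigns a different value.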
>>
>> Overriding system-wide sysctls behind the backs of admins is
>> unacceptable and not the right way to go. If an admin sets a sysctl
>> value, that must be respected.
>>
>> I also don't think this belongs in userland. It makes a lot more sense
>> to me to do this in the kernel setup of defaults for the device, which
>> will also allow an admin to change the sysctl setting.
>>
>> I am also curious of the impact of reverting the patch Roman Mamedov
>> pointed out.
>>
>> Thanks,
>> Jes
>
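If this does end up kernel-side as Jes suggests, one naive shape of it might be the following (a hypothetical sketch against drivers/md/md.c, not a submitted patch; note that populating the per-device value this way makes it "local" and so still shadows the sysctl, meaning a variant that fully respects an admin's sysctl would need more care):

/*
 * Hypothetical sketch: lower the default resync ceiling when any
 * member device is rotational.  Could be called once from md_run().
 */
static void md_set_default_sync_speed(struct mddev *mddev)
{
	struct md_rdev *rdev;

	/* A non-zero sync_speed_max is treated as a "local" value and
	 * takes precedence over sysctl_speed_limit_max, so don't stomp
	 * on anything already configured. */
	if (mddev->sync_speed_max)
		return;

	rdev_for_each(rdev, mddev) {
		if (!blk_queue_nonrot(bdev_get_queue(rdev->bdev))) {
			mddev->sync_speed_max = 100000; /* KiB/s */
			break;
		}
	}
}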