Re: [PATCH] mdadm: set persistent sync_speed_max based on rotational attribute

Jes, Roman, et al., sorry for the wait.  I have been able to test this
further, and the patch mentioned was indeed causing the issue.  The
easiest way for me to reproduce was by installing to an md device while
it syncs.  The interesting/telling part is during
partitioning/formatting.  Below is the log from prior to reverting the
patch.

  19:58:39,249 INFO anaconda: Creating disklabel on /dev/sdb
  19:58:39,312 INFO anaconda: Created disklabel on /dev/sdb
  19:58:39,553 INFO anaconda: Creating mdmember on /dev/sdb1
  19:58:39,616 INFO anaconda: Created mdmember on /dev/sdb1
  19:58:40,529 INFO anaconda: Creating mdmember on /dev/sdb5
  19:58:40,619 INFO anaconda: Created mdmember on /dev/sdb5
  19:58:40,642 INFO anaconda: Creating mdmember on /dev/sdb3
  19:58:40,708 INFO anaconda: Created mdmember on /dev/sdb3
  19:58:40,731 INFO anaconda: Creating mdmember on /dev/sdb2
  19:58:40,801 INFO anaconda: Created mdmember on /dev/sdb2
  19:58:40,822 INFO anaconda: Creating disklabel on /dev/sda
  19:58:40,890 INFO anaconda: Created disklabel on /dev/sda
  19:58:41,118 INFO anaconda: Creating mdmember on /dev/sda1
  19:58:41,179 INFO anaconda: Created mdmember on /dev/sda1
  19:58:42,499 INFO anaconda: Creating lvmpv on /dev/md/2
  19:59:45,248 INFO anaconda: Created lvmpv on /dev/md/2
  20:00:44,268 INFO anaconda: Creating ext4 on /dev/mapper/vg00-mgtservices
  20:01:15,815 INFO anaconda: Created ext4 on /dev/mapper/vg00-mgtservices
  20:01:31,088 INFO anaconda: Creating ext4 on /dev/mapper/vg00-home
  20:02:02,074 INFO anaconda: Created ext4 on /dev/mapper/vg00-home
  20:02:27,638 INFO anaconda: Creating ext4 on /dev/mapper/vg00-var
  20:03:04,709 INFO anaconda: Created ext4 on /dev/mapper/vg00-var
  20:03:33,545 INFO anaconda: Creating ext4 on /dev/mapper/vg00-opt
  20:04:17,431 INFO anaconda: Created ext4 on /dev/mapper/vg00-opt
  20:04:47,419 INFO anaconda: Creating ext4 on /dev/mapper/vg00-tmp
  20:05:29,563 INFO anaconda: Created ext4 on /dev/mapper/vg00-tmp
  20:06:05,318 INFO anaconda: Creating ext4 on /dev/mapper/vg00-usr
  20:06:51,398 INFO anaconda: Created ext4 on /dev/mapper/vg00-usr
  20:07:25,466 INFO anaconda: Creating ext4 on /dev/mapper/vg00-log
  20:08:10,052 INFO anaconda: Created ext4 on /dev/mapper/vg00-log
  20:15:02,513 INFO anaconda: Creating mdmember on /dev/sda2
  20:15:32,119 INFO anaconda: Created mdmember on /dev/sda2
  20:15:57,297 INFO anaconda: Creating ext4 on /dev/md/1
  20:16:37,913 INFO anaconda: Created ext4 on /dev/md/1
  20:22:51,738 INFO anaconda: Creating mdmember on /dev/sda5
  20:22:51,826 INFO anaconda: Created mdmember on /dev/sda5
  20:22:55,472 INFO anaconda: Creating ext4 on /dev/md/0
  20:22:56,837 INFO anaconda: Created ext4 on /dev/md/0
  20:22:56,859 INFO anaconda: Creating mdmember on /dev/sda3
  20:22:57,151 INFO anaconda: Created mdmember on /dev/sda3
  20:23:04,052 INFO anaconda: Creating swap on /dev/md/3
  20:23:30,488 INFO anaconda: Created swap on /dev/md/3

You can see the whole process takes over 20 minutes.  Here are the
results after reverting the patch, snipped for brevity.

  16:45:17,990 INFO anaconda: Creating disklabel on /dev/sdb
  16:45:18,040 INFO anaconda: Created disklabel on /dev/sdb
  .....snip
  16:45:36,837 INFO anaconda: Creating swap on /dev/md/3
  16:45:36,935 INFO anaconda: Created swap on /dev/md/3

Down to about 20 seconds... a pretty severe impact.  And as mentioned
before, the issue extends on into package installation, application
installation, and standard application load.
Is there a fix here that you guys can see?  Or is this enough to
decide on rolling back that patch?
Please let me know if I can be of help.
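For what it's worth, the runtime workaround amounts to something like the
sketch below (md0, the 100000 figure, and the helper name are examples of
mine, not taken from the patch; the value is in KiB/s):

```shell
#!/bin/sh
# Sketch: cap the resync rate of one array via its md sysfs attribute.
# SYSFS_ROOT is parameterized only so the helper can be exercised against
# a fake sysfs tree; on a live system it defaults to /sys.
set_sync_speed_max() {
    md="$1"    # array name, e.g. md0
    cap="$2"   # ceiling in KiB/s, e.g. 100000
    f="${SYSFS_ROOT:-/sys}/block/$md/md/sync_speed_max"
    [ -w "$f" ] && echo "$cap" > "$f"
}

# System-wide alternative (the same knob our kickstart script drops):
#   sysctl -w dev.raid.speed_limit_max=100000
```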

Best Regards,

John Pittman
Customer Engagement and Experience
Red Hat Inc.

On Fri, Mar 16, 2018 at 10:44 AM, John Pittman <jpittman@xxxxxxxxxx> wrote:
> Jes, Roman, thanks for responding.
>
>>> Could you provide an actual example of how this is shown?
>
> Jes, the most recent high-profile example we had was a customer executing
> large-scale kickstart installations to md devices.  They found that the
> installs were taking an exceptionally long time on these systems, putting
> them behind schedule.  We noted fairly quickly that the issue could be
> worked around with a kickstart script that drops sync_speed_max during
> install.  We finally got them to accept that solution, but they were not
> happy about it.  Even with the kickstart workaround, after reboot, app
> installation crawled because the resync would continue at max speed.
> Another recent case is one where the system would crawl to the point that
> it was difficult or impossible to interact with the terminal; that one was
> also resolved by decreasing sync_speed_max.  There are a great many more
> that come through support that I'm not involved with.  Xiao actually
> helped me with the install environment case... it was rough.
>
>>> I also don't think this belongs in userland. It makes a lot more sense
>>> to me to do this in the kernel setup of defaults for the device, which
>>> will also allow an admin to change the sysctl setting.
>
> I actually thought the same initially but decided the opposite. :)
> Perfectly fine with me, though, to add this to the kernel if accepted.
> Detecting rotational and setting the default based on that should be a
> small change, from what I saw.
>
>>> I am also curious of the impact of reverting the patch Roman Mamedov
>>> pointed out.
>
> I will do my best to provide results.
>
> On Fri, Mar 16, 2018 at 10:21 AM, Jes Sorensen <jes.sorensen@xxxxxxxxx>
> wrote:
>>
>> On 03/16/2018 09:52 AM, John Pittman wrote:
>> > Through numerous and consistent customer cases, it has been noted
>> > that on systems with above average or high load, md devices backed
>> > with rotational media cause a significant, system-wide I/O performance
>> > impact during resync.  This includes, but is not limited to, the
>> > installation environment when root is on a md device.  For all intents
>> > and purposes, due to drastically increased seek operations, this
>> > behavior is completely warranted and expected.  However, it does cause
>> > resync operations to only be truly feasible on low load systems or
>> > during downtime.  As devices grow larger, resyncs are taking longer,
>> > requiring feasibility to extend into production uptime.  So, taking
>> > this into account, for rotational devices, 200000 is no longer a
>> > reasonable limit.  It's been found that when these performance issues
>> > are encountered, in virtually all cases, the issue can be completely
>> > resolved by setting a sync_speed_max value somewhere in between 50000
>> > and 100000, the lower it's set, the better the performance gets, as
>> > expected.
>>
>> So I am not necessarily opposed to a change like this, however I find
>> the "It's been found ....." wording here rather unconvincing. Could you
>> provide an actual example of how this is shown?
>>
>> > Avoid these performance hits by persistently setting rotational devices,
>> > via a udev rule, to a more reasonable value of 100000.  The rule will
>> > check if the rotational sysfs value equals 1, then check if a
>> > local value has already been set.  This local check should afford us some
>> > form of backward compatibility, preventing an override of already set,
>> > per md device values.  If both these criteria are matched, it will echo
>> > the desired value into sysfs.  One note is that this rule will override
>> > the system-wide sysctl values, so if it's to be overridden,
>> > the end user will have to create a new rule in /etc/udev/rules.d/ to
>> > override or echo a new value in manually.
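>> > For illustration, the rule is along these lines (a sketch of the idea
>> > only, not the verbatim rule; the file name and matches are examples):
>> >
>> >   # e.g. /usr/lib/udev/rules.d/65-md-sync-speed.rules (example name)
>> >   SUBSYSTEM=="block", KERNEL=="md*", ACTION=="change", \
>> >     ATTR{queue/rotational}=="1", ATTR{md/sync_speed_max}=="*(system)*", \
>> >     ATTR{md/sync_speed_max}="100000"
>> >
>> > (sync_speed_max reads back as "<value> (system)" until a local value
>> > has been written, which is what the "*(system)*" match keys on.)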
>>
>> Overriding system wide sysctl's behind the back of admins is
>> unacceptable and not the right way to go. If an admin sets a sysctl
>> value, that must be respected.
>>
>> I also don't think this belongs in userland. It makes a lot more sense
>> to me to do this in the kernel setup of defaults for the device, which
>> will also allow an admin to change the sysctl setting.
>>
>> I am also curious of the impact of reverting the patch Roman Mamedov
>> pointed out.
>>
>> Thanks,
>> Jes
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


