Re: 11/29/2018 perf meeting is on!

Hi Pavan,

It was pointed out to me that Red Hat has an advisory for the higher-level issue affecting XFS+LVM here:

https://access.redhat.com/solutions/3406851

As far as Ceph is concerned, the user who was having issues suffered a 3x performance regression when performing large-block sequential reads against BlueStore OSDs.  These were deployed via ceph-volume to LVM backed by Intel NVMe devices, and the kernel on the OSD nodes was an older CentOS 3.10 kernel without the patch applied.  Writes did not appear to be affected, though mixed workloads and pure sequential reads were.  Using ceph-disk to deploy to raw partitions restored the previously achieved performance.  Unfortunately, the user was unable to upgrade the kernel on the nodes to verify whether that fixed the issue, so we don't have direct proof it is related, though as far as we can tell it looks very similar.  Key observations included higher iowait, higher device queue depths, higher client-side latency, and (at least in our case) far fewer read merges seen in a blktrace of the underlying NVMe device.
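
If you want to eyeball the merge behavior on your own hardware, here is a rough Python sketch (illustrative only, not something we ran on that cluster; the device name "nvme0n1" and the 5-second interval are placeholders) that watches the reads-merged counter from /proc/diskstats.  iostat's rrqm/s or a blktrace will show you the same thing.

#!/usr/bin/env python3
"""Watch read merges for a block device via /proc/diskstats (rough sketch)."""
import sys
import time

def read_stats(dev):
    """Return (reads_completed, reads_merged) for dev from /proc/diskstats."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            # fields: major minor name reads_completed reads_merged ...
            if fields[2] == dev:
                return int(fields[3]), int(fields[4])
    raise ValueError(f"device {dev!r} not found in /proc/diskstats")

def main():
    dev = sys.argv[1] if len(sys.argv) > 1 else "nvme0n1"  # placeholder device
    interval = 5.0
    prev_reads, prev_merges = read_stats(dev)
    while True:
        time.sleep(interval)
        reads, merges = read_stats(dev)
        d_reads, d_merges = reads - prev_reads, merges - prev_merges
        ratio = d_merges / d_reads if d_reads else 0.0
        print(f"{dev}: {d_reads} reads, {d_merges} merged "
              f"({ratio:.2f} merges/read) over {interval:.0f}s")
        prev_reads, prev_merges = reads, merges

if __name__ == "__main__":
    main()

On an affected setup you would expect the merges/read ratio on the LVM-backed NVMe device to drop well below what the same workload shows on a raw partition.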

Deep scrubs may well have an impact on your cluster, but I don't think it's likely to be related to this issue unless your OSDs are on Intel NVMe drives, deployed via LVM, and the nodes are running a kernel older than 3.10.0-891.el7.  Most people are unlikely to hit this, but it's pretty rough if you do.
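
For a quick sanity check on the kernel side, here is an illustrative Python snippet (again just a sketch, not anything official; the parsing is deliberately simplified to the numeric version-release fields of an EL7-style kernel string) that flags a running kernel older than 3.10.0-891.el7:

#!/usr/bin/env python3
"""Flag kernels older than 3.10.0-891.el7 (simplified, illustrative check)."""
import platform
import re

FIXED = (3, 10, 0, 891)  # first EL7 release assumed to carry the fix

def kernel_tuple(release):
    """Turn e.g. '3.10.0-862.14.4.el7.x86_64' into (3, 10, 0, 862)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(x) for x in m.groups())

if __name__ == "__main__":
    release = platform.release()
    if kernel_tuple(release) < FIXED:
        print(f"{release}: older than 3.10.0-891.el7, may be affected "
              "if OSDs sit on LVM over Intel NVMe")
    else:
        print(f"{release}: at or past 3.10.0-891.el7")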


Thanks,

Mark


On 11/29/18 3:05 PM, Pavan Rallabhandi wrote:
Hi Mark,

I missed the meeting. I know the recording will be on the channel later, but I was curious to know the details regarding the kernel bug that affects Intel NVMe drives.

The RH patch link in the pad doesn't seem to work for me; can you please provide more details on the issue: how it manifests with Ceph, the symptoms, the findings, etc.?

We are chasing a perf issue on OpenStack VMs with Cinder-backed RBD volumes on Jewel (sporadic high disk utilization and high I/O waits on the mounted volumes in the VMs) that seems to go away when deep scrubs are disabled!  We are trying to validate that theory on our test clusters at this point. We also use Intel NVMe cards for our journals, hence the interest.

Thanks,
-Pavan.

On 11/29/18, 11:04 AM, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Mark Nelson" <mnelson@xxxxxxxxxx> wrote:

     Hi Folks,

     Perf meeting at the usual 8AM PST time (i.e., right now!). The only
     agenda item so far is a kernel bug found that affects Intel NVMe drives
     and Ceph deployed on top of LVM.

     Etherpad: https://pad.ceph.com/p/performance_weekly
     Bluejeans: https://bluejeans.com/908675367

     Thanks,
     Mark


