Hi Pavan,
It was pointed out to me that Red Hat has an advisory for the
higher-level issue affecting XFS+LVM here:
https://access.redhat.com/solutions/3406851
As far as Ceph is concerned, the user who was having issues suffered
from a 3X performance regression when performing large-block sequential
reads to Bluestore OSDs. These were deployed via ceph-volume to LVM
backed by Intel NVMe devices. The kernel on the OSD nodes was an older
CentOS 3.10 kernel and did not have the patch applied. Writes did not
appear to be affected, though mixed workloads and pure sequential reads
were. Using ceph-disk to deploy to raw partitions restored the
previously achieved performance. Unfortunately, the user was unable to
upgrade the kernel on the nodes to verify whether that fixed the issue,
so we don't have direct proof that this bug is the cause, though as far
as we can tell the symptoms look very similar. Key observations included
higher iowait, higher device queue depths, higher client-side latency,
and (at least in our case) far fewer read merges seen during a blktrace
of the underlying NVMe device.
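For what it's worth, a quick way to watch for that last symptom without
a full blktrace run is to sample the merge counters the kernel already
exposes in /proc/diskstats. The sketch below is only an illustration:
the device name (nvme0n1) and sampling interval are assumptions, and it
relies on the standard diskstats field layout rather than anything
Ceph-specific.

#!/usr/bin/env python3
# Sketch: sample read/merge counters from /proc/diskstats to see whether
# reads hitting an NVMe (or dm-) device are being merged at all.
# The device name below is a placeholder -- substitute your own.
import time

DEVICE = "nvme0n1"   # hypothetical; could also be the dm-X device backing the OSD
INTERVAL = 5         # seconds between samples

def read_counters(dev):
    """Return (reads_completed, reads_merged) for dev from /proc/diskstats."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[3]), int(fields[4])
    raise RuntimeError("device %s not found in /proc/diskstats" % dev)

prev_reads, prev_merges = read_counters(DEVICE)
while True:
    time.sleep(INTERVAL)
    reads, merges = read_counters(DEVICE)
    d_reads, d_merges = reads - prev_reads, merges - prev_merges
    ratio = float(d_merges) / d_reads if d_reads else 0.0
    print("%s: %d reads, %d read merges (%.2f merges/read) over %ds"
          % (DEVICE, d_reads, d_merges, ratio, INTERVAL))
    prev_reads, prev_merges = reads, merges

blktrace is still the more reliable signal, but this is a cheap first
pass to see whether merges disappear while the workload is running.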
Deep scrubs certainly could have an impact on your cluster, but I don't
think it's likely to be related to this issue unless your OSDs are on
the Intel NVMe drives, deployed via LVM, and running a kernel older than
3.10.0-891.el7. Most people probably won't hit this, but it's pretty
rough if you do.
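If it helps, here's a rough sketch of how you could flag hosts matching
the kernel and LVM conditions above. Treat it as an illustration rather
than a definitive test: the OSD data path assumes the default cluster
name, and the version parsing assumes the usual el7 kernel release
string.

#!/usr/bin/env python3
# Sketch: flag OSD hosts that match the conditions for this issue --
# an el7 kernel older than 3.10.0-891 and OSD data devices sitting on LVM.
import glob
import os
import re

FIXED_BUILD = 891  # fix is in kernel-3.10.0-891.el7 and later

def kernel_is_older_than_fix():
    """Best-effort parse of a release string like '3.10.0-862.el7.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", os.uname().release)
    if not m:
        return None  # unknown format, can't tell
    return tuple(map(int, m.groups())) < (3, 10, 0, FIXED_BUILD)

def osds_on_lvm():
    """Return OSD block symlinks that resolve to a device-mapper (LVM) device."""
    lvm_osds = []
    for block in glob.glob("/var/lib/ceph/osd/ceph-*/block"):
        real = os.path.realpath(block)
        if os.path.basename(real).startswith("dm-") or "/mapper/" in real:
            lvm_osds.append((block, real))
    return lvm_osds

if __name__ == "__main__":
    print("kernel older than 3.10.0-%d.el7: %s"
          % (FIXED_BUILD, kernel_is_older_than_fix()))
    for block, real in osds_on_lvm():
        print("LVM-backed OSD: %s -> %s" % (block, real))

It doesn't check the drive model; whether the devices are the affected
Intel NVMe parts is something you'd still want to confirm by hand.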
Thanks,
Mark
On 11/29/18 3:05 PM, Pavan Rallabhandi wrote:
Hi Mark,
I missed the meeting; I know the recording will be on the channel later, but I was curious about the details of the kernel bug that affects Intel NVMe drives.
The RH patch link in the pad doesn't seem to work for me; can you please provide more details on the issue, how it manifests with Ceph, symptoms, findings, etc.?
We are chasing a perf issue on OpenStack VMs with Cinder-backed RBD volumes on Jewel (sporadic high disk util and high iowait on the mounted volumes in the VMs) that seems to go away when deep scrubs are disabled! We are trying to validate that theory on our test clusters at this point. And we use Intel NVMe cards for our journals, hence the interest.
Thanks,
-Pavan.
On 11/29/18, 11:04 AM, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Mark Nelson" <mnelson@xxxxxxxxxx> wrote:
Hi Folks,
Perf meeting at the usual 8AM PST time (i.e. right now!). The only
agenda item so far is a kernel bug that was found to affect Intel NVMe
drives and
Ceph deployed on top of LVM.
Etherpad:
https://pad.ceph.com/p/performance_weekly
Bluejeans:
https://bluejeans.com/908675367
Thanks,
Mark