Re: High OSD commit_latency after kernel upgrade

Hello Anthony, thank you for the answer. 

While researching I also found reports of this type of issue, but the thing I do not understand is that on the same server the OS drives ("SAMSUNG MZ7WD480") are fine:

root@sd-01:~# lsblk -D
NAME                                           DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda                                                   0      512B       2G         0
├─sda1                                                0      512B       2G         0
├─sda2                                                0      512B       2G         0
└─sda3                                                0      512B       2G         0
  └─md0                                               0      512B       2G         0
    └─md0p1                                           0      512B       2G         0
sdb                                                   0      512B       2G         0
├─sdb1                                                0      512B       2G         0
├─sdb2                                                0      512B       2G         0
└─sdb3                                                0      512B       2G         0
  └─md0                                               0      512B       2G         0
    └─md0p1                                           0      512B       2G         0

root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:00/0000:00:11.4/ata1/host1/target1:0:0/1:0:0:0/scsi_disk/1:0:0:0/provisioning_mode:writesame_16
/sys/devices/pci0000:00/0000:00:11.4/ata2/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/scsi_disk/0:0:1:0/provisioning_mode:full

root@sd-01:~# disklist
HCTL       NAME       SIZE  REV TRAN   WWN                SERIAL      MODEL
1:0:0:0    /dev/sda 447.1G 203Q sata   0x5002538500231d05 S1G1NYAF923 SAMSUNG MZ7WD4
2:0:0:0    /dev/sdb 447.1G 203Q sata   0x5002538500231a41 S1G1NYAF922 SAMSUNG MZ7WD4
0:0:0:0    /dev/sdc   3.6T 046  sas    0x500a0751e6bd969b 2312E6BD969 CT4000MX500SSD
0:0:1:0    /dev/sdd   3.6T 046  sas    0x500a0751e6bd97ee 2312E6BD97E CT4000MX500SSD
0:0:2:0    /dev/sde   3.6T 046  sas    0x500a0751e6bd9805 2312E6BD980 CT4000MX500SSD
0:0:3:0    /dev/sdf   3.6T 046  sas    0x500a0751e6bd9681 2312E6BD968 CT4000MX500SSD
0:0:4:0    /dev/sdg   3.6T 045  sas    0x500a0751e6b5d30a 2309E6B5D30 CT4000MX500SSD
0:0:5:0    /dev/sdh   3.6T 046  sas    0x500a0751e6bd967e 2312E6BD967 CT4000MX500SSD
0:0:6:0    /dev/sdi   3.6T 046  sas    0x500a0751e6bd97e4 2312E6BD97E CT4000MX500SSD
0:0:7:0    /dev/sdj   3.6T 046  sas    0x500a0751e6bd96a0 2312E6BD96A CT4000MX500SSD

So my question is: why does it only happen to the CT4000MX500SSD drives, why did it just start now, and why don't I see it on the other servers?
Maybe it is related to the firmware version (M3CR046 vs M3CR045).
I checked the Crucial website and "M3CR046" does not even exist there: https://www.crucial.com/support/ssd-support/mx500-support
In this forum people recommend upgrading to "M3CR046": https://forums.unraid.net/topic/134954-warning-crucial-mx500-ssds-world-of-pain-stay-away-from-these/
But in my ud cluster all the drives are on "M3CR045" and have lower latency, so I'm really confused.
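
The firmware level can also be read per drive with smartctl to cross-check the REV column above (behind a SAS HBA it may need "-d sat", depending on the controller):

root@sd-01:~# smartctl -i /dev/sdc | grep -i -E 'device model|firmware'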


Instead of writing a udev rule only for the CT4000MX500SSD, is there a recommended udev rule for Ceph that covers all types of SATA drives?
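
For context, this is the kind of generic rule I had in mind; it is only a sketch (untested on my side, the file name is arbitrary, and I assume each model should be checked for real UNMAP support first). It only touches disks that were probed as "full", so drives that already negotiated writesame_16 or unmap are left alone:

root@sd-01:~# cat /etc/udev/rules.d/99-scsi-unmap.rules
# Flip any SCSI/SATA disk that came up with provisioning_mode "full" to "unmap".
# Verify UNMAP support per model (lsblk -D, sg_vpd -p lbp) before rolling this out.
ACTION=="add|change", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}=="full", ATTR{provisioning_mode}="unmap"

After that, "udevadm control --reload-rules && udevadm trigger" should re-apply it to already-attached disks.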



Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote on Fri, 22 Mar 2024 at 17:00:


On Mar 22, 2024, at 09:36, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:

Hello!

After upgrading "5.15.0-84-generic" to "5.15.0-100-generic" (Ubuntu 22.04.2 LTS), commit latency started acting weird with the "CT4000MX500SSD" drives.

osd  commit_latency(ms)  apply_latency(ms)
36                 867                867
37                3045               3045
38                  15                 15
39                  18                 18
42                1409               1409
43                1224               1224

I downgraded the kernel but the result did not change.
I have a similar build that didn't get upgraded, and it is just fine.
While digging into this I realised a difference.

This is the high-latency cluster; as you can see, DISC-GRAN=0B and DISC-MAX=0B:
root@sd-01:~# lsblk -D
NAME                                           DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                   0        0B       0B         0
├─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--201d5050--db0c--41b4--85c4--6416ee989d6c
│                                                     0        0B       0B         0
└─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--5a376133--47de--4e29--9b75--2314665c2862

root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full

------------------------------------------------------------------------------------------

This is the low-latency cluster; as you can see, DISC-GRAN=4K and DISC-MAX=2G:
root@ud-01:~# lsblk -D
NAME                                                              DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                                      0        4K       2G         0
├─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--ec86a029--23f7--4328--9600--a24a290e3003
│                                                                        0        4K       2G         0
└─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--5b69b748--d899--4f55--afc3--2ea3c8a05ca1

root@ud-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:00/0000:00:11.4/ata3/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16

I think the problem is related to provisioning_mode, but I really do not understand the reason.
I booted a live ISO and the drive was still "provisioning_mode:full", so this is not related to my OS at all.

Something changed with the upgrade; I think the boot-time negotiation between the LSI controller, the drives, and the kernel now assigns "provisioning_mode:full", but I'm not sure.

What should I do?

Best regards.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

