Hello again.
In ceph recommendations I found this:
Enterprise SSDs and HDDs normally include power loss protection features which ensure data durability when power is lost while operating, and use multi-level caches to speed up direct or synchronous writes. These devices can be toggled between two caching modes – a volatile cache flushed to persistent media with fsync, or a non-volatile cache written synchronously.
These two modes are selected by either “enabling” or “disabling” the write (volatile) cache. When the volatile cache is enabled, Linux uses a device in “write back” mode, and when disabled, it uses “write through”.
The default configuration (usually: caching is enabled) may not be optimal, and OSD performance may be dramatically increased in terms of increased IOPS and decreased commit latency by disabling this write cache.
Users are therefore encouraged to benchmark their devices with fio as described earlier and persist the optimal cache configuration for their devices.
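For reference, the sync-write latency test the docs point at boils down to a 4k, queue-depth-1 synchronous write; a sketch of it as a fio job file, where the device path is a placeholder and running it against a disk that holds data is destructive:

```
; sketch of the 4k sync-write latency job (placeholder device path)
[sync-write-latency]
filename=/dev/sdX
direct=1
sync=1
rw=write
bs=4k
numjobs=1
iodepth=1
runtime=60
time_based
```

Running it once with the volatile cache enabled and once with it disabled lets you compare commit latency directly.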
root@sd-02:~# cat /sys/class/scsi_disk/*/cache*
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
root@sd-02:~# cat /etc/udev/rules.d/98-ceph-provisioning-mode.rules
ACTION=="add", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}:="unmap"
root@sd-02:~# cat /etc/udev/rules.d/99-ceph-write-through.rules
ACTION=="add", SUBSYSTEM=="scsi_disk", ATTR{cache_type}:="write through"
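Rules files like these only take effect when a matching device event fires; a hedged sketch of the usual form (the `add|change` match and the reload commands are my assumption, not taken from the thread):

```
# /etc/udev/rules.d/99-ceph-write-through.rules (sketch)
ACTION=="add|change", SUBSYSTEM=="scsi_disk", ATTR{cache_type}:="write through"

# Apply without rebooting:
#   udevadm control --reload
#   udevadm trigger --subsystem-match=scsi_disk
```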
Özkan Göksu <ozkangksu@xxxxxxxxx> wrote on Fri, 22 Mar 2024 at 17:42:
Hello Anthony, thank you for the answer. While researching I also found this type of issue, but what I don't understand is that on the same server the OS drives ("SAMSUNG MZ7WD480") are all fine.
root@sd-01:~# lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sda 0 512B 2G 0
├─sda1 0 512B 2G 0
├─sda2 0 512B 2G 0
└─sda3 0 512B 2G 0
└─md0 0 512B 2G 0
└─md0p1 0 512B 2G 0
sdb 0 512B 2G 0
├─sdb1 0 512B 2G 0
├─sdb2 0 512B 2G 0
└─sdb3 0 512B 2G 0
└─md0 0 512B 2G 0
  └─md0p1 0 512B 2G 0
root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:00/0000:00:11.4/ata1/host1/target1:0:0/1:0:0:0/scsi_disk/1:0:0:0/provisioning_mode:writesame_16
/sys/devices/pci0000:00/0000:00:11.4/ata2/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/scsi_disk/0:0:1:0/provisioning_mode:full
root@sd-01:~# disklist
HCTL    NAME     SIZE   REV TRAN WWN                SERIAL      MODEL
1:0:0:0 /dev/sda 447.1G 203Q sata 0x5002538500231d05 S1G1NYAF923 SAMSUNG MZ7WD4
2:0:0:0 /dev/sdb 447.1G 203Q sata 0x5002538500231a41 S1G1NYAF922 SAMSUNG MZ7WD4
0:0:0:0 /dev/sdc 3.6T 046 sas 0x500a0751e6bd969b 2312E6BD969 CT4000MX500SSD
0:0:1:0 /dev/sdd 3.6T 046 sas 0x500a0751e6bd97ee 2312E6BD97E CT4000MX500SSD
0:0:2:0 /dev/sde 3.6T 046 sas 0x500a0751e6bd9805 2312E6BD980 CT4000MX500SSD
0:0:3:0 /dev/sdf 3.6T 046 sas 0x500a0751e6bd9681 2312E6BD968 CT4000MX500SSD
0:0:4:0 /dev/sdg 3.6T 045 sas 0x500a0751e6b5d30a 2309E6B5D30 CT4000MX500SSD
0:0:5:0 /dev/sdh 3.6T 046 sas 0x500a0751e6bd967e 2312E6BD967 CT4000MX500SSD
0:0:6:0 /dev/sdi 3.6T 046 sas 0x500a0751e6bd97e4 2312E6BD97E CT4000MX500SSD
0:0:7:0 /dev/sdj 3.6T 046 sas 0x500a0751e6bd96a0 2312E6BD96A CT4000MX500SSD
So my question is: why does this only happen to the CT4000MX500SSD drives, why did it start just now, and why don't I see it on the other servers? Maybe it is related to the firmware version ("M3CR046" vs "M3CR045"). I checked the Crucial website and "M3CR046" does not actually exist there: https://www.crucial.com/support/ssd-support/mx500-support
In this forum, people recommend upgrading to "M3CR046": https://forums.unraid.net/topic/134954-warning-crucial-mx500-ssds-world-of-pain-stay-away-from-these/
But in my ud cluster all the drives are "M3CR045" and have lower latency, so I'm really confused.
Instead of writing udev rules only for the CT4000MX500SSD, is there a recommended udev rule for Ceph covering all types of SATA drives?
Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote on Fri, 22 Mar 2024 at 17:00:
On Mar 22, 2024, at 09:36, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
Hello!
After upgrading "5.15.0-84-generic" to "5.15.0-100-generic" (Ubuntu 22.04.2 LTS), commit latency started acting weird with "CT4000MX500SSD" drives.
osd commit_latency(ms) apply_latency(ms)
36 867 867
37 3045 3045
38 15 15
39 18 18
42 1409 1409
43 1224 1224
I downgraded the kernel but the result did not change.
I have a similar build and it didn't get upgraded and it is just fine.
While I was digging I realised a difference.
This is the high-latency cluster, and as you can see, "DISC-GRAN=0B" and "DISC-MAX=0B":
root@sd-01:~# lsblk -D
NAME                                                                                                  DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                                                                          0        0B       0B         0
├─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--201d5050--db0c--41b4--85c4--6416ee989d6c        0        0B       0B         0
└─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--5a376133--47de--4e29--9b75--2314665c2862
root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full
------------------------------------------------------------------------------------------
This is the low-latency cluster, and as you can see, "DISC-GRAN=4K" and "DISC-MAX=2G":
root@ud-01:~# lsblk -D
NAME                                                                                                  DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sdc                                                                                                          0        4K       2G         0
├─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--ec86a029--23f7--4328--9600--a24a290e3003        0        4K       2G         0
└─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--5b69b748--d899--4f55--afc3--2ea3c8a05ca1
root@ud-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
/sys/devices/pci0000:00/0000:00:11.4/ata3/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16
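The DISC-GRAN and DISC-MAX columns that lsblk reports come from the block queue's discard attributes (discard_granularity and discard_max_bytes in sysfs); a tiny sketch of the comparison, where the helper name and the literal values are illustrative:

```shell
# Classify discard support from the two values lsblk shows as
# DISC-GRAN and DISC-MAX (sysfs: discard_granularity / discard_max_bytes).
discard_status() {
    local gran=$1 max=$2
    if [ "$gran" -gt 0 ] && [ "$max" -gt 0 ]; then
        echo "discard supported"
    else
        echo "discard NOT supported"
    fi
}

# On a live host the inputs would come from sysfs, e.g.:
#   discard_status "$(cat /sys/block/sdc/queue/discard_granularity)" \
#                  "$(cat /sys/block/sdc/queue/discard_max_bytes)"
discard_status 4096 2147483648   # the ud-01 case (4K / 2G): discard supported
discard_status 0 0               # the sd-01 case (0B / 0B): discard NOT supported
```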
I think the problem is related to provisioning_mode, but I really don't understand the reason.
I booted with a live ISO and the drive was still "provisioning_mode:full", so this is not related to my OS at all.
Something changed with the upgrade; I think that during the boot sequence, the negotiation between the LSI controller, the drives, and the kernel started assigning "provisioning_mode:full", but I'm not sure.
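To see at a glance how the modes are split across a host, the same find-plus-grep output can be tallied per mode; a small sketch, where the function name and the sample paths are illustrative:

```shell
# Count SCSI disks per provisioning_mode from `grep -H` output of the form
# /sys/.../provisioning_mode:<mode> (as produced by the find command above).
summarize_modes() {
    awk -F'provisioning_mode:' 'NF > 1 { print $2 }' | sort | uniq -c
}

# Sample input with abbreviated (hypothetical) paths; on a live host, pipe in
# the output of:  find /sys/ -name provisioning_mode -exec grep -H . {} +
printf '%s\n' \
    '/sys/devices/X/scsi_disk/0:0:0:0/provisioning_mode:full' \
    '/sys/devices/X/scsi_disk/0:0:1:0/provisioning_mode:full' \
    '/sys/devices/X/scsi_disk/2:0:0:0/provisioning_mode:writesame_16' \
    | summarize_modes
# prints a count per mode, here 2 "full" and 1 "writesame_16"
```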
What should I do?
Best regards.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx