Re: High OSD commit_latency after kernel upgrade

Maybe because the Crucial units are detected as client drives?  But also look at the device paths and the output of whatever "disklist" is.  Your boot drives are on SATA ports while the others sit behind a SAS HBA, which seems even more likely to be a factor.
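
A quick way to see attachment path and discard capability side by side (a sketch using stock util-linux columns; nothing here is specific to your setup):

    lsblk -o NAME,HCTL,TRAN,MODEL,DISC-GRAN,DISC-MAX
    # TRAN shows sata vs. sas; DISC-GRAN/DISC-MAX show whether the kernel will issue discards at all.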

> On Mar 22, 2024, at 10:42, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
> 
> Hello Anthony, thank you for the answer. 
> 
> While researching I also found this kind of issue elsewhere, but the thing I don't understand is that on the same server the OS drives ("SAMSUNG MZ7WD480") are all fine.
> 
> root@sd-01:~# lsblk -D
> NAME                                           DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
> sda                                                   0      512B       2G         0
> ├─sda1                                                0      512B       2G         0
> ├─sda2                                                0      512B       2G         0
> └─sda3                                                0      512B       2G         0
>   └─md0                                               0      512B       2G         0
>     └─md0p1                                           0      512B       2G         0
> sdb                                                   0      512B       2G         0
> ├─sdb1                                                0      512B       2G         0
> ├─sdb2                                                0      512B       2G         0
> └─sdb3                                                0      512B       2G         0
>   └─md0                                               0      512B       2G         0
>     └─md0p1                                           0      512B       2G         0
> 
> root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
> /sys/devices/pci0000:00/0000:00:11.4/ata1/host1/target1:0:0/1:0:0:0/scsi_disk/1:0:0:0/provisioning_mode:writesame_16
> /sys/devices/pci0000:00/0000:00:11.4/ata2/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16
> /sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full
> /sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/scsi_disk/0:0:1:0/provisioning_mode:full
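> 
> As a cross-check (a sketch only, assuming sg3_utils is installed; /dev/sdc is just one of the SAS-attached MX500s from the listing below), the Logical Block Provisioning VPD page shows whether UNMAP is actually advertised through the HBA:
> 
> sg_vpd --page=lbpv /dev/sdc
> # "Unmap command supported (LBPU): 1" would mean UNMAP reaches the drive;
> # 0 would help explain why the kernel falls back to provisioning_mode "full".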
> 
> root@sd-01:~# disklist
> HCTL       NAME       SIZE  REV TRAN   WWN                SERIAL      MODEL
> 1:0:0:0    /dev/sda 447.1G 203Q sata   0x5002538500231d05 S1G1NYAF923 SAMSUNG MZ7WD4
> 2:0:0:0    /dev/sdb 447.1G 203Q sata   0x5002538500231a41 S1G1NYAF922 SAMSUNG MZ7WD4
> 0:0:0:0    /dev/sdc   3.6T 046  sas    0x500a0751e6bd969b 2312E6BD969 CT4000MX500SSD
> 0:0:1:0    /dev/sdd   3.6T 046  sas    0x500a0751e6bd97ee 2312E6BD97E CT4000MX500SSD
> 0:0:2:0    /dev/sde   3.6T 046  sas    0x500a0751e6bd9805 2312E6BD980 CT4000MX500SSD
> 0:0:3:0    /dev/sdf   3.6T 046  sas    0x500a0751e6bd9681 2312E6BD968 CT4000MX500SSD
> 0:0:4:0    /dev/sdg   3.6T 045  sas    0x500a0751e6b5d30a 2309E6B5D30 CT4000MX500SSD
> 0:0:5:0    /dev/sdh   3.6T 046  sas    0x500a0751e6bd967e 2312E6BD967 CT4000MX500SSD
> 0:0:6:0    /dev/sdi   3.6T 046  sas    0x500a0751e6bd97e4 2312E6BD97E CT4000MX500SSD
> 0:0:7:0    /dev/sdj   3.6T 046  sas    0x500a0751e6bd96a0 2312E6BD96A CT4000MX500SSD
> 
> So my question is: why does this only happen to the CT4000MX500SSD drives, why did it start only now, and why don't I see it on other servers?
> Maybe it is related to the firmware version, "M3CR046" vs "M3CR045"?
> I checked the Crucial website and "M3CR046" does not actually exist there: https://www.crucial.com/support/ssd-support/mx500-support
> In this forum people recommend upgrading to "M3CR046": https://forums.unraid.net/topic/134954-warning-crucial-mx500-ssds-world-of-pain-stay-away-from-these/
> But in my ud cluster all the drives are on "M3CR045" and have lower latency. I'm really confused.
> 
> 
> Instead of writing a udev rule only for the CT4000MX500SSD, is there a recommended udev rule for Ceph that covers all types of SATA drives?
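> 
> For what it's worth, a minimal udev sketch of that idea (an illustration only: the rule file name, the match keys and the blanket "unmap" choice are assumptions, not something tested here, and forcing the mode on a drive/HBA pair that does not really pass UNMAP through could cause I/O errors):
> 
> # /etc/udev/rules.d/99-scsi-provisioning.rules (hypothetical)
> # Switch SCSI disks that came up as "full" over to UNMAP-based discard.
> ACTION=="add|change", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}=="full", ATTR{provisioning_mode}="unmap"
> 
> # Reload rules and re-trigger existing disks so they pick the change up:
> udevadm control --reload-rules && udevadm trigger --subsystem-match=scsi_disk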
> 
> 
> 
> On Fri, 22 Mar 2024 at 17:00, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:
>> How to stop sys from changing USB SSD provisioning_mode from unmap to full in Ubuntu 22.04?
>> https://askubuntu.com/questions/1454997/how-to-stop-sys-from-changing-usb-ssd-provisioning-mode-from-unmap-to-full-in-ub
>> 
>> 
>>> On Mar 22, 2024, at 09:36, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
>>> 
>>> Hello!
>>> 
>>> After upgrading from "5.15.0-84-generic" to "5.15.0-100-generic" (Ubuntu 22.04.2 LTS), commit latency started acting weird on the "CT4000MX500SSD" drives.
>>> 
>>> osd  commit_latency(ms)  apply_latency(ms)
>>> 36                 867                867
>>> 37                3045               3045
>>> 38                  15                 15
>>> 39                  18                 18
>>> 42                1409               1409
>>> 43                1224               1224
>>> 
>>> I downgraded the kernel but the result did not change.
>>> I have a similar build that was not upgraded, and it is just fine.
>>> While digging I noticed a difference.
>>> 
>>> This is the high-latency cluster; as you can see, "DISC-GRAN=0B" and "DISC-MAX=0B":
>>> root@sd-01:~# lsblk -D
>>> NAME                                           DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
>>> sdc                                                   0        0B       0B         0
>>> ├─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--201d5050--db0c--41b4--85c4--6416ee989d6c
>>> │                                                     0        0B       0B         0
>>> └─ceph--76b7d255--2a01--4bd4--8d3e--880190181183-osd--block--5a376133--47de--4e29--9b75--2314665c2862
>>> 
>>> root@sd-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
>>> /sys/devices/pci0000:80/0000:80:03.0/0000:81:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/provisioning_mode:full
>>> 
>>> ------------------------------------------------------------------------------------------
>>> 
>>> This is the low-latency cluster; as you can see, "DISC-GRAN=4K" and "DISC-MAX=2G":
>>> root@ud-01:~# lsblk -D
>>> NAME                                                              DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
>>> sdc                                                                      0       4K       2G         0
>>> ├─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--ec86a029--23f7--4328--9600--a24a290e3003
>>> │                                                                        0       4K       2G         0
>>> └─ceph--7496095f--18c7--41fd--90f2--d9b3e382bc8e-osd--block--5b69b748--d899--4f55--afc3--2ea3c8a05ca1
>>> 
>>> root@ud-01:~# find /sys/ -name provisioning_mode -exec grep -H . {} + | sort
>>> /sys/devices/pci0000:00/0000:00:11.4/ata3/host2/target2:0:0/2:0:0:0/scsi_disk/2:0:0:0/provisioning_mode:writesame_16
>>> 
>>> I think the problem is related to provisioning_mode, but I really don't understand the reason.
>>> I booted a live ISO and the drive still showed "provisioning_mode:full", so this is not related to my OS at all.
>>> 
>>> Something changed with the upgrade: I suspect that during the boot sequence the negotiation between the LSI controller, the drives and the kernel started assigning "provisioning_mode:full", but I'm not sure.
>>> 
>>> What should I do?
>>> 
>>> Best regards.
>> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



