Shouldn't these values need to be true for trimming to happen?

    "bdev_async_discard": "false",
    "bdev_enable_discard": "false",

Istvan Szabo
Staff Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo@xxxxxxxxx
---------------------------------------------------

________________________________
From: David C. <david.casier@xxxxxxxx>
Sent: Monday, December 4, 2023 3:44 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
Cc: Anthony D'Atri <anthony.datri@xxxxxxxxx>; Ceph Users <ceph-users@xxxxxxx>
Subject: Re: How to identify the index pool real usage?

Hi,

A flash system needs free space to work efficiently. Hence my hypothesis that fully allocated disks need to be notified of free blocks (trim).

________________________________________________________
Regards,

David CASIER
________________________________________________________

On Mon, Dec 4, 2023 at 06:01, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:

On the nodes that have some free space in that namespace we don't have this issue, only on this one, which is weird.

________________________________
From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Sent: Friday, December 1, 2023 10:53 PM
To: David C. <david.casier@xxxxxxxx>
Cc: Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; Ceph Users <ceph-users@xxxxxxx>
Subject: Re: How to identify the index pool real usage?

>>
>> Today we had a big issue with slow ops on the NVMe drives which hold
>> the index pool.
>>
>> Why does the NVMe show as full when Ceph says it is barely utilized?
>> Which one should I believe?
>>
>> When I check ceph osd df it shows ~10% usage per OSD (each 2 TB NVMe
>> drive carries 4 OSDs):

Why split each device into 4 very small OSDs? You're losing a lot of capacity to overhead.

>>
>> ID  CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META    AVAIL   %USE  VAR  PGS STATUS
>> 195 nvme  0.43660 1.00000  447 GiB 47 GiB  161 MiB 46 GiB 656 MiB 400 GiB 10.47 0.21  64 up
>> 252 nvme  0.43660 1.00000  447 GiB 46 GiB  161 MiB 45 GiB 845 MiB 401 GiB 10.35 0.21  64 up
>> 253 nvme  0.43660 1.00000  447 GiB 46 GiB  229 MiB 45 GiB 662 MiB 401 GiB 10.26 0.21  66 up
>> 254 nvme  0.43660 1.00000  447 GiB 46 GiB  161 MiB 44 GiB 1.3 GiB 401 GiB 10.26 0.21  65 up
>> 255 nvme  0.43660 1.00000  447 GiB 47 GiB  161 MiB 46 GiB 1.2 GiB 400 GiB 10.58 0.21  64 up
>> 288 nvme  0.43660 1.00000  447 GiB 46 GiB  161 MiB 44 GiB 1.2 GiB 401 GiB 10.25 0.21  64 up
>> 289 nvme  0.43660 1.00000  447 GiB 46 GiB  161 MiB 45 GiB 641 MiB 401 GiB 10.33 0.21  64 up
>> 290 nvme  0.43660 1.00000  447 GiB 45 GiB  229 MiB 44 GiB 668 MiB 402 GiB 10.14 0.21  65 up
>>
>> However, nvme list says the drives are full:
>>
>> Node             SN            Model         Namespace Usage              Format       FW Rev
>> ---------------- ------------- ------------- --------- ------------------ ------------ --------
>> /dev/nvme0n1     90D0A00XTXTR  KCD6XLUL1T92  1         1.92 TB / 1.92 TB  512 B + 0 B  GPK6
>> /dev/nvme1n1     60P0A003TXTR  KCD6XLUL1T92  1         1.92 TB / 1.92 TB  512 B + 0 B  GPK6

That command isn't telling you what you think it is. It has no awareness of actual data; it's looking at NVMe namespaces.
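If the question is what the drive itself reports as allocated or worn, nvme-cli can show that directly. A rough check (assuming nvme-cli is installed; exact field names vary between versions) would look something like:

    # controller's view of the namespace: size, capacity, utilization (nsze / ncap / nuse)
    nvme id-ns /dev/nvme0n1 | grep -i -E 'nsze|ncap|nuse'

    # wear and write statistics: percentage_used, data_units_written, available spare
    nvme smart-log /dev/nvme0n1

Whether nuse ever drops depends on the drive: controllers that track deallocation can report fewer allocated blocks after a trim, while many simply report the full namespace size all the time, which is why nvme list is not a reliable indicator of BlueStore-level usage.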
>>
>> With some other nodes the test went like this:
>>
>> * if none of the disks is full, there are no slow ops
>> * if one disk is full and the other is not, there are slow ops, but not too many
>> * if none of the disks is full, there are no slow ops
>>
>> The full disks are very heavily utilized during recovery and they hold
>> back the operations from the other NVMes.
>>
>> Why are the OSDs not equally utilized space-wise when the PG counts are
>> the same across the cluster, +/-1?
>>
>> Thank you
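Coming back to the discard settings quoted at the top of the thread: a minimal sketch of how one could inspect and enable them through the centralized config store, assuming a release that exposes bdev_enable_discard / bdev_async_discard there (defaults and runtime behavior differ between releases, so treat this as a test on a single OSD rather than a recommended cluster-wide change):

    # current values for the osd class (both default to false)
    ceph config get osd bdev_enable_discard
    ceph config get osd bdev_async_discard

    # enable discard on one OSD first; osd.195 is just the first OSD from the df output above
    ceph config set osd.195 bdev_enable_discard true
    ceph config set osd.195 bdev_async_discard true

    # depending on the release, the OSD may need a restart before BlueStore starts issuing discards
    # (via the orchestrator if the cluster is managed by cephadm, otherwise the systemd unit)
    ceph orch daemon restart osd.195

Whether this actually removes the slow ops is exactly the hypothesis being tested here; it should only matter for drives that have been fully written at least once, as in the nvme list output above.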