Hi,

This looks like a trim/discard problem. Without discards, the drive presumably never learns that freed blocks are free again, so its utilisation counter stays high even though BlueStore is barely using the space. I would try enabling discard on a single disk to validate (a sketch of the commands is below the quoted message). I have no feedback on the reliability of the bdev_*_discard parameters, so it may be worth digging a little deeper into the subject, or perhaps someone here has experience with them...
________________________________________________________

Regards,

*David CASIER*
________________________________________________________

On Fri, Dec 1, 2023 at 16:15, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx> wrote:

> Hi,
>
> Today we had a big issue with slow ops on the NVMe drives that hold the
> index pool.
>
> Why does nvme list report the drives as full when Ceph shows them as
> barely utilized? Which one should I believe?
>
> ceph osd df shows about 10% usage on the OSDs (each 2 TB NVMe drive
> carries 4 OSDs):
>
> ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
> 195  nvme   0.43660  1.00000   447 GiB  47 GiB   161 MiB  46 GiB  656 MiB  400 GiB  10.47  0.21   64  up
> 252  nvme   0.43660  1.00000   447 GiB  46 GiB   161 MiB  45 GiB  845 MiB  401 GiB  10.35  0.21   64  up
> 253  nvme   0.43660  1.00000   447 GiB  46 GiB   229 MiB  45 GiB  662 MiB  401 GiB  10.26  0.21   66  up
> 254  nvme   0.43660  1.00000   447 GiB  46 GiB   161 MiB  44 GiB  1.3 GiB  401 GiB  10.26  0.21   65  up
> 255  nvme   0.43660  1.00000   447 GiB  47 GiB   161 MiB  46 GiB  1.2 GiB  400 GiB  10.58  0.21   64  up
> 288  nvme   0.43660  1.00000   447 GiB  46 GiB   161 MiB  44 GiB  1.2 GiB  401 GiB  10.25  0.21   64  up
> 289  nvme   0.43660  1.00000   447 GiB  46 GiB   161 MiB  45 GiB  641 MiB  401 GiB  10.33  0.21   64  up
> 290  nvme   0.43660  1.00000   447 GiB  45 GiB   229 MiB  44 GiB  668 MiB  402 GiB  10.14  0.21   65  up
>
> However, nvme list says they are full:
>
> Node          SN            Model         Namespace  Usage              Format       FW Rev
> ------------  ------------  ------------  ---------  -----------------  -----------  ------
> /dev/nvme0n1  90D0A00XTXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6
> /dev/nvme1n1  60P0A003TXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6
>
> On some other nodes the pattern was:
>
> * if neither disk is full, no slow ops
> * if one disk is full and the other is not, some slow ops, but not too many
> * if both disks are full, heavy slow ops
>
> The full disks are very highly utilized during recovery, and they hold
> back operations on the other NVMes.
>
> Why are the OSDs not equally utilized in terms of space, even though the
> PG counts across the cluster are the same +/-1?
>
> Thank you
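To make the suggestion above concrete, here is a minimal sketch of enabling discard on a single OSD for validation. This is untested on my side: the option names are the bdev_*_discard parameters mentioned above, osd.195 is just picked from your ceph osd df output as an example, and a cephadm-managed cluster is assumed for the restart command.

  # Enable discard on one OSD only, to validate before a wider rollout.
  # bdev_enable_discard makes BlueStore issue discards to the device;
  # bdev_async_discard queues them asynchronously so foreground I/O
  # latency is less affected (availability depends on the Ceph release).
  ceph config set osd.195 bdev_enable_discard true
  ceph config set osd.195 bdev_async_discard true

  # The options are picked up at OSD startup, so restart the daemon:
  ceph orch daemon restart osd.195

  # Confirm the setting is in place:
  ceph config get osd.195 bdev_enable_discard

If the drive then starts reporting free space again in nvme list, and the slow ops disappear under recovery load, that would confirm the diagnosis before touching the remaining OSDs.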
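Before changing anything, it may also be worth checking that the kernel and the drives actually pass discards through, and looking at the raw namespace utilisation that nvme list derives its Usage column from. Again only a sketch; the device name is taken from your nvme list output.

  # Non-zero DISC-GRAN / DISC-MAX means discards reach the device:
  lsblk --discard /dev/nvme0n1

  # nvme list computes Usage from the namespace utilisation field (nuse);
  # if the firmware never sees a discard, nuse can only grow, which would
  # explain a "full" drive that ceph osd df shows as ~10% used:
  nvme id-ns /dev/nvme0n1 | grep -E 'nsze|nuse'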