Re: How to identify the index pool real usage?


 



Hi,

A flash system needs free space to work efficiently.

Hence my hypothesis: fully allocated drives need to be told which blocks are
free (TRIM/discard), otherwise the controller is left with no spare area to
reclaim.
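
If that hypothesis is right, one way to test it (only a sketch; please check
that these BlueStore options exist on your Ceph version with
"ceph config help bdev_enable_discard" before relying on them) would be to
enable discard on one of the affected OSDs and watch whether the slow ops
disappear:

    # assumption: osd.195 sits on one of the "full" NVMe drives
    ceph config set osd.195 bdev_enable_discard true
    ceph config set osd.195 bdev_async_discard true   # issue discards off the I/O path
    # bdev options are read at startup; restart the OSD however you manage
    # your daemons, e.g. with cephadm:
    ceph orch daemon restart osd.195

Alternatively, running blkdiscard on a zapped, out-of-service device before
redeploying its OSDs hands every block back to the controller at once.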
________________________________________________________

Regards,

*David CASIER*
________________________________________________________




On Mon, Dec 4, 2023 at 06:01, Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>
wrote:

> With the nodes that have some free space in that namespace we don't have
> the issue; it is only this one, which is weird.
> ------------------------------
> *From:* Anthony D'Atri <anthony.datri@xxxxxxxxx>
> *Sent:* Friday, December 1, 2023 10:53 PM
> *To:* David C. <david.casier@xxxxxxxx>
> *Cc:* Szabo, Istvan (Agoda) <Istvan.Szabo@xxxxxxxxx>; Ceph Users <
> ceph-users@xxxxxxx>
> *Subject:* Re:  How to identify the index pool real usage?
>
>
> >>
> >> Today we had a big issue with slow ops on the NVMe drives which hold
> >> the index pool.
> >>
> >> Why does the NVMe show as full when Ceph shows it as barely utilized?
> >> Which one should I believe?
> >>
> >> When I check ceph osd df, it shows about 10% usage on the OSDs (each
> >> 2 TB NVMe drive has 4 OSDs on it):
>
> Why split each device into 4 very small OSDs?  You're losing a lot of
> capacity to overhead.
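>
> If you do decide to consolidate, a rough sketch (the OSD ids and device
> path below are guesses from your output; substitute the four OSDs that
> actually share one drive) would be to drain and purge them, then rebuild
> a single OSD on the whole device:
>
>     ceph osd out 195 252 253 254                 # assumed co-located OSDs
>     # wait for backfill to finish, stop the daemons, then purge each one:
>     ceph osd purge 195 --yes-i-really-mean-it
>     ceph-volume lvm zap /dev/nvme0n1 --destroy
>     ceph-volume lvm batch --osds-per-device 1 /dev/nvme0n1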
>
> >>
> >> ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
> >> 195   nvme  0.43660   1.00000  447 GiB   47 GiB  161 MiB  46 GiB  656 MiB  400 GiB  10.47  0.21   64      up
> >> 252   nvme  0.43660   1.00000  447 GiB   46 GiB  161 MiB  45 GiB  845 MiB  401 GiB  10.35  0.21   64      up
> >> 253   nvme  0.43660   1.00000  447 GiB   46 GiB  229 MiB  45 GiB  662 MiB  401 GiB  10.26  0.21   66      up
> >> 254   nvme  0.43660   1.00000  447 GiB   46 GiB  161 MiB  44 GiB  1.3 GiB  401 GiB  10.26  0.21   65      up
> >> 255   nvme  0.43660   1.00000  447 GiB   47 GiB  161 MiB  46 GiB  1.2 GiB  400 GiB  10.58  0.21   64      up
> >> 288   nvme  0.43660   1.00000  447 GiB   46 GiB  161 MiB  44 GiB  1.2 GiB  401 GiB  10.25  0.21   64      up
> >> 289   nvme  0.43660   1.00000  447 GiB   46 GiB  161 MiB  45 GiB  641 MiB  401 GiB  10.33  0.21   64      up
> >> 290   nvme  0.43660   1.00000  447 GiB   45 GiB  229 MiB  44 GiB  668 MiB  402 GiB  10.14  0.21   65      up
> >>
> >> However, nvme list says the drives are full:
> >>
> >> Node             SN                   Model            Namespace  Usage                      Format           FW Rev
> >> ---------------- -------------------- ---------------- ---------- -------------------------- ---------------- --------
> >> /dev/nvme0n1     90D0A00XTXTR         KCD6XLUL1T92     1            1.92  TB /   1.92  TB    512   B +  0 B   GPK6
> >> /dev/nvme1n1     60P0A003TXTR         KCD6XLUL1T92     1            1.92  TB /   1.92  TB    512   B +  0 B   GPK6
>
> That command isn't telling you what you think it is.  It has no awareness
> of the data Ceph has actually written; it's reporting NVMe namespace
> sizes.
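>
> For instance (a sketch; the exact field formatting depends on your
> nvme-cli version), the namespace identify data shows why the "Usage"
> column reads 1.92 TB / 1.92 TB:
>
>     nvme id-ns /dev/nvme0n1 -H | grep -iE 'nsze|ncap|nuse'
>
> On a namespace without thin provisioning, the utilization field typically
> just equals the namespace size, so "nvme list" will always look full no
> matter how much data the OSDs have written.  The numbers that matter for
> Ceph are the ones you already get from "ceph osd df".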
>
> >>
> >> On some other nodes the test went like this:
> >>
> >>  *   If none of the disks are full, there are no slow ops.
> >>  *   If one disk is full and the others are not, there are slow ops,
> >>      but not too many.
> >>
> >> The full disks are very heavily utilized during recovery, and they
> >> hold back operations on the other NVMes.
> >>
> >> Why are the OSDs not equally utilized in terms of space, even though
> >> the PG count per OSD is the same across the cluster (+/-1)?
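> >>
> >> If it helps to narrow this down, one thing that could be checked
> >> (commands are only a sketch; the pool name below is a placeholder for
> >> the actual index pool) is whether the omap data is spread unevenly
> >> across the PGs rather than across the OSDs:
> >>
> >>     ceph pg ls-by-pool default.rgw.buckets.index
> >>     ceph health detail | grep -i omap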
> >>
> >> Thank you
> >>
> >>
> >>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



