Found doc related to troubleshooting OSDs:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/troubleshooting_guide/troubleshooting-ceph-osds

(A few hedged command sketches and a worked check of the numbers discussed
below are appended after the quoted thread.)

On Thu, Mar 24, 2022 at 12:43 AM Neeraj Pratap Singh <neesingh@xxxxxxxxxx> wrote:

> Hi,
> Ceph prevents clients from performing I/O operations on full OSD nodes to
> avoid losing data. It returns the HEALTH_ERR "full osds" message when the
> cluster reaches the capacity set by the mon_osd_full_ratio parameter. By
> default, this parameter is set to 0.95, which means 95% of the cluster
> capacity.
> If % RAW USED is above 70-75%, there are two options:
> 1. Delete unnecessary data, but this is only a short-term solution.
> 2. Or, scale the cluster by adding a new OSD node.
>
> On Wed, Mar 23, 2022 at 11:44 PM Rodrigo Werle <rodrigo.werle@xxxxxxxxx> wrote:
>
> > Thanks Eugen!
> > Actually, as it is an NVMe disk, I thought the weight could be greater.
> > I tried changing it to 0.9 but it is still saying that the OSD is full.
> > This OSD got full yesterday. I wiped it out and re-added it. After some
> > time, the "osd full" message happened again. It is as if Ceph thinks the
> > OSD is still full, as it was before...
> >
> > On Wed, Mar 23, 2022 at 14:38, Eugen Block <eblock@xxxxxx> wrote:
> >
> > > Without having an answer to the question why the OSD is full, I'm
> > > wondering why the OSD has a crush weight of 1.29999 while its size is
> > > only 1 TB. Was that changed on purpose? I'm not sure if that would
> > > explain the OSD full message, though.
> > >
> > > Quoting Rodrigo Werle <rodrigo.werle@xxxxxxxxx>:
> > >
> > > > Hi everyone!
> > > > I'm trying to understand why Ceph marked one OSD as full when it is
> > > > only 73% used and full_ratio is 0.97.
> > > >
> > > > Here is some information:
> > > >
> > > > # ceph health detail
> > > > (...)
> > > > [ERR] OSD_FULL: 1 full osd(s)
> > > >     osd.11 is full
> > > > (...)
> > > >
> > > > # ceph osd metadata
> > > > (...)
> > > > {
> > > >     "id": 11,
> > > >     "arch": "x86_64",
> > > >     "back_iface": "",
> > > >     "bluefs": "1",
> > > >     "bluefs_dedicated_db": "0",
> > > >     "bluefs_dedicated_wal": "0",
> > > >     "bluefs_single_shared_device": "1",
> > > >     "bluestore_bdev_access_mode": "blk",
> > > >     "bluestore_bdev_block_size": "4096",
> > > >     "bluestore_bdev_dev_node": "/dev/dm-41",
> > > >     "bluestore_bdev_devices": "nvme0n1",
> > > >     "bluestore_bdev_driver": "KernelDevice",
> > > >     "bluestore_bdev_partition_path": "/dev/dm-41",
> > > >     "bluestore_bdev_rotational": "0",
> > > >     "bluestore_bdev_size": "1000200994816",
> > > >     "bluestore_bdev_support_discard": "1",
> > > >     "bluestore_bdev_type": "ssd",
> > > >     "ceph_release": "octopus",
> > > >     "ceph_version": "ceph version 15.2.16
> > > > (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)",
> > > >     "ceph_version_short": "15.2.16",
> > > >     "cpu": "Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz",
> > > >     "default_device_class": "ssd",
> > > >     "device_ids": "nvme0n1=Samsung_SSD_970_EVO_Plus_1TB_S59ANJ0N308050F",
> > > >     "device_paths": "nvme0n1=/dev/disk/by-path/pci-0000:01:00.0-nvme-1",
> > > >     "devices": "nvme0n1",
> > > >     "distro": "ubuntu",
> > > >     "distro_description": "Ubuntu 18.04.6 LTS",
> > > >     "distro_version": "18.04",
> > > >     "front_iface": "",
> > > >     "hostname": "pf-us1-dfs6",
> > > >     "journal_rotational": "0",
> > > >     "kernel_description": "#182-Ubuntu SMP Fri Mar 18 15:53:46 UTC 2022",
> > > >     "kernel_version": "4.15.0-173-generic",
> > > >     "mem_swap_kb": "117440508",
> > > >     "mem_total_kb": "181456152",
> > > >     "network_numa_unknown_ifaces": "back_iface,front_iface",
> > > >     "objectstore_numa_node": "0",
> > > >     "objectstore_numa_nodes": "0",
> > > >     "os": "Linux",
> > > >     "osd_data": "/var/lib/ceph/osd/ceph-11",
> > > >     "osd_objectstore": "bluestore",
> > > >     "osdspec_affinity": "",
> > > >     "rotational": "0"
> > > > }
> > > > (...)
> > > >
> > > > # cat /var/lib/ceph/osd/ceph-11/type
> > > > bluestore
> > > >
> > > > # ls -l /var/lib/ceph/osd/ceph-11 | grep block
> > > > lrwxrwxrwx 1 ceph ceph 50 Mar 22 21:44 block ->
> > > >     /dev/mapper/YpB2cx-HlyU-VPqT-Abaz-Dutx-iMrz-Tty6o1
> > > >
> > > > # lsblk
> > > > nvme0n1                                      259:0    0  931.5G  0  disk
> > > > └─ceph--7ef8f83a--d055--4a59--8d6b--c564544c5a55-osd--block--c7c03dfd--a1b1--4182--9e61--bf264f293f2b
> > > >                                              253:0    0  931.5G  0  lvm
> > > >   └─YpB2cx-HlyU-VPqT-Abaz-Dutx-iMrz-Tty6o1   253:41   0  931.5G  0  crypt
> > > >
> > > > # ceph osd dump | grep full_ratio
> > > > full_ratio 0.97
> > > > backfillfull_ratio 0.95
> > > > nearfull_ratio 0.9
> > > >
> > > > # ceph osd df | head
> > > > ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
> > > > 11  nvme   1.29999  1.00000   932 GiB  694 GiB  319 GiB  301 GiB  74 GiB  238 GiB  74.49  1.09   96  up
> > > >
> > > > Any ideas?
> > > > Thanks!
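Regarding the full/nearfull ratios Neeraj describes above, here is a minimal
sketch of how to inspect and temporarily adjust them on an Octopus cluster.
The values are placeholders, not a recommendation, and raising full_ratio only
buys time while data is deleted or capacity is added:

  ceph osd dump | grep full_ratio       # show current full/backfillfull/nearfull ratios
  ceph osd set-nearfull-ratio 0.90      # threshold for the "nearfull" HEALTH_WARN state
  ceph osd set-backfillfull-ratio 0.95  # OSDs above this refuse backfill
  ceph osd set-full-ratio 0.97          # OSDs above this are marked full and block client writes

Once usage is back under control, the ratios should be returned to
conservative values.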
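On Eugen's point about the crush weight: CRUSH weights conventionally track
device size in TiB, so a 931.5 GiB NVMe would normally carry a weight around
0.91 rather than 1.29999. A sketch of checking and correcting it (the target
value is an assumption derived from the lsblk size; Rodrigo already tried 0.9,
and changing the CRUSH weight will move data around):

  ceph osd tree | grep osd.11             # current CRUSH weight and reweight
  ceph osd crush reweight osd.11 0.90970  # ~931.5 GiB expressed in TiB

This changes the permanent CRUSH weight, not the temporary REWEIGHT column
shown by ceph osd df.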
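For reference, the arithmetic behind the original question: broadly, an OSD is
flagged full by comparing its raw usage against full_ratio, and from the
ceph osd df line quoted above:

  694 GiB / 932 GiB ≈ 0.745   (the 74.49 in the %USE column)
  0.745 < 0.97 (full_ratio)

so by that calculation alone osd.11 should not be flagged full, which is
exactly the contradiction being asked about.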
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx