Re: allocate_bluefs_freespace failed to allocate


 



From my understanding you do not have a separate DB/WAL device per OSD. Since RocksDB uses BlueFS for OMAP storage, we can check the usage and free size for BlueFS on the problematic OSDs:

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-OSD_ID --command bluefs-bdev-sizes

That can probably shed some light on why the allocator did not work and why you had to compact.

Sent from my Galaxy device
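For illustration, a rough sketch of how that check could be run (osd.218 is only an example ID; the offline command needs the OSD stopped, and the admin-socket variant below is an assumption on my part, it may not be present in every build):

    # offline check: per-device BlueFS size and usage for one OSD
    systemctl stop ceph-osd@218
    ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-218
    systemctl start ceph-osd@218

    # on a running OSD, ask how much main-device space BlueFS could still claim
    ceph daemon osd.218 bluestore bluefs available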
-------- Original message --------
From: mhnx <morphinwithyou@xxxxxxxxx>
Date: 09.11.21 03:05 (GMT+02:00)
To: prosergey07 <prosergey07@xxxxxxxxx>
Cc: Ceph Users <ceph-users@xxxxxxx>
Subject: Re: allocate_bluefs_freespace failed to allocate

I was trying to keep things clear and I was aware of the login issue. Sorry. You're right, the OSDs are not full. They need balancing, but I can't activate the balancer because of the issue.

ceph osd df tree | grep 'CLASS\|ssd'

ID  CLASS WEIGHT     REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
 19   ssd    0.87320  1.00000 894 GiB 401 GiB 155 GiB 238 GiB 8.6 GiB 493 GiB 44.88 0.83 102     up         osd.19
208   ssd    0.87329  1.00000 894 GiB 229 GiB 112 GiB 116 GiB 1.5 GiB 665 GiB 25.64 0.48  95     up         osd.208
209   ssd    0.87329  1.00000 894 GiB 228 GiB 110 GiB 115 GiB 3.3 GiB 666 GiB 25.54 0.48  65     up         osd.209
199   ssd    0.87320  1.00000 894 GiB 348 GiB 155 GiB 191 GiB 1.3 GiB 546 GiB 38.93 0.72 103     up         osd.199
202   ssd    0.87329  1.00000 894 GiB 340 GiB 116 GiB 223 GiB 1.7 GiB 554 GiB 38.04 0.71  97     up         osd.202
218   ssd    0.87329  1.00000 894 GiB 214 GiB  95 GiB 118 GiB 839 MiB 680 GiB 23.92 0.44  37     up         osd.218
 39   ssd    0.87320  1.00000 894 GiB 381 GiB 114 GiB 261 GiB 6.4 GiB 514 GiB 42.57 0.79  91     up         osd.39
207   ssd    0.87329  1.00000 894 GiB 277 GiB 115 GiB 155 GiB 6.2 GiB 618 GiB 30.94 0.58  81     up         osd.207
210   ssd    0.87329  1.00000 894 GiB 346 GiB 138 GiB 207 GiB 1.6 GiB 548 GiB 38.73 0.72  99     up         osd.210
 59   ssd    0.87320  1.00000 894 GiB 423 GiB 166 GiB 254 GiB 2.9 GiB 471 GiB 47.29 0.88  97     up         osd.59
203   ssd    0.87329  1.00000 894 GiB 363 GiB 127 GiB 229 GiB 7.7 GiB 531 GiB 40.63 0.76 104     up         osd.203
211   ssd    0.87329  1.00000 894 GiB 257 GiB  76 GiB 179 GiB 1.9 GiB 638 GiB 28.70 0.53  81     up         osd.211
 79   ssd    0.87320  1.00000 894 GiB 459 GiB 144 GiB 313 GiB 2.0 GiB 435 GiB 51.32 0.95 102     up         osd.79
206   ssd    0.87329  1.00000 894 GiB 339 GiB 140 GiB 197 GiB 2.0 GiB 556 GiB 37.88 0.70  94     up         osd.206
212   ssd    0.87329  1.00000 894 GiB 301 GiB 107 GiB 192 GiB 1.5 GiB 593 GiB 33.68 0.63  80     up         osd.212
 99   ssd    0.87320  1.00000 894 GiB 282 GiB  96 GiB 180 GiB 6.2 GiB 612 GiB 31.59 0.59  85     up         osd.99
205   ssd    0.87329  1.00000 894 GiB 309 GiB 115 GiB 186 GiB 7.5 GiB 585 GiB 34.56 0.64  95     up         osd.205
213   ssd    0.87329  1.00000 894 GiB 335 GiB 119 GiB 213 GiB 2.5 GiB 559 GiB 37.44 0.70  95     up         osd.213
114   ssd    0.87329  1.00000 894 GiB 374 GiB 163 GiB 207 GiB 3.9 GiB 520 GiB 41.84 0.78  99     up         osd.114
200   ssd    0.87329  1.00000 894 GiB 271 GiB 104 GiB 163 GiB 3.0 GiB 624 GiB 30.26 0.56  90     up         osd.200
214   ssd    0.87329  1.00000 894 GiB 336 GiB 135 GiB 199 GiB 2.7 GiB 558 GiB 37.59 0.70 100     up         osd.214
139   ssd    0.87320  1.00000 894 GiB 320 GiB 128 GiB 189 GiB 3.6 GiB 574 GiB 35.82 0.67  96     up         osd.139
204   ssd    0.87329  1.00000 894 GiB 362 GiB 153 GiB 206 GiB 3.1 GiB 532 GiB 40.47 0.75 104     up         osd.204
215   ssd    0.87329  1.00000 894 GiB 236 GiB  99 GiB 133 GiB 3.4 GiB 659 GiB 26.35 0.49  81     up         osd.215
119   ssd    0.87329  1.00000 894 GiB 242 GiB 139 GiB 101 GiB 2.1 GiB 652 GiB 27.09 0.50  99     up         osd.119
159   ssd    0.87329  1.00000 894 GiB 253 GiB 127 GiB 123 GiB 2.7 GiB 642 GiB 28.25 0.53  93     up         osd.159
216   ssd    0.87329  1.00000 894 GiB 378 GiB 137 GiB 239 GiB 1.8 GiB 517 GiB 42.22 0.79 101     up         osd.216
179   ssd    0.87329  1.00000 894 GiB 473 GiB 112 GiB 348 GiB  12 GiB 421 GiB 52.91 0.98 104     up         osd.179
201   ssd    0.87329  1.00000 894 GiB 348 GiB 137 GiB 203 GiB 8.5 GiB 546 GiB 38.92 0.72 103     up         osd.201
217   ssd    0.87329  1.00000 894 GiB 301 GiB 105 GiB 194 GiB 2.5 GiB 593 GiB 33.64 0.63  89     up         osd.217

prosergey07 <prosergey07@xxxxxxxxx> wrote on Tue, 9 Nov 2021 at 03:02:

Are those problematic OSDs getting almost full? I do not have an Ubuntu account to check their pastebin.

Sent from my Galaxy device

-------- Original message --------
From: mhnx <morphinwithyou@xxxxxxxxx>
Date: 08.11.21 15:31 (GMT+02:00)
To: Ceph Users <ceph-users@xxxxxxx>
Subject: allocate_bluefs_freespace failed to allocate

Hello.

I'm using Nautilus 14.2.16.

I have 30 SSDs in my cluster and I use them as Bluestore OSDs for the RGW index. Almost every week I'm losing (down) an OSD, and when I check the OSD log I see:

    -6> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate failed to allocate 0xf4f04 on bdev 1, free 0xb0000; fallback to bdev 2
    -5> 2021-11-06 19:01:10.854 7fa799989c40  1 bluefs _allocate unable to allocate 0xf4f04 on bdev 2, free 0xffffffffffffffff; fallback to slow device expander
    -4> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluestore(/var/lib/ceph/osd/ceph-218) allocate_bluefs_freespace failed to allocate on 0x80000000 min_size 0x100000 > allocated total 0x0 bluefs_shared_alloc_size 0x10000 allocated 0x0 available 0xa497aab000
    -3> 2021-11-06 19:01:10.854 7fa799989c40 -1 bluefs _allocate failed to expand slow device to fit +0xf4f04

Full log: https://paste.ubuntu.com/p/MpJfVjMh7V/plain/

And the OSD does not start without offline compaction.
Offline compaction log: https://paste.ubuntu.com/p/vFZcYnxQWh/plain/

After the offline compaction I tried to start the OSD with the bitmap allocator, but it does not come up because of "FAILED ceph_assert(available >= allocated)".
Log: https://paste.ubuntu.com/p/2Bbx983494/plain/

Then I started the OSD with the hybrid allocator and let it recover. When the recovery was done I stopped the OSD and started it with the bitmap allocator. This time it came up, but I got "80 slow ops, oldest one blocked for 116 sec, osd.218 has slow ops", so I increased "osd_recovery_sleep" to 10 to give the cluster a breather. The cluster marked the OSD down (it was still working); after a while the OSD was marked up again and the cluster became normal. But while recovering, other OSDs started to give slow ops, and I played around with "osd_recovery_sleep" between 0.1 and 10 to keep the cluster stable until recovery finished.

Ceph osd df tree before: https://paste.ubuntu.com/p/4K7JXcZ8FJ/plain/
Ceph osd df tree after osd.218 = bitmap: https://paste.ubuntu.com/p/5SKbhrbgVM/plain/

If I want to change all the other OSDs' allocator to bitmap, I need to repeat the process 29 times and it will take too much time. I don't want to heal OSDs with offline compaction anymore, so I will do that if it's the solution, but I want to be sure before doing a lot of work, and maybe with this issue I can provide helpful logs and information for developers.

Have a nice day.
Thanks.
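For anyone who wants to reproduce the allocator switch on a single OSD, a minimal sketch, assuming the Nautilus centralized config database is in use (setting bluestore_allocator in an [osd.218] section of ceph.conf before restart would be the equivalent; osd.218 is only an example ID):

    # pin the bitmap allocator for one OSD and restart it so the change takes effect
    ceph config set osd.218 bluestore_allocator bitmap
    systemctl restart ceph-osd@218

    # confirm what the OSD will use on its next start
    ceph config get osd.218 bluestore_allocator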
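The recovery throttling mentioned above can also be adjusted at runtime without restarting anything; a sketch using the same values from the thread (note that on pure-SSD OSDs the osd_recovery_sleep_ssd variant is normally the one that applies while the generic option is 0):

    # slow recovery down cluster-wide while the OSD catches up
    ceph tell osd.* injectargs '--osd_recovery_sleep 10'

    # relax it again once the slow ops clear
    ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'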





