Please describe your hardware. Also, are you talking about an orderly "shutdown -r" reboot, or a kernel / system crash or power loss?

Often corruptions like this are a result of:

* Using non-enterprise SSDs that lack power loss protection
* Buggy / defective RAID HBAs
* Enabling volatile write cache on drives

A quick way to check the last item is sketched at the end of this message.

> On Aug 5, 2024, at 4:54 AM, Reza Bakhshayeshi <reza.b2008@xxxxxxxxx> wrote:
>
> Hello,
>
> Whenever a node reboots in the cluster I get some corrupted OSDs. Is there
> any config I should set to prevent this from happening that I am not aware
> of?
>
> Here is the error log:
>
> # kubectl logs rook-ceph-osd-1-5dcbd99cc7-2l5g2 -c expand-bluefs
>
> ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x135) [0x7f969977ce15]
> 2: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f969977cfdb]
> 3: (BlueStore::expand_devices(std::ostream&)+0x5ff) [0x55ce89d1f3ff]
> 4: main()
> 5: __libc_start_main()
> 6: _start()
>
> 0> 2024-07-31T08:39:19.840+0000 7f969b1c0980 -1 *** Caught signal
> (Aborted) **
> in thread 7f969b1c0980 thread_name:ceph-bluestore-
>
> ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef
> (stable)
> 1: /lib64/libpthread.so.0(+0x12d20) [0x7f969843fd20]
> 2: gsignal()
> 3: abort()
> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x18f) [0x7f969977ce6f]
> 5: /usr/lib64/ceph/libceph-common.so.2(+0x2a9fdb) [0x7f969977cfdb]
> 6: (BlueStore::expand_devices(std::ostream&)+0x5ff) [0x55ce89d1f3ff]
> 7: main()
> 8: __libc_start_main()
> 9: _start()
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
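
P.S. Regarding the volatile write cache point above, here is a rough sketch of how you might check (and disable) it on each OSD node. The device name /dev/sda is just an example; adjust for your hardware, and note that hdparm/sdparm changes are generally not persistent across reboots.

    # SATA / ATA drives
    smartctl -g wcache /dev/sda     # report whether the volatile write cache is enabled
    hdparm -W /dev/sda              # query write-caching state
    hdparm -W 0 /dev/sda            # disable the volatile write cache

    # SAS / SCSI drives
    sdparm --get=WCE /dev/sda       # WCE: 1 means the volatile write cache is on
    sdparm --clear=WCE /dev/sda     # turn it off

Whether you actually want the cache off depends on the drives; enterprise SSDs with power loss protection can safely leave it enabled.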