Re: Help - Multiple OSDs Down

First of all, do not rush into bad decisions.
Production is down and you want to bring it back online, but you should
diagnose the problem and be sure of the cause first. If a second crash
occurs while the cluster is still healing, you will lose metadata.
You don't need debug-level logging to start with!
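
If you want to keep the cluster from shuffling data around while you
investigate, one option is to set the usual maintenance flags (and unset
them again with "ceph osd unset ..." once the OSDs are stable):

ceph osd set noout        # don't mark the crashed OSDs out
ceph osd set norebalance  # pause rebalancing
ceph osd set nobackfill   # pause backfill
ceph osd set norecover    # pause recovery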

You didn't mention your cluster status, so we don't know what you have.
We need some information:
1- ceph -s
2- ceph health detail
3- ceph df
4- tail -n 1000 /var/log/ceph/ceph-osd.{crashed osd id}.log
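
Also, your journal lines show the OSD dying with status=9/KILL, which is
an external SIGKILL rather than a Ceph assert, so it is worth checking
whether the kernel OOM killer is involved (a quick check, assuming you
have kernel log access):

dmesg -T | grep -iE 'out of memory|killed process'
journalctl -k | grep -i oom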



Lee <lquince@xxxxxxxxx> wrote on Wed, 5 Jan 2022 at 23:14:
>
> Looking for some help as this is production affecting...
>
> We run a 3-node cluster with a mix of 5x SSD, 15x SATA and 5x SAS in each
> node, running 15.2.15. All use DB/WAL on an NVMe SSD, except the SSDs.
>
> Earlier today I increased the PG num from 32 to 128 on one of our pools,
> due to the status complaining; pretty normal really. 2-3 minutes in, I
> watched in horror as SSD-based OSDs crashed on all 3 nodes, refusing to
> restart.
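>
> For reference, the change was just the usual pool command, roughly
> (pool name is a placeholder):
>
> ceph osd pool set <pool> pg_num 128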
>
> I've set debug_bluefs and debug_bluestore to 20; the daemon will get so
> far and then fail.
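>
> Set with something like the following (assuming the config database
> route):
>
> ceph config set osd.51 debug_bluefs 20
> ceph config set osd.51 debug_bluestore 20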
>
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.335+0000 7f2794383700 20
> bluestore(/var/lib/ceph/osd/ceph-51) deferred_try_submit 0 osrs, 0 txcs
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.335+0000 7f2794383700  5
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.387+0000 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:23 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.467+0000 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:23.979+0000 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:24.167+0000 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:24.271+0000 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:24 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:24.327+0000 7f2794383700 20
> bluestore.MempoolThread(0x560433f0aa98) _resize_shards cache_size:
> 134217728 kv_alloc: 67108864 kv_used: 67075728 meta_alloc: 67108864
> meta_used: 75234 data_alloc: 67108864 data_used: 0
> 2022-01-05 19:39:32 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Main process exited, code=killed, status=9/KILL
> 2022-01-05 19:39:32 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Failed with result 'signal'.
> 2022-01-05 19:39:42 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Scheduled restart job, restart counter is at 1.
>
> I've run
> ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-51
> inferring bluefs devices from bluestore path
> 1 : device size 0x3a38800000 : own 0x[1bf2200000~254300000] = 0x254300000 :
> using 0x3fd10000(1021 MiB) : bluestore has 0x1d83400000(118 GiB) available
>
> Also, fsck and repair both seem to be OK.
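>
> (These were along the lines of:
>
> ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-51
> ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-51
>
> and neither reported errors.)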
>
> The normal log looks like this:
>
> 2022-01-05 19:39:42 bb-ceph-enc-rm63-osd03-31 init.scope Starting Ceph
> object storage daemon osd.51...
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.467+0000 7fca32943e00  0 set uid:gid to 64045:64045
> (ceph:ceph)
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.467+0000 7fca32943e00  0 ceph version 15.2.15
> (2dfb18841cfecc2f7eb7eb2afd65986ca4d95985) octopus (stable), process
> ceph-osd, pid 139577
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.467+0000 7fca32943e00  0 pidfile_write: ignore empty
> --pid-file
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bdev create path
> /var/lib/ceph/osd/ceph-51/block type kernel
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bdev(0x55b4b234e000
> /var/lib/ceph/osd/ceph-51/block) open path /var/lib/ceph/osd/ceph-51/block
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bdev(0x55b4b234e000
> /var/lib/ceph/osd/ceph-51/block) open size 250056015872 (0x3a38800000, 233
> GiB) block_size 4096 (4 KiB) non-rotational discard not supported
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1
> bluestore(/var/lib/ceph/osd/ceph-51) _set_cache_sizes cache_size 1073741824
> meta 0.4 kv 0.4 data 0.2
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bdev create path
> /var/lib/ceph/osd/ceph-51/block type kernel
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bdev(0x55b4b234e380
> /var/lib/ceph/osd/ceph-51/block) open path /var/lib/ceph/osd/ceph-51/block
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bdev(0x55b4b234e380
> /var/lib/ceph/osd/ceph-51/block) open size 250056015872 (0x3a38800000, 233
> GiB) block_size 4096 (4 KiB) non-rotational discard not supported
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bluefs add_block_device bdev 1
> path /var/lib/ceph/osd/ceph-51/block size 233 GiB
> 2022-01-05 19:39:46 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:46.491+0000 7fca32943e00  1 bdev(0x55b4b234e380
> /var/lib/ceph/osd/ceph-51/block) close
> 2022-01-05 19:39:47 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:47.067+0000 7fca32943e00  0 starting osd.51 osd_data
> /var/lib/ceph/osd/ceph-51 /var/lib/ceph/osd/ceph-51/journal
> 2022-01-05 19:39:47 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:47.159+0000 7fca32943e00  0 load: jerasure load: lrc load:
> isa
> 2022-01-05 19:39:47 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:47.159+0000 7fca32943e00  1 bdev create path
> /var/lib/ceph/osd/ceph-51/block type kernel
> 2022-01-05 19:39:47 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:47.159+0000 7fca32943e00  1 bdev(0x55b4b234e000
> /var/lib/ceph/osd/ceph-51/block) open path /var/lib/ceph/osd/ceph-51/block
> 2022-01-05 19:39:47 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:47.163+0000 7fca32943e00  1 bdev(0x55b4b234e000
> /var/lib/ceph/osd/ceph-51/block) open size 250056015872 (0x3a38800000, 233
> GiB) block_size 4096 (4 KiB) non-rotational discard not supported
> 2022-01-05 19:39:47 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:47.163+0000 7fca32943e00  1
> bluestore(/var/lib/ceph/osd/ceph-51) _set_cache_sizes cache_size 1073741824
> meta 0.4 kv 0.4 data 0.2
> 2022-01-05 19:39:47 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:47.163+0000 7fca32943e00  1 bdev(0x55b4b234e000
> /var/lib/ceph/osd/ceph-51/block) close
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.619+0000 7fca32943e00  1
> bluestore(/var/lib/ceph/osd/ceph-51) _open_alloc loaded 138 GiB in 276582
> extents available 129 GiB
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.619+0000 7fca32943e00  1 bluefs umount
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.619+0000 7fca32943e00  1 bdev(0x55b4b234e380
> /var/lib/ceph/osd/ceph-51/block) close
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.803+0000 7fca32943e00  1 bdev create path
> /var/lib/ceph/osd/ceph-51/block type kernel
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.803+0000 7fca32943e00  1 bdev(0x55b4b234e380
> /var/lib/ceph/osd/ceph-51/block) open path /var/lib/ceph/osd/ceph-51/block
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.803+0000 7fca32943e00  1 bdev(0x55b4b234e380
> /var/lib/ceph/osd/ceph-51/block) open size 250056015872 (0x3a38800000, 233
> GiB) block_size 4096 (4 KiB) non-rotational discard not supported
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.803+0000 7fca32943e00  1 bluefs add_block_device bdev 1
> path /var/lib/ceph/osd/ceph-51/block size 233 GiB
> 2022-01-05 19:39:48 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:48.803+0000 7fca32943e00  1 bluefs mount
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.087+0000 7fca32943e00  1
> bluestore(/var/lib/ceph/osd/ceph-51) _open_db opened rocksdb path db
> options
> compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.087+0000 7fca32943e00  1
> bluestore(/var/lib/ceph/osd/ceph-51) _upgrade_super from 4, latest 4
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.087+0000 7fca32943e00  1
> bluestore(/var/lib/ceph/osd/ceph-51) _upgrade_super done
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.131+0000 7fca32943e00  0
>  /build/ceph-15.2.15/src/cls/cephfs/cls_cephfs.cc:198: loading cephfs
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.131+0000 7fca32943e00  0
>  /build/ceph-15.2.15/src/cls/hello/cls_hello.cc:312: loading cls_hello
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.135+0000 7fca32943e00  0 _get_class not permitted to
> load kvs
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.171+0000 7fca32943e00  0 _get_class not permitted to
> load lua
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.207+0000 7fca32943e00  0 _get_class not permitted to
> load queue
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.319+0000 7fca32943e00  0 _get_class not permitted to
> load sdk
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.319+0000 7fca32943e00  0 osd.51 24448261 crush map has
> features 288514051259236352, adjusting msgr requires for clients
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.319+0000 7fca32943e00  0 osd.51 24448261 crush map has
> features 288514051259236352 was 8705, adjusting msgr requires for mons
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.319+0000 7fca32943e00  0 osd.51 24448261 crush map has
> features 3314933000852226048, adjusting msgr requires for osds
> 2022-01-05 19:39:49 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:39:49.319+0000 7fca32943e00  1 osd.51 24448261
> check_osdmap_features require_osd_release unknown -> octopus
> 2022-01-05 19:41:25 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:41:24.999+0000 7fca32943e00  0 osd.51 24448261 load_pgs
> opened 66 pgs
> 2022-01-05 19:41:25 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:41:25.071+0000 7fca32943e00 -1 osd.51 24448261
> log_to_monitors {default=true}
> 2022-01-05 19:41:25 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:41:25.071+0000 7fca32943e00 -1 osd.51 24448261
> log_to_monitors {default=true}
> 2022-01-05 19:42:16 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:42:16.631+0000 7fca32943e00  0 osd.51 24448261 done with
> init, starting boot process
> 2022-01-05 19:42:16 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:42:16.631+0000 7fca32943e00  1 osd.51 24448261 start_boot
> 2022-01-05 19:42:16 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:42:16.635+0000 7fca14615700  1 osd.51 pg_epoch: 24448130
> pg[44.17( v 24448128'27126321 (24447767'27121032,24448128'27126321]
> local-lis/les=24447864/24447865 n=2356 ec=4550661/4550661
> lis/c=24447864/24447864 les/c/f=24447865/24447865/22709931 sis=24448130)
> [51,48,15] r=0 lpr=24448130 pi=[24447864,24448130)/1 crt=24448128'27126321
> lcod 0'0 mlcod 0'0 unknown mbc={}] start_peering_interval up [51,48,15] ->
> [51,48,15], acting [51,48,15] -> [51,48,15], acting_primary 51 -> 51,
> up_primary 51 -> 51, role 0 -> 0, features acting 4540138292840890367
> upacting 4540138292840890367
> 2022-01-05 19:42:16 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:42:16.635+0000 7fca13613700  1 osd.51 pg_epoch: 24448130
> pg[44.1( v 24448129'31648690 (24447777'31643388,24448129'31648690]
> local-lis/les=24447865/24447866 n=2314 ec=4550661/4550661
> lis/c=24447865/24447865 les/c/f=24447866/24447866/22709931 sis=24448130)
> [51,15,5] r=0 lpr=24448130 pi=[24447865,24448130)/1 crt=24448129'31648690
> lcod 0'0 mlcod 0'0 unknown mbc={}] start_peering_interval up [51,15,5] ->
> [51,15,5], acting [51,15,5] -> [51,15,5], acting_primary 51 -> 51,
> up_primary 51 -> 51, role 0 -> 0, features acting 4540138292840890367
> upacting 4540138292840890367
> 2022-01-05 19:42:16 bb-ceph-enc-rm63-osd03-31 osd.51
> 2022-01-05T19:42:16.635+0000 7fca15617700  1 osd.51 pg_epoch: 24448130
> pg[44.15( v 24448129'37939392 (24447777'37936883,24448129'37939392]
> local-lis/les=24448118/24448119 n=2350 ec=4550661/4550661
> lis/c=24448118/24448118 les/c/f=24448119/24448119/22709931 sis=24448130)
> [5,14,51] r=2 lpr=24448130 pi=[24448118,24448130)/1 crt=24448129'37939392
> lcod 0'0 mlcod 0'0 unknown mbc={}] start_peering_interval up [5,14,51] ->
> [5,14,51], acting [5,14,51] -> [5,14,51], acting_primary 5 -> 5, up_primary
> 5 -> 5, role 2 -> 2, features acting 4540138292840890367 upacting
> 4540138292840890367
> 2022-01-05 19:42:51 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Main process exited, code=killed, status=9/KILL
> 2022-01-05 19:42:51 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Failed with result 'signal'.
> 2022-01-05 19:43:01 bb-ceph-enc-rm63-osd03-31 init.scope ceph-osd@51.service:
> Scheduled restart job, restart counter is at 2.
>
>
> The problem I have is that this has basically taken the production and
> metadata SSD pools down fully, and all 3 copies are offline. And I cannot
> find a way to work out what is causing these to crash.
>
> Kind Regards
>
> Lee
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



