Re: Ceph nvme timeout and then aborting


 



From the Ceph documentation, I see that using a fast device for the WAL/DB can improve performance.
So we use either one 2TB or two 1TB Samsung 970 Pro NVMes as WAL/DB here. Yes, we have two data pools, an SSD pool and an HDD pool; the SSD pool uses Samsung 860 Pro disks, and the 970 Pro serves as WAL/DB for both the SSD pool and the HDD pool.
I haven't run a test yet, meaning a comparison of the SSD pool's performance WITH the NVMe as WAL/DB versus WITHOUT it (https://docs.ceph.com/en/latest/start/hardware-recommendations/).
We just followed the documentation, which says to use a fast device, and we know NVMe is faster than a normal SSD.
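
If it would help, I can compare the two layouts with a short rados bench run against a small test pool on each configuration. A rough sketch of what I have in mind (the pool name and the CRUSH rule are placeholders, not from our cluster):

    # create a small throwaway pool and steer it to the SSD OSDs (rule name is an assumption)
    ceph osd pool create bench-ssd 64 64
    ceph osd pool set bench-ssd crush_rule <ssd-rule>
    # 60s of writes, then sequential and random reads of the same objects
    rados bench -p bench-ssd 60 write --no-cleanup
    rados bench -p bench-ssd 60 seq
    rados bench -p bench-ssd 60 rand
    # drop the benchmark objects and the pool (pool deletion must be enabled on the mons)
    rados -p bench-ssd cleanup
    ceph osd pool delete bench-ssd bench-ssd --yes-i-really-really-mean-it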

I also have another question: some documents say we only need to put the DB on the fast device and do not need to create a separate WAL (i.e., use the NVMe or SSD as DB for the HDD pool, with no dedicated WAL). Do you agree?
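
From what I read, when only a DB device is specified, BlueStore keeps the WAL on that same (faster) device, so a separate WAL partition should not be needed. If that is right, creating an HDD OSD would just be something like this (device paths are only for illustration):

    # HDD as data, one NVMe partition/LV as DB; the WAL is co-located with the DB automatically
    ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p1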

We will scale out the cluster soon (to replace the two crashed nodes) and haven't made a final decision on devices. One option may be:
- one NVMe (Samsung 980 Pro) as DB (no separate WAL) for the HDD pool;
- no NVMe for the SSD pool, with Samsung 883 DCT disks as the SSD data devices.
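
Whichever devices we pick, before putting them into production I plan to check them the same way as before, roughly (device names assumed):

    nvme smart-log /dev/nvme0n1   # critical_warning, media_errors, num_err_log_entries
    nvme error-log /dev/nvme0n1   # recent controller error entries
    smartctl -a /dev/nvme0        # needs a smartmontools build with NVMe support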
 
Would you experts please help shed some light here? Thanks a ton!
  
Thanks,
zx

> On Feb 23, 2021, at 5:32 AM, Mark Lehrer <lehrer@xxxxxxxxx> wrote:
> 
>> Yes, it is an NVMe, and the node has two NVMes as DB/WAL, one
>> for the SSDs (0-2) and another for the HDDs (3-6). I have no spare to try.
>> ...
>> I/O 517 QID 7 timeout, aborting
>> Input/output error
> 
> If you are seeing errors like these, it is almost certainly a bad
> drive unless you are using fabric.
> 
> Why are you putting the wal on an SSD in the first place?  Are you
> sure it is even necessary, especially when one of your pools is
> already SSD?
> 
> Adding this complexity just means that there are more things to break
> when you least expect it. Putting the db/wal on a separate drive is
> usually premature optimization that is only useful for benchmarkers.
> My opinion of course.
> 
> Mark
> 
> 
> 
> 
> 
> 
> 
> 
> On Sun, Feb 21, 2021 at 7:16 PM zxcs <zhuxiongcs@xxxxxxx> wrote:
>> 
>> Thanks for your reply!
>> 
>> Yes, it is an NVMe, and the node has two NVMes as DB/WAL, one for the SSDs (0-2) and another for the HDDs (3-6).
>> I have no spare to try.
>> It's very strange; the load was not very high at that time, and both the SSDs and the NVMe seem healthy.
>> 
>> If it cannot be fixed, I am afraid I will need to set up more nodes and then mark out and remove the OSDs that use this NVMe.
>> 
>> Thanks,
>> zx
>> 
>> 
>>> On Feb 22, 2021, at 10:07 AM, Mark Lehrer <lehrer@xxxxxxxxx> wrote:
>>> 
>>>> One NVMe suddenly crashed again. Could anyone please help shed some light here?
>>> 
>>> It looks like a flaky NVMe drive.  Do you have a spare to try?
>>> 
>>> 
>>> On Mon, Feb 22, 2021 at 1:56 AM zxcs <zhuxiongcs@xxxxxxx> wrote:
>>>> 
>>>> One NVMe suddenly crashed again. Could anyone please help shed some light here? Thanks a ton!!!
>>>> Below are the syslog and the Ceph OSD log.
>>>> 
>>>> From  /var/log/syslog
>>>> Feb 21 19:38:33 ip kernel: [232562.847916] nvme 0000:03:00.0: I/O 943 QID 7 timeout, aborting
>>>> Feb 21 19:38:34 ip kernel: [232563.847946] nvme 0000:03:00.0: I/O 911 QID 18 timeout, aborting
>>>> Feb 21 19:38:34 ip kernel: [232563.847964] nvme 0000:03:00.0: I/O 776 QID 28 timeout, aborting
>>>> Feb 21 19:38:36 ip ceph-osd[3241]: 2021-02-21 19:38:36.218 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:36 ip kernel: [232565.851961] nvme 0000:03:00.0: I/O 442 QID 2 timeout, aborting
>>>> Feb 21 19:38:36 ip kernel: [232565.851982] nvme 0000:03:00.0: I/O 912 QID 18 timeout, aborting
>>>> Feb 21 19:38:37 ip ceph-osd[3241]: 2021-02-21 19:38:37.254 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:38 ip ceph-osd[3241]: 2021-02-21 19:38:38.286 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:39 ip ceph-osd[3241]: 2021-02-21 19:38:39.334 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:40 ip ceph-osd[3241]: 2021-02-21 19:38:40.322 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:41 ip ceph-osd[3241]: 2021-02-21 19:38:41.326 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:41 ip kernel: [232570.852035] nvme 0000:03:00.0: I/O 860 QID 9 timeout, aborting
>>>> Feb 21 19:38:42 ip ceph-osd[3241]: 2021-02-21 19:38:42.298 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:43 ip ceph-osd[3241]: 2021-02-21 19:38:43.258 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:44 ip ceph-osd[3241]: 2021-02-21 19:38:44.258 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:45 ip ntpd[3480]: Soliciting pool server 84.16.67.12
>>>> Feb 21 19:38:45 ip ceph-osd[3241]: 2021-02-21 19:38:45.286 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:46 ip ceph-osd[3241]: 2021-02-21 19:38:46.254 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:38:47 ip ceph-osd[3241]: 2021-02-21 19:38:47.226 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> Feb 21 19:39:04 ip kernel: [232593.860464] nvme 0000:03:00.0: I/O 943 QID 7 timeout, reset controller
>>>> Feb 21 19:39:33 ip kernel: [232622.868975] nvme 0000:03:00.0: I/O 0 QID 0 timeout, reset controller
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: 2021-02-21 19:40:35.211 7f0236585700 -1 bdev(0xc3f0a80 /var/lib/ceph/osd/ceph-16/block.wal) _aio_thread got r=-5 ((5) Input/output error)
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: 2021-02-21 19:40:35.211 7f021a54d700 -1 bdev(0xc3f0700 /var/lib/ceph/osd/ceph-16/block.db) direct_read_unaligned stalled read  0x611527d7~fe6 since 232570.311299s, timeout is 5.000000s
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: 2021-02-21 19:40:35.211 7f021a54d700 -1 bdev(0xc3f0700 /var/lib/ceph/osd/ceph-16/block.db) direct_read_unaligned 0x611527d7~fe6 error: (5) Input/output error
>>>> Feb 21 19:40:35 ip kernel: [232684.442054] nvme 0000:03:00.0: Device not ready; aborting reset
>>>> Feb 21 19:40:35 ip kernel: [232684.442694] nvme 0000:03:00.0: Cancelling I/O 776 QID 28
>>>> Feb 21 19:40:35 ip kernel: [232684.442700] nvme 0000:03:00.0: completing aborted command with status: 0007
>>>> Feb 21 19:40:35 ip kernel: [232684.442705] blk_update_request: I/O error, dev nvme0n1, sector 1344455888
>>>> Feb 21 19:40:35 ip kernel: [232684.442937] nvme 0000:03:00.0: Cancelling I/O 911 QID 18
>>>> Feb 21 19:40:35 ip kernel: [232684.442941] nvme 0000:03:00.0: completing aborted command with status: 0007
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread 7f021a54d700 time 2021-02-21 19:40:35.216141
>>>> Feb 21 19:40:35 ip kernel: [232684.442944] blk_update_request: I/O error, dev nvme0n1, sector 673580352
>>>> Feb 21 19:40:35 ip kernel: [232684.443163] nvme 0000:03:00.0: Cancelling I/O 912 QID 18
>>>> Feb 21 19:40:35 ip kernel: [232684.443168] nvme 0000:03:00.0: completing aborted command with status: 0007
>>>> Feb 21 19:40:35 ip kernel: [232684.443171] blk_update_request: I/O error, dev nvme0n1, sector 429184832
>>>> Feb 21 19:40:35 ip kernel: [232684.443384] nvme 0000:03:00.0: Cancelling I/O 913 QID 18
>>>> Feb 21 19:40:35 ip kernel: [232684.443388] blk_update_request: I/O error, dev nvme0n1, sector 1680904848
>>>> Feb 21 19:40:35 ip kernel: [232684.443607] nvme 0000:03:00.0: Cancelling I/O 914 QID 18
>>>> Feb 21 19:40:35 ip kernel: [232684.443610] blk_update_request: I/O error, dev nvme0n1, sector 349779760
>>>> Feb 21 19:40:35 ip kernel: [232684.443823] nvme 0000:03:00.0: Cancelling I/O 915 QID 18
>>>> Feb 21 19:40:35 ip kernel: [232684.443826] blk_update_request: I/O error, dev nvme0n1, sector 349798704
>>>> Feb 21 19:40:35 ip kernel: [232684.444043] nvme 0000:03:00.0: Cancelling I/O 860 QID 9
>>>> Feb 21 19:40:35 ip kernel: [232684.444047] nvme 0000:03:00.0: completing aborted command with status: 0007
>>>> Feb 21 19:40:35 ip kernel: [232684.444050] blk_update_request: I/O error, dev nvme0n1, sector 1051102760
>>>> Feb 21 19:40:35 ip kernel: [232684.444273] nvme 0000:03:00.0: Cancelling I/O 943 QID 7
>>>> Feb 21 19:40:35 ip kernel: [232684.444281] nvme 0000:03:00.0: Cancelling I/O 442 QID 2
>>>> Feb 21 19:40:35 ip kernel: [232684.444287] nvme 0000:03:00.0: completing aborted command with status: 0007
>>>> Feb 21 19:40:35 ip kernel: [232684.444291] blk_update_request: I/O error, dev nvme0n1, sector 340275160
>>>> Feb 21 19:40:35 ip kernel: [232684.444514] nvme 0000:03:00.0: Cancelling I/O 0 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444520] nvme 0000:03:00.0: Cancelling I/O 1 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444528] nvme nvme0: Abort status: 0x7
>>>> Feb 21 19:40:35 ip kernel: [232684.444534] nvme 0000:03:00.0: Cancelling I/O 2 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444540] nvme nvme0: Abort status: 0x7
>>>> Feb 21 19:40:35 ip kernel: [232684.444546] nvme 0000:03:00.0: Cancelling I/O 3 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444552] nvme nvme0: Abort status: 0x7
>>>> Feb 21 19:40:35 ip kernel: [232684.444558] nvme 0000:03:00.0: Cancelling I/O 4 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444563] nvme nvme0: Abort status: 0x7
>>>> Feb 21 19:40:35 ip kernel: [232684.444570] nvme 0000:03:00.0: Cancelling I/O 5 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444575] nvme nvme0: Abort status: 0x7
>>>> Feb 21 19:40:35 ip kernel: [232684.444581] nvme 0000:03:00.0: Cancelling I/O 6 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444587] nvme 0000:03:00.0: Cancelling I/O 7 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444594] nvme 0000:03:00.0: Cancelling I/O 8 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444600] nvme 0000:03:00.0: Cancelling I/O 9 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444606] nvme 0000:03:00.0: Cancelling I/O 10 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444613] nvme 0000:03:00.0: Cancelling I/O 11 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444619] nvme 0000:03:00.0: Cancelling I/O 12 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444625] nvme 0000:03:00.0: Cancelling I/O 13 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444631] nvme 0000:03:00.0: Cancelling I/O 14 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444636] nvme 0000:03:00.0: Cancelling I/O 15 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444640] nvme 0000:03:00.0: Cancelling I/O 16 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444644] nvme 0000:03:00.0: Cancelling I/O 17 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444648] nvme 0000:03:00.0: Cancelling I/O 18 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444653] nvme 0000:03:00.0: Cancelling I/O 19 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444657] nvme 0000:03:00.0: Cancelling I/O 20 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444661] nvme 0000:03:00.0: Cancelling I/O 21 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444665] nvme 0000:03:00.0: Cancelling I/O 22 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444669] nvme 0000:03:00.0: Cancelling I/O 23 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444673] nvme 0000:03:00.0: Cancelling I/O 24 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444677] nvme 0000:03:00.0: Cancelling I/O 25 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444680] nvme 0000:03:00.0: Cancelling I/O 26 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444684] nvme 0000:03:00.0: Cancelling I/O 27 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444688] nvme 0000:03:00.0: Cancelling I/O 28 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444692] nvme 0000:03:00.0: Cancelling I/O 29 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444696] nvme 0000:03:00.0: Cancelling I/O 30 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444700] nvme 0000:03:00.0: Cancelling I/O 31 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444704] nvme 0000:03:00.0: Cancelling I/O 32 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444708] nvme 0000:03:00.0: Cancelling I/O 33 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444712] nvme 0000:03:00.0: Cancelling I/O 34 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444716] nvme 0000:03:00.0: Cancelling I/O 35 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444720] nvme 0000:03:00.0: Cancelling I/O 36 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444724] nvme 0000:03:00.0: Cancelling I/O 37 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444753] nvme 0000:03:00.0: Cancelling I/O 943 QID 7
>>>> Feb 21 19:40:35 ip kernel: [232684.444758] nvme 0000:03:00.0: Cancelling I/O 0 QID 0
>>>> Feb 21 19:40:35 ip kernel: [232684.444762] nvme nvme0: Abort status: 0xfffc
>>>> Feb 21 19:40:35 ip kernel: [232684.444784] nvme 0000:03:00.0: completing aborted command with status: fffffffc
>>>> Feb 21 19:40:35 ip kernel: [232684.444792] blk_update_request: I/O error, dev nvme0n1, sector 694864
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: 1555: FAILED ceph_assert(r == 0)
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: 2021-02-21 19:40:35.211 7f6fa71de700 -1 bdev(0xbb4ea80 /var/lib/ceph/osd/ceph-8/block.wal) _aio_thread got r=-5 ((5) Input/output error)
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread 7f6f899a3700 time 2021-02-21 19:40:35.216629
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: 1555: FAILED ceph_assert(r == 0)
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: 2021-02-21 19:40:35.211 7f6f899a3700 -1 bdev(0xbb4e700 /var/lib/ceph/osd/ceph-8/block.db) direct_read_unaligned stalled read  0x54d0c509d~11b5 since 232536.575882s, timeout is 5.000000s
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: 2021-02-21 19:40:35.211 7f6f899a3700 -1 bdev(0xbb4e700 /var/lib/ceph/osd/ceph-8/block.db) direct_read_unaligned 0x54d0c509d~11b5 error: (5) Input/output error
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f6fa71de700 time 2021-02-21 19:40:35.215716
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!")
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x83eb8c]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  2: (KernelDevice::_aio_thread()+0xbc5) [0xe8d5b5]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0xe8f3cd]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  4: (()+0x76ba) [0x7f6fb567c6ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  5: (clone()+0x6d) [0x7f6fb4c834dd]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: 2021-02-21 19:40:35.215 7f6fa71de700 -1 /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f6fa71de700 time 2021-02-21 19:40:35.215716
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!")
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf34f84700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned stalled read  0xb299e8f61~1182 since 232531.455725s, timeout is 5.000000s
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf34f84700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned 0xb299e8f61~1182 error: (5) Input/output error
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread 7fcf34f84700 time 2021-02-21 19:40:35.215957
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: 1555: FAILED ceph_assert(r == 0)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread 7fcf35f86700 time 2021-02-21 19:40:35.216371
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: 1555: FAILED ceph_assert(r == 0)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread 7fcf35f86700 time 2021-02-21 19:40:35.216371
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: 1555: FAILED ceph_assert(r == 0)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf31f7e700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned stalled read  0x1b25e6102~ff3 since 232572.688747s, timeout is 5.000000s
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf35f86700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned stalled read  0x1b2f26bd6~1015 since 232573.917283s, timeout is 5.000000s
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf31f7e700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned 0x1b25e6102~ff3 error: (5) Input/output error
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf35f86700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned 0x1b2f26bd6~1015 error: (5) Input/output error
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf3077b700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned stalled read  0x904fbb6a~101d since 232531.225342s, timeout is 5.000000s
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f0236585700 time 2021-02-21 19:40:35.215482
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!")
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x83eb8c]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  2: (KernelDevice::_aio_thread()+0xbc5) [0xe8d5b5]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0xe8f3cd]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  4: (()+0x76ba) [0x7f0244a236ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  5: (clone()+0x6d) [0x7f024402a4dd]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: *** Caught signal (Aborted) **
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  in thread 7f0236585700 thread_name:bstore_aio
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: 2021-02-21 19:40:35.215 7f0236585700 -1 /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f0236585700 time 2021-02-21 19:40:35.215482
>>>> Feb 21 19:40:35 ip ceph-osd[3241]: /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!")
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x83eb8c]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  2: (KernelDevice::_aio_thread()+0xbc5) [0xe8d5b5]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0xe8f3cd]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  4: (()+0x76ba) [0x7f0244a236ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  5: (clone()+0x6d) [0x7f024402a4dd]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x83eb8c]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  2: (KernelDevice::_aio_thread()+0xbc5) [0xe8d5b5]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  3: (KernelDevice::AioCompletionThread::entry()+0xd) [0xe8f3cd]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  4: (()+0x76ba) [0x7f6fb567c6ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  5: (clone()+0x6d) [0x7f6fb4c834dd]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]: *** Caught signal (Aborted) **
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  in thread 7f6fa71de700 thread_name:bstore_aio
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.211 7fcf3077b700 -1 bdev(0xd4b2700 /var/lib/ceph/osd/ceph-0/block.db) direct_read_unaligned 0x904fbb6a~101d error: (5) Input/output error
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread 7fcf3077b700 time 2021-02-21 19:40:35.216680
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: 1555: FAILED ceph_assert(r == 0)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x83e4e2]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x83e6bd]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  3: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned long, char*)+0xc9b) [0xe3eb8b]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  4: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x20) [0xe6cd60]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  5: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x476) [0x14c6966]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  6: (rocksdb::BlockFetcher::ReadBlockContents()+0x4d3) [0x1485723]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  7: (rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache(rocksdb::FilePrefetchBuffer*, rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool, rocksdb::GetContext*)+0x4a3) [0x1474143]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  8: (rocksdb::DataBlockIter* rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::DataBlockIter*, bool, bool, bool, rocksdb::GetContext*, rocksdb::Status, rocksdb::FilePrefetchBuffer*)+0x303) [0x1481853]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  9: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, bool)+0x4c2) [0x147d342]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  10: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileMetaData const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, rocksdb::HistogramImpl*, bool, int)+0x187) [0x13ea3d7]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  11: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, bool*, bool*, unsigned long*, rocksdb::ReadCallback*, bool*)+0x3f4) [0x1405214]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  12: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*, rocksdb::ReadCallback*, bool*)+0x5e1) [0x130e771]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  13: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x26) [0x130ef16]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  14: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xae) [0x1327eae]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  15: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::v14_2_0::list*)+0x31c) [0x12e623c]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  16: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x1ed) [0xd3337d]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  17: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1cad) [0xd9cd5d]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  18: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2aa) [0xda1b9a]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  19: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0xadfd54]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  20: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xb09) [0xbe0809]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  21: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1a7) [0xbeeb67]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  22: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0xaf3487]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  23: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0xaa2105]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  24: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bf) [0x8c12af]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  25: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0xb7a152]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  26: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbed) [0x8df07d]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  27: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0xee700c]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  28: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xeea1d0]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  29: (()+0x76ba) [0x7fcf6045c6ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  30: (clone()+0x6d) [0x7fcf5fa634dd]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: *** Caught signal (Aborted) **
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  in thread 7fcf35f86700 thread_name:tp_osd_tp
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: 2021-02-21 19:40:35.235 7fcf35f86700 -1 /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read_random(BlueFS::FileReader*, uint64_t, size_t, char*)' thread 7fcf3077b700 time 2021-02-21 19:40:35.216680
>>>> Feb 21 19:40:35 ip ceph-osd[3201]: /build/ceph-14.2.10/src/os/bluestore/BlueFS.cc: 1555: FAILED ceph_assert(r == 0)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x83e4e2]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x83e6bd]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  3: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned long, char*)+0xc9b) [0xe3eb8b]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  4: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x20) [0xe6cd60]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  5: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x476) [0x14c6966]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  6: (rocksdb::BlockFetcher::ReadBlockContents()+0x4d3) [0x1485723]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  7: (rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache(rocksdb::FilePrefetchBuffer*, rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool, rocksdb::GetContext*)+0x4a3) [0x1474143]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  8: (rocksdb::DataBlockIter* rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::DataBlockIter*, bool, bool, bool, rocksdb::GetContext*, rocksdb::Status, rocksdb::FilePrefetchBuffer*)+0x303) [0x1481853]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x83e4e2]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x83e6bd]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  3: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned long, char*)+0xc9b) [0xe3eb8b]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  4: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x20) [0xe6cd60]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  5: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x476) [0x14c6966]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  6: (rocksdb::BlockFetcher::ReadBlockContents()+0x4d3) [0x1485723]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  7: (rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache(rocksdb::FilePrefetchBuffer*, rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool, rocksdb::GetContext*)+0x4a3) [0x1474143]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  8: (rocksdb::DataBlockIter* rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::DataBlockIter*, bool, bool, bool, rocksdb::GetContext*, rocksdb::Status, rocksdb::FilePrefetchBuffer*)+0x303) [0x1481853]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  9: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, bool)+0x4c2) [0x147d342]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  10: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileMetaData const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, rocksdb::HistogramImpl*, bool, int)+0x187) [0x13ea3d7]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  11: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, bool*, bool*, unsigned long*, rocksdb::ReadCallback*, bool*)+0x3f4) [0x1405214]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x83e4e2]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x83e6bd]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  3: (BlueFS::_read_random(BlueFS::FileReader*, unsigned long, unsigned long, char*)+0xc9b) [0xe3eb8b]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  4: (BlueRocksRandomAccessFile::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x20) [0xe6cd60]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  5: (rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const+0x476) [0x14c6966]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  6: (rocksdb::BlockFetcher::ReadBlockContents()+0x4d3) [0x1485723]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  7: (rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache(rocksdb::FilePrefetchBuffer*, rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::BlockBasedTable::CachableEntry<rocksdb::Block>*, bool, rocksdb::GetContext*)+0x4a3) [0x1474143]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  8: (rocksdb::DataBlockIter* rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>(rocksdb::BlockBasedTable::Rep*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::DataBlockIter*, bool, bool, bool, rocksdb::GetContext*, rocksdb::Status, rocksdb::FilePrefetchBuffer*)+0x303) [0x1481853]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  9: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, bool)+0x4c2) [0x147d342]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  10: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileMetaData const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, rocksdb::HistogramImpl*, bool, int)+0x187) [0x13ea3d7]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  11: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, bool*, bool*, unsigned long*, rocksdb::ReadCallback*, bool*)+0x3f4) [0x1405214]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  9: (rocksdb::BlockBasedTable::Get(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, bool)+0x4c2) [0x147d342]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  10: (rocksdb::TableCache::Get(rocksdb::ReadOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileMetaData const&, rocksdb::Slice const&, rocksdb::GetContext*, rocksdb::SliceTransform const*, rocksdb::HistogramImpl*, bool, int)+0x187) [0x13ea3d7]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  11: (rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice*, rocksdb::Status*, rocksdb::MergeContext*, unsigned long*, bool*, bool*, unsigned long*, rocksdb::ReadCallback*, bool*)+0x3f4) [0x1405214]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  12: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*, rocksdb::ReadCallback*, bool*)+0x5e1) [0x130e771]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  13: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x26) [0x130ef16]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  14: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xae) [0x1327eae]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  15: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::v14_2_0::list*)+0x31c) [0x12e623c]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  16: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x1ed) [0xd3337d]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  17: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x1cad) [0xd9cd5d]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  18: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2aa) [0xda1b9a]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  19: (non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x54) [0xadfd54]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  12: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*, rocksdb::ReadCallback*, bool*)+0x5e1) [0x130e771]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  13: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x26) [0x130ef16]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  14: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xae) [0x1327eae]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  15: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::v14_2_0::list*)+0x31c) [0x12e623c]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  16: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x1ed) [0xd3337d]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  17: (BlueStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, char const*, ceph::buffer::v14_2_0::ptr&)+0xe3) [0xd4f803]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  18: (PGBackend::objects_get_attr(hobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v14_2_0::list*)+0xc7) [0xaebc17]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  19: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > > const*)+0x64f) [0xa409bf]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  20: (PrimaryLogPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x250) [0xa452d0]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  21: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2c71) [0xa9fbb1]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  22: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd08) [0xaa2708]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  12: (rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, bool*, rocksdb::ReadCallback*, bool*)+0x5e1) [0x130e771]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  13: (rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*)+0x26) [0x130ef16]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  14: (rocksdb::DB::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0xae) [0x1327eae]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  15: (RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::v14_2_0::list*)+0x31c) [0x12e623c]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  16: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x1ed) [0xd3337d]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  17: (BlueStore::getattr(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, char const*, ceph::buffer::v14_2_0::ptr&)+0xe3) [0xd4f803]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  18: (PGBackend::objects_get_attr(hobject_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v14_2_0::list*)+0xc7) [0xaebc17]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  19: (PrimaryLogPG::get_object_context(hobject_t const&, bool, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v14_2_0::list, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v14_2_0::list> > > const*)+0x64f) [0xa409bf]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  20: (PrimaryLogPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x250) [0xa452d0]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  21: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2c71) [0xa9fbb1]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  22: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xd08) [0xaa2708]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  20: (ReplicatedBackend::do_repop(boost::intrusive_ptr<OpRequest>)+0xb09) [0xbe0809]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  21: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x1a7) [0xbeeb67]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  22: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x97) [0xaf3487]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  23: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0xaa2105]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  24: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bf) [0x8c12af]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  25: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0xb7a152]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  26: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbed) [0x8df07d]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  27: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0xee700c]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  28: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xeea1d0]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  29: (()+0x76ba) [0x7fcf6045c6ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3201]:  30: (clone()+0x6d) [0x7fcf5fa634dd]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  23: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bf) [0x8c12af]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  24: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0xb7a152]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  25: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbed) [0x8df07d]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  26: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0xee700c]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  27: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xeea1d0]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  28: (()+0x76ba) [0x7f6fb567c6ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3228]:  29: (clone()+0x6d) [0x7f6fb4c834dd]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  23: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bf) [0x8c12af]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  24: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0xb7a152]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  25: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xbed) [0x8df07d]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  26: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x4ac) [0xee700c]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  27: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xeea1d0]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  28: (()+0x76ba) [0x7f0244a236ba]
>>>> Feb 21 19:40:35 ip ceph-osd[3241]:  29: (clone()+0x6d) [0x7f024402a4dd]
>>>> 
>>>> 
>>>> From /var/log/ceph/ceph-osd-16.log
>>>> ** File Read Latency Histogram By Level [default] **
>>>> 
>>>> 2021-02-21 19:38:36.218 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:36.218 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:37.254 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:37.254 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:38.286 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:38.286 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:39.334 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:39.334 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:40.322 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:40.322 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:41.326 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:41.326 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:42.298 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:42.298 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:43.258 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:43.258 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 1 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:44.258 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:44.258 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_repop(mds.3.49311:354864012 2.be e7868/7857 2:7d8b5a59:::203.00123c76:head v 7868'5766635) initiated 2021-02-21 19:38:13.743464 currently started
>>>> 2021-02-21 19:38:44.258 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:45.286 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:45.286 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_repop(mds.3.49311:354864012 2.be e7868/7857 2:7d8b5a59:::203.00123c76:head v 7868'5766635) initiated 2021-02-21 19:38:13.743464 currently started
>>>> 2021-02-21 19:38:45.286 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:46.254 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:46.254 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_repop(mds.3.49311:354864012 2.be e7868/7857 2:7d8b5a59:::203.00123c76:head v 7868'5766635) initiated 2021-02-21 19:38:13.743464 currently started
>>>> 2021-02-21 19:38:46.254 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:38:47.226 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868) initiated 2021-02-21 19:38:05.293724 currently waiting for sub ops
>>>> 2021-02-21 19:38:47.226 7f023b58f700  0 log_channel(cluster) log [WRN] : slow request osd_repop(mds.3.49311:354864012 2.be e7868/7857 2:7d8b5a59:::203.00123c76:head v 7868'5766635) initiated 2021-02-21 19:38:13.743464 currently started
>>>> 2021-02-21 19:38:47.226 7f023b58f700 -1 osd.16 7868 get_health_metrics reporting 2 slow ops, oldest is osd_op(mds.1.51327:1034954064 2.80 2:018429c8:::2002458b0ca.00000000:head [create,setxattr parent (357),setxattr layout (30)] snapc 0=[] ondisk+write+known_if_redirected+full_force e7868)
>>>> 2021-02-21 19:39:01.150 7f0241749700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f021a54d700' had timed out after 15
>>>> 2021-02-21 19:39:01.150 7f0241f4a700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f021a54d700' had timed out after 15
>>>> 2021-02-21 19:39:01.330 7f0241f4a700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f021a54d700' had timed out after 15
>>>> 2021-02-21 19:39:01.330 7f0240f48700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f021a54d700' had timed out after 15
>>>> ...
>>>> 2021-02-21 19:40:35.099 7f0240f48700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0216d46700' had timed out after 15
>>>> 2021-02-21 19:40:35.099 7f0240f48700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f021a54d700' had timed out after 15
>>>> 2021-02-21 19:40:35.211 7f0236585700 -1 bdev(0xc3f0a80 /var/lib/ceph/osd/ceph-16/block.wal) _aio_thread got r=-5 ((5) Input/output error)
>>>> 2021-02-21 19:40:35.211 7f021a54d700 -1 bdev(0xc3f0700 /var/lib/ceph/osd/ceph-16/block.db) direct_read_unaligned stalled read  0x611527d7~fe6 since 232570.311299s, timeout is 5.000000s
>>>> 2021-02-21 19:40:35.211 7f021a54d700 -1 bdev(0xc3f0700 /var/lib/ceph/osd/ceph-16/block.db) direct_read_unaligned 0x611527d7~fe6 error: (5) Input/output error
>>>> 2021-02-21 19:40:35.215 7f0236585700 -1 /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f0236585700 time 2021-02-21 19:40:35.215482
>>>> /build/ceph-14.2.10/src/os/bluestore/KernelDevice.cc: 534: ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!")
>>>> 
>>>> ceph version 14.2.10 (b340acf629a010a74d90da5782a2c5fe0b54ac20) nautilus (stable)
>>>> 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x83eb8c]
>>>> 2: (KernelDevice::_aio_thread()+0xbc5) [0xe8d5b5]
>>>> 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0xe8f3cd]
>>>> 4: (()+0x76ba) [0x7f0244a236ba]
>>>> 5: (clone()+0x6d) [0x7f024402a4dd]
>>>> 
>>>> Thanks,
>>>> Xiong
>>>> 
>>>>> On Feb 19, 2021, at 10:34 PM, zxcs <zhuxiongcs@xxxxxxx> wrote:
>>>>> 
>>>>> You mean the OS? It is Ubuntu 16.04, and the NVMe is a Samsung 970 PRO 1TB.
>>>>> 
>>>>> Thanks,
>>>>> zx
>>>>> 
>>>>>> On Feb 19, 2021, at 6:56 PM, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
>>>>>> 
>>>>>> Looks good. What is your hardware? Server model & NVMes?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> k
>>>>>> 
>>>>>>> On 19 Feb 2021, at 13:22, zxcs <zhuxiongcs@xxxxxxx> wrote:
>>>>>>> 
>>>>>>> BTW, I actually have two nodes with the same issue; the other failing node's NVMe output is below:
>>>>>>> 
>>>>>>> Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
>>>>>>> critical_warning                    : 0
>>>>>>> temperature                         : 29 C
>>>>>>> available_spare                     : 100%
>>>>>>> available_spare_threshold           : 10%
>>>>>>> percentage_used                     : 1%
>>>>>>> data_units_read                     : 592,340,175
>>>>>>> data_units_written                  : 26,443,352
>>>>>>> host_read_commands                  : 5,341,278,662
>>>>>>> host_write_commands                 : 515,730,885
>>>>>>> controller_busy_time                : 14,052
>>>>>>> power_cycles                        : 8
>>>>>>> power_on_hours                      : 4,294
>>>>>>> unsafe_shutdowns                    : 6
>>>>>>> media_errors                        : 0
>>>>>>> num_err_log_entries                 : 0
>>>>>>> Warning Temperature Time            : 0
>>>>>>> Critical Composite Temperature Time : 0
>>>>>>> Temperature Sensor 1                : 29 C
>>>>>>> Temperature Sensor 2                : 46 C
>>>>>>> Temperature Sensor 3                : 0 C
>>>>>>> Temperature Sensor 4                : 0 C
>>>>>>> Temperature Sensor 5                : 0 C
>>>>>>> Temperature Sensor 6                : 0 C
>>>>>>> Temperature Sensor 7                : 0 C
>>>>>>> Temperature Sensor 8                : 0 C
>>>>>>> 
>>>>>>> 
>>>>>>> For comparison, I got one healthy node's NVMe output below:
>>>>>>> 
>>>>>>> Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
>>>>>>> critical_warning                    : 0
>>>>>>> temperature                         : 27 C
>>>>>>> available_spare                     : 100%
>>>>>>> available_spare_threshold           : 10%
>>>>>>> percentage_used                     : 1%
>>>>>>> data_units_read                     : 579,829,652
>>>>>>> data_units_written                  : 28,271,336
>>>>>>> host_read_commands                  : 5,237,750,233
>>>>>>> host_write_commands                 : 518,979,861
>>>>>>> controller_busy_time                : 14,166
>>>>>>> power_cycles                        : 3
>>>>>>> power_on_hours                      : 4,252
>>>>>>> unsafe_shutdowns                    : 1
>>>>>>> media_errors                        : 0
>>>>>>> num_err_log_entries                 : 0
>>>>>>> Warning Temperature Time            : 0
>>>>>>> Critical Composite Temperature Time : 0
>>>>>>> Temperature Sensor 1                : 27 C
>>>>>>> Temperature Sensor 2                : 39 C
>>>>>>> Temperature Sensor 3                : 0 C
>>>>>>> Temperature Sensor 4                : 0 C
>>>>>>> Temperature Sensor 5                : 0 C
>>>>>>> Temperature Sensor 6                : 0 C
>>>>>>> Temperature Sensor 7                : 0 C
>>>>>>> Temperature Sensor 8                : 0 C
>>>>>> 
>>>>> 
>> 
>> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




