Problems with deep-scrub on Luminous

Hi guys,

In my production environment, when a deep-scrub runs on certain empty
PGs, the OSD gets marked down. I found that the RocksDB operations
issued during the scrub take far too long, which causes the thread to
be flagged as unhealthy, so no ping requests are sent to the peer
OSDs. The monitor then receives failure reports from those peers and
marks this OSD down.
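
For reference, the unhealthy thread shows up in the OSD log as
heartbeat_map timeouts. Something like the following grep should
surface them (this assumes the default log location, and the thread
name inside the message will differ, so treat it as an illustration):

  $ grep -E "heartbeat_map is_healthy .* had (timed out|suicide timed out)" \
      /var/log/ceph/ceph-osd.92.log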

The following is the log from the primary OSD.

 2019-02-27 11:00:00.604029 7f9592f84700 20 osd.92 pg_epoch: 13415
pg[5.1bf( empty local-lis/les=13413/13414 n=0 ec=175/175 lis/c
13413/13413 les/c/f 13414/13414/0 13413/13413/13413) [92,142,111] r=0
lpr=13413 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub state
INACTIVE [MIN,MIN)
2019-02-27 11:00:00.604168 7f9592f84700 20 osd.92 pg_epoch: 13415
pg[5.1bf( empty local-lis/les=13413/13414 n=0 ec=175/175 lis/c
13413/13413 les/c/f 13414/13414/0 13413/13413/13413) [92,142,111] r=0
lpr=13413 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub state
NEW_CHUNK [5:fd800000::::head,MIN)
2019-02-27 11:00:00.604189 7f9592f84700 15
bluestore(/var/lib/ceph/osd/ceph-92) collection_list 5.1bf_head start
#5:fd800000::::head#0 end GHMAX max 5
2019-02-27 11:00:00.604195 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list range
0x7f7ffffffffffffff9fd800000 to 0x7f7ffffffffffffff9fe000000 and
0x7f8000000000000005fd800000 to 0x7f8000000000000005fe000000 start
#5:fd800000::::head#0
2019-02-27 11:00:00.611606 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list pend
0x7f8000000000000005fe000000
2019-02-27 11:00:00.611617 7f9592f84700 30
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list key
0x7f8000000000000005fd80000021213dfffffffffffffffeffffffffffffffff'o'
2019-02-27 11:00:00.611621 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list oid
#5:fd800000::::head# end GHMAX
2019-02-27 11:00:26.142032 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list key
0x7f800000000000001d04c0000021213dfffffffffffffffeffffffffffffffff'o'
>= GHMAX
2019-02-27 11:00:26.142077 7f9592f84700 10
bluestore(/var/lib/ceph/osd/ceph-92) collection_list 5.1bf_head start
GHMAX end GHMAX max 5 = 0, ls.size() = 1, next = GHMAX
2019-02-27 11:00:26.142245 7f9592f84700 20 osd.92 pg_epoch: 13415
pg[5.1bf( empty local-lis/les=13413/13414 n=0 ec=175/175 lis/c
13413/13413 les/c/f 13414/13414/0 13413/13413/13413) [92,142,111] r=0
lpr=13413 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub state
WAIT_PUSHES [5:fd800000::::head,MAX)
2019-02-27 11:00:26.142258 7f9592f84700 20 osd.92 pg_epoch: 13415
pg[5.1bf( empty local-lis/les=13413/13414 n=0 ec=175/175 lis/c
13413/13413 les/c/f 13414/13414/0 13413/13413/13413) [92,142,111] r=0
lpr=13413 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub state
WAIT_LAST_UPDATE [5:fd800000::::head,MAX)
2019-02-27 11:00:26.142267 7f9592f84700 20 osd.92 pg_epoch: 13415
pg[5.1bf( empty local-lis/les=13413/13414 n=0 ec=175/175 lis/c
13413/13413 les/c/f 13414/13414/0 13413/13413/13413) [92,142,111] r=0
lpr=13413 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub state
BUILD_MAP [5:fd800000::::head,MAX)
2019-02-27 11:00:26.142286 7f9592f84700 15
bluestore(/var/lib/ceph/osd/ceph-92) collection_list 5.1bf_head start
#5:fd800000::::head# end #MAX# max 2147483647
2019-02-27 11:00:26.142292 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list range
0x7f7ffffffffffffff9fd800000 to 0x7f7ffffffffffffff9fe000000 and
0x7f8000000000000005fd800000 to 0x7f8000000000000005fe000000 start
#5:fd800000::::head#
2019-02-27 11:00:26.142805 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list pend
0x7f8000000000000005fe000000
2019-02-27 11:00:26.142815 7f9592f84700 30
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list key
0x7f8000000000000005fd80000021213dfffffffffffffffeffffffffffffffff'o'
2019-02-27 11:00:26.142820 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list oid
#5:fd800000::::head# end #MAX#
2019-02-27 11:00:51.796411 7f9592f84700 20
bluestore(/var/lib/ceph/osd/ceph-92) _collection_list key
0x7f800000000000001d04c0000021213dfffffffffffffffeffffffffffffffff'o'
>= #MAX#
2019-02-27 11:00:51.796447 7f9592f84700 10
bluestore(/var/lib/ceph/osd/ceph-92) collection_list 5.1bf_head start
#5:fd800000::::head# end #MAX# max 2147483647 = 0, ls.size() = 1, next
= GHMIN
2019-02-27 11:00:51.796564 7f9592f84700 20 osd.92 pg_epoch: 13415
pg[5.1bf( empty local-lis/les=13413/13414 n=0 ec=175/175 lis/c
13413/13413 les/c/f 13414/13414/0 13413/13413/13413) [92,142,111] r=0
lpr=13413 crt=0'0 mlcod 0'0 active+clean+scrubbing+deep] scrub state
WAIT_REPLICAS [5:fd800000::::head,MAX)
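
The stall is visible directly in the timestamps above: each
_collection_list pass over this empty PG takes roughly 25 seconds
(just subtracting the log timestamps):

  $ echo "26.142032 - 0.611621" | bc    # first pass, 11:00:00 -> 11:00:26
  25.530411
  $ echo "51.796411 - 26.142820" | bc   # second pass, 11:00:26 -> 11:00:51
  25.653591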

I exported the BlueFS data using ceph-bluestore-tool; it is about
5 GB and sits on a SATA SSD.
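
For reference, the export was done roughly as follows (the output
directory is just an example path):

  $ ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-92 \
      --out-dir /mnt/bluefs-export
  $ du -sh /mnt/bluefs-export
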
I know little about RocksDB internals and cannot find the root cause.
What kind of RocksDB problem could lead to this?
Any suggestions would be appreciated, thanks!

Best regards.


