bluestore assertion happening mostly on cachetier SSDs with external WAL/DB nvme

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all,

This list has been so great a resource for myself in the past few years using ceph that first off I want to say thanks. This is the first I've needed to post but have gained a ton of insight/experience reading responses from you helpful people.

We've been running Luminous for about a month now on a new cluster doing just object storage and are in the process of rolling out bluestore to some of the existing cluster filestore backed OSDs. Anyhow, we've seen a few issues with crashes on our cachetier bluestore OSDs which makes them get into a bad state where I'm unable to bring them back. Early on we had some crashes due to the jemalloc bug which I had thought may be related but there have recently been a few OSDs that got into this state that never had jemalloc (I forgot to add it in our chef cookbook that deploys these nodes)

I filed a bug here, but in the meantime I was curious if anyone else has run into this error. This is 12.2.1 / ubuntu xenial

http://tracker.ceph.com/issues/22066

2017-11-07 05:07:40.193162 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.818396 seconds old, received at 2017-11-07 05:07:09.374162: osd_op(client.32411500.0:704920 16.4 16:20c9bc55:::.dir.82b78eeb-9530-43f7-92a8-a79b5c
df1e04.21552069.1.261:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 36
2017-11-07 05:07:40.672879 7f5695e68700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/db_impl_write.cc:684] reusing log 57888 from recycle list

2017-11-07 05:07:40.672990 7f5695e68700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/db_impl_write.cc:725] [default] New memtable created with log file: #58654. Immutable memtables: 0.

2017-11-07 05:07:40.673110 7f5695e68700 1 bluefs _allocate failed to allocate 0x100000 on bdev 0, free 0xff000; fallback to bdev 1
2017-11-07 05:07:40.911889 7f567e639700 4 rocksdb: (Original Log Time 2017/11/07-05:07:40.678363) [/build/ceph-12.2.1/src/rocksdb/db/db_impl_compaction_flush.cc:1158] Calling FlushMemTableToOutputFile with column family [default], flush 
slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2017-11-07 05:07:40.911954 7f567e639700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/flush_job.cc:264] [default] [JOB 10082] Flushing memtable with next log file: 58654

2017-11-07 05:07:40.912077 7f567e639700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1510031260911964, "job": 10082, "event": "flush_started", "num_memtables": 1, "num_entries": 17207993, "num_deletes": 16854639, "memory_usage": 3244877424}
2017-11-07 05:07:40.912083 7f567e639700 4 rocksdb: [/build/ceph-12.2.1/src/rocksdb/db/flush_job.cc:293] [default] [JOB 10082] Level-0 flush table #58655: started
2017-11-07 05:07:41.193369 7f569f67b700 0 log_channel(cluster) log [WRN] : 6 slow requests, 5 included below; oldest blocked for > 36.431733 secs
2017-11-07 05:07:41.193380 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.801090 seconds old, received at 2017-11-07 05:07:10.392216: osd_op(client.32814103.0:158479 16.8 16:10f90c19:::.dir.82b78eeb-9530-43f7-92a8-a79b5c
df1e04.21552069.1.132:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 36
2017-11-07 05:07:41.193403 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 36.431733 seconds old, received at 2017-11-07 05:07:04.761574: osd_op(client.32815468.0:1795982 16.4 16:2055d324:::.dir.82b78eeb-9530-43f7-92a8-a79b5
cdf1e04.21723796.1.98:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 36
2017-11-07 05:07:41.193406 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 32.197643 seconds old, received at 2017-11-07 05:07:08.995664: osd_op(client.32324126.0:723025 16.5b 16:dbf7165f:::.dir.82b78eeb-9530-43f7-92a8-a79b5
cdf1e04.21552069.1.1021:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 35
2017-11-07 05:07:41.193410 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 31.876367 seconds old, received at 2017-11-07 05:07:09.316940: osd_op(client.32411449.0:1647283 16.5b 16:dab27a8b:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.452:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 35
2017-11-07 05:07:41.193414 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.220324 seconds old, received at 2017-11-07 05:07:10.972983: osd_op(client.32661681.0:159961 16.5f 16:fbe053dc:::.dir.82b78eeb-9530-43f7-92a8-a79b5cdf1e04.21552069.1.1016:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e52947) currently sub_op_commit_rec from 30
2017-11-07 05:07:46.194177 7f569f67b700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.814295 secs
2017-11-07 05:07:46.194186 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.814295 seconds old, received at 2017-11-07 05:07:15.379818: osd_op(client.32814103.0:158503 25.377 25:eedc0af4:::82b78eeb-9530-43f7-92a8-a79b5cdf1e04.1544167.4_melon_moderation%2f998489968764662.LKPUWVcpM1yjdrdp2FNEpHmLjLfCccn2Ho6MBuBzLdVzcXdDrYhjtJgSo8yvv9XJXtYxtKE8bmvzRkFZEdWsaGitujN9MsPSHH5s28GAvRaNcoqvKTCnWswv2mFh4uA1PhZ.9884577.jpg:head [create,setxattr user.rgw.idtag (53),setxattr user.rgw.tail_tag (53),writefull 0~13797,setxattr user.rgw.manifest (609),setxattr user.rgw.acl (201),setxattr user.rgw.content_type (11),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-acl (8),setxattr user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (17),call rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[] ondisk+write+known_if_redirected e52949) currently waiting for blocked object
2017-11-07 05:07:47.194495 7f569f67b700 0 log_channel(cluster) log [WRN] : 2 slow requests, 1 included below; oldest blocked for > 31.814620 secs
2017-11-07 05:07:47.194601 7f569f67b700 0 log_channel(cluster) log [WRN] : slow request 30.749378 seconds old, received at 2017-11-07 05:07:16.445060: osd_op(client.32324117.0:1357608 25.1af 25:f5963916:::82b78eeb-9530-43f7-92a8-a79b5cdf1e04.1544167.4_melon_moderation%2f998489968763621.m7oZhQcTjnhyAL82D9Bd6tCVLmpKhmYXNMZNpvWz8sBc4QuHEeK1xvbxp9JV45SfZHsN1iF7d55JsEtUwZjv9wLW28H4n5QhBjanATGbLFBGqky2xyrpf3Bv48Dm7C2QTYA.10184398.jpg:head [create,setxattr user.rgw.idtag (54),setxattr user.rgw.tail_tag (54),writefull 0~24398,setxattr user.rgw.manifest (610),setxattr user.rgw.acl (201),setxattr user.rgw.content_type (11),setxattr user.rgw.etag (33),setxattr user.rgw.x-amz-acl (8),setxattr user.rgw.x-amz-content-sha256 (65),setxattr user.rgw.x-amz-date (17),call rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[] ondisk+write+known_if_redirected e52949) currently waiting for blocked object
2017-11-07 05:07:57.314405 7f567e639700 -1 /build/ceph-12.2.1/src/include/buffer.h: In function 'void ceph::buffer::list::prepare_iov(VectorT*) const [with VectorT = boost::container::small_vector<iovec, 4ul>]' thread 7f567e639700 time 2017-11-07 05:07:57.235916
/build/ceph-12.2.1/src/include/buffer.h: 882: FAILED assert(_buffers.size() <= 1024)


As mentioned in the bug, it seems similar to http://tracker.ceph.com/issues/21932 however some of these OSDs never crashed until this started happening.

At this point I'm pretty stumped as to how best to move forward. Since this is only happening on our cache tier I have been blowing away the OSDs and re-adding since recovery is fairly quick. As mentioned in the bug I am worried we may see this on the backend OSD tier where recovery is much more disruptive, so I'd like to find a workaround.

Relevant ceph.conf entries:

bluestore default buffered read = true

enable experimental unrecoverable data corrupting features = bluestore rocksdb

bluestore_block_db size = 14336

bluestore_block_wal size = 14336

bluestore_block_size = 0

osd objectstore = bluestore

debug_bluestore = "4/5"

rgw_dynamic_resharding=false

bluestore_cache_size = 0

bluestore_cache_size_hdd = 1073741824

bluestore_cache_size_ssd = 3221225472



Thanks,
E
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux