Could you please set debug bluestore to 20 and collect the startup log for this specific OSD once again?
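A minimal sketch of what is being asked for here. The OSD id (osd.80) and log path are assumptions taken from the commands later in the thread; substitute the crashing OSD's actual id and log location.

```shell
# Raise BlueStore debug verbosity via the centralized config (Nautilus+).
# osd.80 is a placeholder id -- use the crashing OSD's id.
ceph config set osd.80 debug_bluestore 20/20

# Restart the OSD to reproduce the startup and capture a verbose log.
systemctl restart ceph-osd@80

# Collect the startup log once the OSD has crashed (or come up):
cp /var/log/ceph/ceph-osd.80.log ~/osd80-startup-debug20.log

# 20/20 is very chatty -- drop the override afterwards:
ceph config rm osd.80 debug_bluestore
```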
On 7/9/2019 6:29 PM, Brett Chancellor wrote:
I restarted most of the OSDs with the stupid allocator (6 of them wouldn't start unless the bitmap allocator was set), but I'm still seeing issues with OSDs crashing. Interestingly, it seems that the dying OSDs are always working on a PG from the .rgw.meta pool when they crash.
Hi Brett,
in Nautilus you can do that via:
ceph config set osd.N bluestore_allocator stupid
ceph config set osd.N bluefs_allocator stupid
See https://ceph.com/community/new-mimic-centralized-configuration-management/ for more details on the new way of setting configuration options.
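The two commands above can be sketched end to end for a single OSD; verifying with "ceph config get" before restarting is an addition of mine, and osd.80 is a placeholder id.

```shell
# Switch both allocators to "stupid" for one OSD (placeholder id osd.80):
ceph config set osd.80 bluestore_allocator stupid
ceph config set osd.80 bluefs_allocator stupid

# Confirm the centralized option took effect (should print: stupid):
ceph config get osd.80 bluestore_allocator

# The allocator is only picked up at startup, so restart the daemon:
systemctl restart ceph-osd@80
```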
A known issue with the stupid allocator is a gradual increase in write request latency (occurring within several days after an OSD restart). It is seldom observed, though; there were some posts about that behavior on the mailing list this year.
Thanks,
Igor.
On 7/8/2019 8:33 PM, Brett Chancellor wrote:
I'll give that a try. Is it something like...
ceph tell 'osd.*' bluestore_allocator stupid
ceph tell 'osd.*' bluefs_allocator stupid
And should I expect any issues doing this?
I should have read the call stack more carefully... It's not about lacking free space - this is rather the bug from this ticket:
http://tracker.ceph.com/issues/40080
You should upgrade to v14.2.2 (once it's available) or temporarily switch to the stupid allocator as a workaround.
Thanks,
Igor
On 7/8/2019 8:00 PM, Igor Fedotov wrote:
Hi Brett,
looks like BlueStore is unable to allocate additional space for BlueFS at the main device. It's either lacking free space or it's too fragmented...
Would you share the OSD log, please?
Also please run "ceph-bluestore-tool --path <substitute with path-to-osd!!!> bluefs-bdev-sizes" and share the output.
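A sketch of the requested check. ceph-bluestore-tool needs exclusive access to the OSD's store, so stopping the daemon first is assumed here; the path matches the ceph-80 example used later in this thread.

```shell
# Stop the OSD so ceph-bluestore-tool can open the store exclusively:
systemctl stop ceph-osd@80

# Report BlueFS device sizes/usage for this OSD:
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 bluefs-bdev-sizes
```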
Thanks,
Igor
On 7/3/2019 9:59 PM, Brett Chancellor wrote:
Hi All! Today I've had 3 OSDs stop themselves, and they are unable to restart, all with the same error. These OSDs are all on different hosts. All are running 14.2.1.
I did try the following two commands:
- ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
  ## This failed with the same error below
- ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 fsck
  ## After a couple of hours returned...
2019-07-03 18:30:02.095 7fe7c1c1ef00 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck warning: legacy statfs record found, suggest to run store repair to get consistent statistic reports
fsck success
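The fsck warning above suggests running a store repair; a hedged sketch of that follows. This only addresses the legacy-statfs bookkeeping warning, not the crash itself, and it assumes the OSD is stopped.

```shell
# Repair the store to refresh the legacy statfs record flagged by fsck.
# Run only with the OSD stopped; path as in the fsck command above.
systemctl stop ceph-osd@80
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 repair
```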
## Error when trying to start one of the OSDs
-12> 2019-07-03 18:36:57.450 7f5e42366700 -1 *** Caught signal (Aborted) **
in thread 7f5e42366700 thread_name:rocksdb:low0

ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
1: (()+0xf5d0) [0x7f5e50bd75d0]
2: (gsignal()+0x37) [0x7f5e4f9ce207]
3: (abort()+0x148) [0x7f5e4f9cf8f8]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x55a7aaee96ab]
5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55a7aaee982a]
6: (interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*, unsigned long*)+0x3c6) [0x55a7ab212a66]
7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*)+0x74e) [0x55a7ab48253e]
8: (BlueFS::_expand_slow_device(unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >&)+0x111) [0x55a7ab59e921]
9: (BlueFS::_allocate(unsigned char, unsigned long, bluefs_fnode_t*)+0x68b) [0x55a7ab59f68b]
10: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0xe5) [0x55a7ab59fce5]
11: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55a7ab5a1b4b]
12: (BlueRocksWritableFile::Flush()+0x3d) [0x55a7ab5bf84d]
13: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55a7abbedd0e]
14: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55a7abbedfee]
15: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0xbaa) [0x55a7abc3b73a]
16: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d0) [0x55a7abc3f150]
17: (rocksdb::CompactionJob::Run()+0x298) [0x55a7abc40618]
18: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xcb7) [0x55a7aba7fb67]
19: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55a7aba813c0]
20: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55a7aba8190a]
21: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x264) [0x55a7abc8d9c4]
22: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55a7abc8db4f]
23: (()+0x129dfff) [0x55a7abd1afff]
24: (()+0x7dd5) [0x7f5e50bcfdd5]
25: (clone()+0x6d) [0x7f5e4fa95ead]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com