This will cap a single BlueFS space allocation. Currently it attempts to allocate 70 GB, which seems to overflow some 32-bit length fields. With the adjustment such an allocation should be capped at ~700 MB.
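For reference, the rough arithmetic behind those numbers as I read them (assuming the gift is computed as gift_ratio times the main device size and that the default bluestore_bluefs_gift_ratio is 0.02 - both assumptions on my part, which would imply a ~3.5 TB main device):

    0.02   * ~3.5 TB = ~70 GB    (the current attempt, large enough to overflow a 32-bit length)
    0.0002 * ~3.5 TB = ~700 MB   (with the suggested value, comfortably below the 4 GB limit)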
I doubt there is any relation between this specific failure and the pool. At least at the moment.

In short, the history is: the starting OSD tries to flush BlueFS data to disk, detects a lack of space and asks the main device for more - the allocation succeeds, but the returned extent has its length field set to 0.
On 7/9/2019 8:33 PM, Brett Chancellor wrote:

What does bluestore_bluefs_gift_ratio do? I can't find any documentation on it. Also, do you think this could be related to the .rgw.meta pool having too many objects per PG? The disks that die always seem to be backfilling a PG from that pool, and they have ~550k objects per PG.

-Brett
Please try to set bluestore_bluefs_gift_ratio to 0.0002
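In case a concrete command is useful - this is just how I would apply it on Nautilus with the centralized config, not a command quoted from elsewhere in the thread (osd.N is a placeholder for the affected OSD; using just "osd" applies it to all OSDs):

ceph config set osd.N bluestore_bluefs_gift_ratio 0.0002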
On 7/9/2019 7:39 PM, Brett Chancellor wrote:
Too large for pastebin. The problem is continually crashing new OSDs. Here is the latest one.
Could you please set debug bluestore to 20 and collect a startup log for this specific OSD once again?
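If a concrete recipe helps - this is just the way I would do it (assuming the Nautilus centralized config; osd.N and the log path are placeholders/defaults, not details from this thread):

ceph config set osd.N debug_bluestore 20/20
# restart the OSD, then collect /var/log/ceph/ceph-osd.N.log from its host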
On 7/9/2019 6:29 PM, Brett Chancellor wrote:
I restarted most of the OSDs with the stupid allocator (6 of them wouldn't start unless the bitmap allocator was set), but I'm still seeing issues with OSDs crashing. Interestingly, it seems that the dying OSDs are always working on a PG from the .rgw.meta pool when they crash.
Hi Brett,

in Nautilus you can do that via

ceph config set osd.N bluestore_allocator stupid
ceph config set osd.N bluefs_allocator stupid

See https://ceph.com/community/new-mimic-centralized-configuration-management/ for more details on the new way of setting configuration options.
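For what it's worth - this is my own habit rather than anything from the thread - I would confirm that the value took hold and then restart the OSD, since the allocator choice is only picked up at startup:

ceph config get osd.N bluestore_allocator
systemctl restart ceph-osd@N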
A known issue with the stupid allocator is a gradual increase in write request latency (occurring within several days after an OSD restart). It's seldom observed, though. There were some posts about that behavior on the mailing list this year.
Thanks,
Igor.
On 7/8/2019 8:33 PM, Brett Chancellor wrote:
I'll give that a try. Is it something like...

ceph tell 'osd.*' bluestore_allocator stupid
ceph tell 'osd.*' bluefs_allocator stupid

And should I expect any issues doing this?
I should have read the call stack more carefully... It's not about lacking free space - this is rather the bug from this ticket: http://tracker.ceph.com/issues/40080

You should upgrade to v14.2.2 (once it's available) or temporarily switch to the stupid allocator as a workaround.
Thanks,
Igor
On 7/8/2019 8:00 PM, Igor Fedotov wrote:
Hi Brett,

looks like BlueStore is unable to allocate additional space for BlueFS at the main device. It's either lacking free space or it's too fragmented...

Would you share the OSD log, please?

Also please run "ceph-bluestore-tool --path <substitute with path-to-osd!!!> bluefs-bdev-sizes" and share the output.
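For example - my own spelling-out of that command, with the osd-80 path taken from the commands later in the thread, and noting that ceph-bluestore-tool needs the OSD to be stopped so it can open the store:

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 bluefs-bdev-sizes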
Thanks,
Igor
On 7/3/2019 9:59 PM, Brett Chancellor wrote:
Hi All! Today I've had 3 OSDs stop themselves, and they are unable to restart, all with the same error. These OSDs are all on different hosts. All are running 14.2.1.
I did try the following two commands:

- ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-80 list > keys
  ## This failed with the same error below

- ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 fsck
  ## After a couple of hours returned...
2019-07-03 18:30:02.095 7fe7c1c1ef00 -1 bluestore(/var/lib/ceph/osd/ceph-80) fsck warning: legacy statfs record found, suggest to run store repair to get consistent statistic reports
fsck success
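As an aside - this is only my reading of that warning, not advice from anyone in the thread - the legacy statfs record can presumably be cleaned up by a repair run against the stopped OSD, e.g.:

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-80 repair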
## Error when trying to start one of the OSDs
-12> 2019-07-03 18:36:57.450 7f5e42366700 -1 *** Caught signal (Aborted) **
 in thread 7f5e42366700 thread_name:rocksdb:low0

 ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable)
 1: (()+0xf5d0) [0x7f5e50bd75d0]
 2: (gsignal()+0x37) [0x7f5e4f9ce207]
 3: (abort()+0x148) [0x7f5e4f9cf8f8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x55a7aaee96ab]
 5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x55a7aaee982a]
 6: (interval_set<unsigned long, std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > > >::insert(unsigned long, unsigned long, unsigned long*, unsigned long*)+0x3c6) [0x55a7ab212a66]
 7: (BlueStore::allocate_bluefs_freespace(unsigned long, unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >*)+0x74e) [0x55a7ab48253e]
 8: (BlueFS::_expand_slow_device(unsigned long, std::vector<bluestore_pextent_t, mempool::pool_allocator<(mempool::pool_index_t)4, bluestore_pextent_t> >&)+0x111) [0x55a7ab59e921]
 9: (BlueFS::_allocate(unsigned char, unsigned long, bluefs_fnode_t*)+0x68b) [0x55a7ab59f68b]
 10: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0xe5) [0x55a7ab59fce5]
 11: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x10b) [0x55a7ab5a1b4b]
 12: (BlueRocksWritableFile::Flush()+0x3d) [0x55a7ab5bf84d]
 13: (rocksdb::WritableFileWriter::Flush()+0x19e) [0x55a7abbedd0e]
 14: (rocksdb::WritableFileWriter::Sync(bool)+0x2e) [0x55a7abbedfee]
 15: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0xbaa) [0x55a7abc3b73a]
 16: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d0) [0x55a7abc3f150]
 17: (rocksdb::CompactionJob::Run()+0x298) [0x55a7abc40618]
 18: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xcb7) [0x55a7aba7fb67]
 19: (rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)+0xd0) [0x55a7aba813c0]
 20: (rocksdb::DBImpl::BGWorkCompaction(void*)+0x3a) [0x55a7aba8190a]
 21: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x264) [0x55a7abc8d9c4]
 22: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x4f) [0x55a7abc8db4f]
 23: (()+0x129dfff) [0x55a7abd1afff]
 24: (()+0x7dd5) [0x7f5e50bcfdd5]
 25: (clone()+0x6d) [0x7f5e4fa95ead]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com