Re: Bluestore Bitmap allocator crash

Hi Somnath & Ramesh,

Speaking of allocator failures, dmick noticed tonight that make check is failing, specifically in unittest_bit_alloc. We both bisected and came to the same conclusion: the failure first appeared with this morning's merge of:

https://github.com/ceph/ceph/pull/10257

Ultimately the assert we are failing is here:

https://github.com/ceph/ceph/blob/c98ced1d5ae3d3709d0cd38c5b075b1b2c458a74/src/os/bluestore/BitAllocator.cc#L1518

I've been digging in tonight with gdb. Here are the relevant parts of the backtrace:

#2 0x00007ffff3712566 in __assert_fail_base (fmt=0x7ffff3862288 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x60ffe0 "start_block + num_blocks <= size()", file=file@entry=0x60fea8 "/home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc", line=line@entry=1518, function=function@entry=0x610300 <BitAllocator::set_blocks_used(long, long)::__PRETTY_FUNCTION__> "virtual void BitAllocator::set_blocks_used(int64_t, int64_t)") at assert.c:92
#3 0x00007ffff3712612 in __GI___assert_fail (assertion=0x60ffe0 "start_block + num_blocks <= size()", file=0x60fea8 "/home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc", line=1518, function=0x610300 <BitAllocator::set_blocks_used(long, long)::__PRETTY_FUNCTION__> "virtual void BitAllocator::set_blocks_used(int64_t, int64_t)") at assert.c:101
#4 0x000000000051d328 in BitAllocator::set_blocks_used (this=0x8b0e8a0, start_block=1035, num_blocks=501) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1518
#5 0x000000000051d0a1 in BitAllocator::BitAllocator (this=0x8b0e8a0, total_blocks=1035, zone_size_block=512, mode=CONCURRENT) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1223
#6 0x00000000004edbcd in BitAllocator_test_bmap_alloc_Test::TestBody (this=<optimized out>) at /home/ubuntu/src/markhpc/ceph/src/test/objectstore/BitAllocator_test.cc:448

We can see that in frame 5 the total_blocks is 1035 and zone_size_block is 512:

#5 0x000000000051d0a1 in BitAllocator::BitAllocator (this=0x8b0e8a0, total_blocks=1035, zone_size_block=512, mode=CONCURRENT) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1223
1223	  init_check(total_blocks, zone_size_block, mode, false, false);

In frame 4, however, start_block is 1035 and num_blocks is 501, while size() is 1035. That makes start_block + num_blocks = 1536, which is greater than size(), so the assert fails.

(gdb) frame 4
#4 0x000000000051d328 in BitAllocator::set_blocks_used (this=0x8b0e8a0, start_block=1035, num_blocks=501) at /home/ubuntu/src/markhpc/ceph/src/os/bluestore/BitAllocator.cc:1518
1518	  debug_assert(start_block + num_blocks <= size());
(gdb) print size()
$19 = 1035

The code in BitAllocator::init_check looks like it might be relevant. I haven't dug in enough to understand what's going on in there, but it seems like we must be hitting a corner case, given that start_block is the same as size().
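A quick arithmetic check on the numbers in the backtrace may help narrow this down. The value 501 is exactly the distance from 1035 to the next multiple of 512 (1536), which would be consistent with the constructor trying to mark the padding up to the zone boundary as used while size() still reports the unpadded count. That is an inference from the values above, not something confirmed in BitAllocator.cc. The helper names below (round_up_to_zone, padding_blocks, range_fits) are hypothetical and exist only for this sketch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round total_blocks up to the next multiple of
// zone_size. This is NOT taken from BitAllocator.cc; it only reproduces
// the arithmetic suggested by the backtrace values.
inline int64_t round_up_to_zone(int64_t total_blocks, int64_t zone_size) {
  return ((total_blocks + zone_size - 1) / zone_size) * zone_size;
}

// Hypothetical helper: number of padding blocks between total_blocks and
// the zone-aligned size.
inline int64_t padding_blocks(int64_t total_blocks, int64_t zone_size) {
  return round_up_to_zone(total_blocks, zone_size) - total_blocks;
}

// Same condition as the failing debug_assert:
//   start_block + num_blocks <= size()
inline bool range_fits(int64_t start_block, int64_t num_blocks, int64_t size) {
  return start_block + num_blocks <= size;
}
```

With total_blocks=1035 and zone_size_block=512, padding_blocks gives 501, and range_fits(1035, 501, 1035) is false, matching frame 4. The same range would fit if size() returned the zone-aligned 1536, so it may be worth checking which of the two values size() is meant to report at that point in init_check.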

Mark

On 07/20/2016 08:44 PM, Somnath Roy wrote:
Ramesh,
I am hitting the following crash in the IO path the moment I start IO.

os/bluestore/BitMapAllocator.cc: 76: FAILED assert(!(off % m_block_size))

 ceph version 11.0.0-696-ga3438ba (a3438bac71a54cb43e5feb93ad09228bf69942ae)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55804242de40]
 2: (BitMapAllocator::insert_free(unsigned long, unsigned long)+0x2e3) [0x558042132813]
 3: (BitMapAllocator::commit_finish()+0x2a5) [0x558042132e55]
 4: (BlueStore::_kv_sync_thread()+0x142d) [0x558041fff61d]
 5: (BlueStore::KVSyncThread::entry()+0xd) [0x558042028c2d]
 6: (Thread::entry_wrapper()+0x75) [0x55804240d755]
 7: (()+0x76fa) [0x7f36699076fa]
 8: (clone()+0x6d) [0x7f3667767b5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


What I did:
-----------

1. I have a separate WAL partition on an NVRAM device.

2. Changed min_alloc_size to 16K.

3. Ran 4K rw.
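For reference, the failing check is a plain alignment test on the offset. A minimal sketch follows, assuming (not confirmed here) that m_block_size ends up at the 16K min_alloc_size while some freed extent begins on a 4K boundary. The function name is_block_aligned is hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Same condition as assert(!(off % m_block_size)) in the trace above.
// The 16K block size used in the usage note is an assumption based on
// the min_alloc_size change in step 2.
inline bool is_block_aligned(uint64_t off, uint64_t block_size) {
  return off % block_size == 0;
}
```

Under that assumption, an offset of 4096 is not aligned to a 16384-byte block, so is_block_aligned(4096, 16384) is false and the assert would fire, while the same offset passes with a 4096-byte block size.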

Let me know if you need further details.

Thanks & Regards
Somnath



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
