Re: replicatedPG assert fails

Hi Sam,

Thanks for the quick reply. The main modification I made is to call
calc_target within librados::IoCtxImpl::aio_operate before op_submit,
so that I can get the IDs of all the replica OSDs and send a write op
to each of them. I can also attach the modified code if necessary.

I just reproduced this error with the conf you provided; please see below:
osd/ReplicatedPG.cc: In function 'int
ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
bool, bool, hobject_t*)' thread 7fd6aba59700 time 2016-07-21
15:09:26.431436
osd/ReplicatedPG.cc: 9042: FAILED assert(oid.pool ==
static_cast<int64_t>(info.pgid.pool()))
 ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0x7fd6c5733e8b]
 2: (ReplicatedPG::find_object_context(hobject_t const&,
std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1e54)
[0x7fd6c51ef7c4]
 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7fd6c521fe9e]
 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x73c) [0x7fd6c51dca3c]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
[0x7fd6c5094d65]
 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
const&)+0x5d) [0x7fd6c5094f8d]
 7: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x86c) [0x7fd6c50b603c]
 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
[0x7fd6c5724117]
 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fd6c5726270]
 10: (()+0x8184) [0x7fd6c3b98184]
 11: (clone()+0x6d) [0x7fd6c1aa937d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
2016-07-21 15:09:26.454854 7fd6aba59700 -1 osd/ReplicatedPG.cc: In
function 'int ReplicatedPG::find_object_context(const hobject_t&,
ObjectContextRef*, bool, bool, hobject_t*)' thread 7fd6aba59700 time
2016-07-21 15:09:26.431436


This error occurred three times, since I wrote to three OSDs.
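
For reference, here is a minimal sketch of the condition the failed assert
compares, as far as I can tell from the expression in the trace. It is a
simplified model, not the actual Ceph code: SketchObject, SketchPg,
pool_matches and check_pool are hypothetical names made up for illustration,
and the field types are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the object id carried by an op.
struct SketchObject {
  int64_t pool;  // pool id recorded in the object id
};

// Hypothetical, simplified stand-in for the PG id (assumed unsigned pool id).
struct SketchPg {
  uint64_t pool_id;
  uint64_t pool() const { return pool_id; }
};

// Same comparison as the failed assert:
//   oid.pool == static_cast<int64_t>(info.pgid.pool())
inline bool pool_matches(const SketchObject& oid, const SketchPg& pgid) {
  return oid.pool == static_cast<int64_t>(pgid.pool());
}

// Aborts, like the OSD did, when the object's pool differs from the PG's pool.
inline void check_pool(const SketchObject& oid, const SketchPg& pgid) {
  assert(pool_matches(oid, pgid));
}
```

If this reading is right, the assert fires whenever the pool id in the op's
object id does not equal the pool id of the PG that handles the op.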

Thanks,

Sugang

On Thu, Jul 21, 2016 at 10:54 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Hmm.  Can you provide more information about the poison op?  If you
> can reproduce with
> debug osd = 20
> debug filestore = 20
> debug ms = 1
> it should be easier to work out what is going on.
> -Sam
>
> On Thu, Jul 21, 2016 at 7:13 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>> Hi all,
>>
>> I am working on a research project which requires multiple write
>> operations for the same object at the same time from the client. At
>> the OSD side, I got this error:
>> osd/ReplicatedPG.cc: In function 'int
>> ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
>> bool, bool, hobject_t*)' thread 7f0586193700 time 2016-07-21
>> 14:02:04.218448
>> osd/ReplicatedPG.cc: 9041: FAILED assert(oid.pool ==
>> static_cast<int64_t>(info.pgid.pool()))
>>  ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x8b) [0x7f059fe6dd7b]
>>  2: (ReplicatedPG::find_object_context(hobject_t const&,
>> std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1dbb)
>> [0x7f059f9296fb]
>>  3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7f059f959d7e]
>>  4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x73c) [0x7f059f916a0c]
>>  5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
>> [0x7f059f7ced65]
>>  6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
>> const&)+0x5d) [0x7f059f7cef8d]
>>  7: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x86c) [0x7f059f7f003c]
>>  8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
>> [0x7f059fe5e007]
>>  9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f059fe60160]
>>  10: (()+0x8184) [0x7f059e2d2184]
>>  11: (clone()+0x6d) [0x7f059c1e337d]
>>
>> And at the client side, I got a segmentation fault.
>>
>> I am wondering what could cause this assert to fail?
>>
>> Thanks,
>>
>> Sugang
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html