Re: replicatedPG assert fails

Yes, I understand that. I was introduced to Ceph only a month ago, but
I now have a basic idea of its communication pattern. I have not
made any changes to the OSD yet, so I was wondering: what is the purpose of
this "assert(oid.pool == static_cast<int64_t>(info.pgid.pool()))", and
if I do change the OSD code, what are the main aspects I should pay
attention to?
Since this is only a research project, the implementation does not
have to be very sophisticated.

I know my question is rather broad; any hints or suggestions would
be highly appreciated.
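
To make the question concrete, here is a minimal sketch of what I believe
the assertion checks: the pool recorded in the object's id must match the
pool of the PG that is handling the op. The types hobject_sketch and
pg_sketch are hypothetical stand-ins that I made up for illustration, not
the real hobject_t and pg_t, and the cast mirrors the one in the assert on
the assumption that the PG side reports its pool as an unsigned value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for an object id.
// Only the pool field matters for this sketch.
struct hobject_sketch {
  int64_t pool;  // pool the object claims to belong to
};

// Hypothetical stand-in for a PG id; pool() is assumed to be unsigned,
// which would explain the static_cast in the original assert.
struct pg_sketch {
  uint64_t pool_id;
  uint64_t pool() const { return pool_id; }
};

// True when the object's pool matches the pool of the PG handling it.
bool object_belongs_to_pg(const hobject_sketch& oid, const pg_sketch& pgid) {
  return oid.pool == static_cast<int64_t>(pgid.pool());
}

// Same shape as the failing check: abort if the pools disagree.
void check_pool_invariant(const hobject_sketch& oid, const pg_sketch& pgid) {
  assert(object_belongs_to_pg(oid, pgid));
}
```

Under these assumptions, an object id whose pool field was never filled in
(for example, left at -1) would trip the check for a PG in pool 1, while an
id carrying pool 1 would pass. I do not know yet whether that is what
happens with my modified client.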

Thanks,

Sugang

On Thu, Jul 21, 2016 at 11:22 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Oh, that's a much more complicated change.  You are going to need to
> make extensive changes to the OSD to make that work.
> -Sam
>
> On Thu, Jul 21, 2016 at 8:21 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>> Hi Sam,
>>
>> Thanks for the quick reply. The main modification I made is to call
>> calc_target within librados::IoCtxImpl::aio_operate before op_submit,
>> so that I can get the IDs of all the replica OSDs and send a write op
>> to each of them. I can also attach the modified code if necessary.
>>
>> I just reproduced this error with the conf you provided; please see below:
>> osd/ReplicatedPG.cc: In function 'int
>> ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
>> bool, bool, hobject_t*)' thread 7fd6aba59700 time 2016-07-21
>> 15:09:26.431436
>> osd/ReplicatedPG.cc: 9042: FAILED assert(oid.pool ==
>> static_cast<int64_t>(info.pgid.pool()))
>>  ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x8b) [0x7fd6c5733e8b]
>>  2: (ReplicatedPG::find_object_context(hobject_t const&,
>> std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1e54)
>> [0x7fd6c51ef7c4]
>>  3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7fd6c521fe9e]
>>  4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>> ThreadPool::TPHandle&)+0x73c) [0x7fd6c51dca3c]
>>  5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>> std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
>> [0x7fd6c5094d65]
>>  6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
>> const&)+0x5d) [0x7fd6c5094f8d]
>>  7: (OSD::ShardedOpWQ::_process(unsigned int,
>> ceph::heartbeat_handle_d*)+0x86c) [0x7fd6c50b603c]
>>  8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
>> [0x7fd6c5724117]
>>  9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fd6c5726270]
>>  10: (()+0x8184) [0x7fd6c3b98184]
>>  11: (clone()+0x6d) [0x7fd6c1aa937d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>> 2016-07-21 15:09:26.454854 7fd6aba59700 -1 osd/ReplicatedPG.cc: In
>> function 'int ReplicatedPG::find_object_context(const hobject_t&,
>> ObjectContextRef*, bool, bool, hobject_t*)' thread 7fd6aba59700 time
>> 2016-07-21 15:09:26.431436
>>
>>
>> This error occurs three times, once for each of the three OSDs I wrote to.
>>
>> Thanks,
>>
>> Sugang
>>
>> On Thu, Jul 21, 2016 at 10:54 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>> Hmm.  Can you provide more information about the poison op?  If you
>>> can reproduce with
>>> debug osd = 20
>>> debug filestore = 20
>>> debug ms = 1
>>> it should be easier to work out what is going on.
>>> -Sam
>>>
>>> On Thu, Jul 21, 2016 at 7:13 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>>>> Hi all,
>>>>
>>>> I am working on a research project that requires the client to issue
>>>> multiple write operations to the same object at the same time. On
>>>> the OSD side, I got this error:
>>>> osd/ReplicatedPG.cc: In function 'int
>>>> ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*,
>>>> bool, bool, hobject_t*)' thread 7f0586193700 time 2016-07-21
>>>> 14:02:04.218448
>>>> osd/ReplicatedPG.cc: 9041: FAILED assert(oid.pool ==
>>>> static_cast<int64_t>(info.pgid.pool()))
>>>>  ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>> const*)+0x8b) [0x7f059fe6dd7b]
>>>>  2: (ReplicatedPG::find_object_context(hobject_t const&,
>>>> std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1dbb)
>>>> [0x7f059f9296fb]
>>>>  3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7f059f959d7e]
>>>>  4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&,
>>>> ThreadPool::TPHandle&)+0x73c) [0x7f059f916a0c]
>>>>  5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>>> std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5)
>>>> [0x7f059f7ced65]
>>>>  6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>
>>>> const&)+0x5d) [0x7f059f7cef8d]
>>>>  7: (OSD::ShardedOpWQ::_process(unsigned int,
>>>> ceph::heartbeat_handle_d*)+0x86c) [0x7f059f7f003c]
>>>>  8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947)
>>>> [0x7f059fe5e007]
>>>>  9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f059fe60160]
>>>>  10: (()+0x8184) [0x7f059e2d2184]
>>>>  11: (clone()+0x6d) [0x7f059c1e337d]
>>>>
>>>> And on the client side, I got a segmentation fault.
>>>>
>>>> I am wondering what could cause the assert to fail?
>>>>
>>>> Thanks,
>>>>
>>>> Sugang
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html