Not if you want the PG log to have consistent ordering.
-Sam

On Fri, Jul 22, 2016 at 7:00 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
> Actually, write-lock the object only. Is that gonna work?
>
> Sugang
>
> On Thu, Jul 21, 2016 at 5:59 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> Write lock on the whole PG? How do parallel clients work?
>> -Sam
>>
>> On Thu, Jul 21, 2016 at 12:36 PM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>>> The error above occurs when I am sending an MOSDOp to the replicas, and I
>>> have to fix that first.
>>>
>>> For consistency, we are still using the primary OSD as a control center.
>>> That is, the client always goes to the primary OSD to ask for a write lock,
>>> then writes to the replicas.
>>>
>>> Sugang
>>>
>>> On Thu, Jul 21, 2016 at 3:28 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>> Well, they are actually different types with different encodings and
>>>> different contents. The client doesn't really have the information
>>>> needed to build a MSG_OSD_REPOP. Your best bet will be to send an
>>>> MOSDOp to the replicas and hack up a write path that makes that work.
>>>>
>>>> How do you plan to address the consistency problems?
>>>> -Sam
>>>>
>>>> On Thu, Jul 21, 2016 at 11:11 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>>>>> So, to start with, I think one naive way is to make the replica think
>>>>> it receives an op from the primary OSD, which actually comes from the
>>>>> client. The branching point looks like it starts in OSD::dispatch_op_fast,
>>>>> where handle_op or handle_replica_op is called based on the type of the
>>>>> request. So my question is: at the client side, is there a way I could set
>>>>> the corresponding field referred to by "op->get_req()->get_type()" to
>>>>> MSG_OSD_SUBOP or MSG_OSD_REPOP?
>>>>>
>>>>> Sugang
>>>>>
>>>>> On Thu, Jul 21, 2016 at 12:03 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>> Parallel read will be a *lot* easier since read-from-replica already
>>>>>> works. Write to replica, however, is tough. The write path uses a
>>>>>> lot of structures which are only populated on the primary. You're
>>>>>> going to have to hack up most of the write path to bypass the existing
>>>>>> replication machinery. Beyond that, maintaining consistency will
>>>>>> obviously be a challenge.
>>>>>> -Sam
>>>>>>
>>>>>> On Thu, Jul 21, 2016 at 8:49 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>> My goal is to achieve parallel write/read from the client instead of
>>>>>>> the primary OSD.
>>>>>>>
>>>>>>> Sugang
>>>>>>>
>>>>>>> On Thu, Jul 21, 2016 at 11:47 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>>>> I may be misunderstanding your goal. What are you trying to achieve?
>>>>>>>> -Sam
>>>>>>>>
>>>>>>>> On Thu, Jul 21, 2016 at 8:43 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>>>>> Well, that assert is asserting that the object is in the pool that the
>>>>>>>>> PG operating on it belongs to. Something very wrong must have happened
>>>>>>>>> for it not to be true. Also, replicas have basically none of the code
>>>>>>>>> required to handle a write, so I'm kind of surprised it got that far.
>>>>>>>>> I suggest that you read the debug logging and read the OSD op handling
>>>>>>>>> path.
>>>>>>>>> -Sam
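A minimal, self-contained sketch of the invariant that assert encodes, for readers following along: the pool id carried in the object's hobject_t must match the pool id of the PG handling the op. The struct and function names below are stand-ins, not actual Ceph types; only the shape of the check mirrors the assert quoted in the thread.

    // Illustrative stand-ins only -- hobject_t and pg_info_t in Ceph carry far
    // more state; this just shows the shape of the check that fires in
    // ReplicatedPG::find_object_context().
    #include <cassert>
    #include <cstdint>

    struct FakeHObject {
        int64_t pool;                               // pool id encoded in the object name
    };

    struct FakePGId {
        uint64_t pool_id;
        uint64_t pool() const { return pool_id; }   // pool id the PG belongs to
    };

    struct FakePGInfo {
        FakePGId pgid;
    };

    void find_object_context_check(const FakeHObject& oid, const FakePGInfo& info) {
        // Same shape as: assert(oid.pool == static_cast<int64_t>(info.pgid.pool()));
        // A client-originated write dispatched to a PG it was never mapped to
        // is exactly the kind of op that trips this invariant.
        assert(oid.pool == static_cast<int64_t>(info.pgid.pool()));
    }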
>>>>>>>>>
>>>>>>>>> On Thu, Jul 21, 2016 at 8:34 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>> Yes, I understand that. I was introduced to Ceph only 1 month ago, but
>>>>>>>>>> I have the basic idea of the Ceph communication pattern now. I have not
>>>>>>>>>> made any changes to the OSD yet. So I was wondering what the purpose of
>>>>>>>>>> this "assert(oid.pool == static_cast<int64_t>(info.pgid.pool()))" is, and,
>>>>>>>>>> to change the code in the OSD, what are the main aspects I should pay
>>>>>>>>>> attention to?
>>>>>>>>>> Since this is only a research project, the implementation does not
>>>>>>>>>> have to be very sophisticated.
>>>>>>>>>>
>>>>>>>>>> I know my question is kind of broad; any hints or suggestions will
>>>>>>>>>> be highly appreciated.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Sugang
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 21, 2016 at 11:22 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>>>>>>> Oh, that's a much more complicated change. You are going to need to
>>>>>>>>>>> make extensive changes to the OSD to make that work.
>>>>>>>>>>> -Sam
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 21, 2016 at 8:21 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>>> Hi Sam,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the quick reply. The main modification I made is to call
>>>>>>>>>>>> calc_target within librados::IoCtxImpl::aio_operate before op_submit,
>>>>>>>>>>>> so that I can get all the replica OSDs' ids and send a write op to
>>>>>>>>>>>> each of them. I can also attach the modified code if necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> I just reproduced this error with the conf you provided, please see below:
>>>>>>>>>>>> osd/ReplicatedPG.cc: In function 'int ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*, bool, bool, hobject_t*)' thread 7fd6aba59700 time 2016-07-21 15:09:26.431436
>>>>>>>>>>>> osd/ReplicatedPG.cc: 9042: FAILED assert(oid.pool == static_cast<int64_t>(info.pgid.pool()))
>>>>>>>>>>>> ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7fd6c5733e8b]
>>>>>>>>>>>> 2: (ReplicatedPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1e54) [0x7fd6c51ef7c4]
>>>>>>>>>>>> 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7fd6c521fe9e]
>>>>>>>>>>>> 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x73c) [0x7fd6c51dca3c]
>>>>>>>>>>>> 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x7fd6c5094d65]
>>>>>>>>>>>> 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x5d) [0x7fd6c5094f8d]
>>>>>>>>>>>> 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x86c) [0x7fd6c50b603c]
>>>>>>>>>>>> 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0x7fd6c5724117]
>>>>>>>>>>>> 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fd6c5726270]
>>>>>>>>>>>> 10: (()+0x8184) [0x7fd6c3b98184]
>>>>>>>>>>>> 11: (clone()+0x6d) [0x7fd6c1aa937d]
>>>>>>>>>>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>>>>>>>>> 2016-07-21 15:09:26.454854 7fd6aba59700 -1 osd/ReplicatedPG.cc: In function 'int ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*, bool, bool, hobject_t*)' thread 7fd6aba59700 time 2016-07-21 15:09:26.431436
>>>>>>>>>>>>
>>>>>>>>>>>> This error occurs three times, since I wrote to three OSDs.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Sugang
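A minimal sketch of the client-side fan-out idea described above, for illustration only: resolve the acting set for the object up front (the information calc_target computes internally) and submit one write op per OSD instead of letting the primary replicate. All names below are hypothetical stand-ins, not real librados/Objecter symbols; the replica OSDs would still need a modified write path to accept such an op, which is where the assert failure in the thread shows up.

    #include <cstdint>
    #include <string>
    #include <vector>

    struct WriteOp {
        std::string oid;    // object name
        std::string data;   // payload
    };

    // Stand-in for what calc_target() works out internally: the acting set
    // (primary first, then replicas) of the PG this object maps to.
    std::vector<int> acting_set_for(const std::string& /*oid*/) {
        return {3, 1, 7};   // placeholder OSD ids
    }

    // Stand-in for building an MOSDOp and handing it to the messenger for one
    // specific OSD (the part op_submit() normally does only for the primary).
    void submit_op_to_osd(int osd_id, const WriteOp& op) {
        (void)osd_id;
        (void)op;           // messenger plumbing elided in this sketch
    }

    // Fan the same write out to every OSD in the acting set instead of
    // relying on the primary's replication machinery.
    void parallel_write(const WriteOp& op) {
        for (int osd : acting_set_for(op.oid))
            submit_op_to_osd(osd, op);
    }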
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jul 21, 2016 at 10:54 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>>>>>>>>>> Hmm. Can you provide more information about the poison op? If you
>>>>>>>>>>>>> can reproduce with
>>>>>>>>>>>>> debug osd = 20
>>>>>>>>>>>>> debug filestore = 20
>>>>>>>>>>>>> debug ms = 1
>>>>>>>>>>>>> it should be easier to work out what is going on.
>>>>>>>>>>>>> -Sam
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 21, 2016 at 7:13 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am working on a research project which requires multiple write
>>>>>>>>>>>>>> operations for the same object at the same time from the client. At
>>>>>>>>>>>>>> the OSD side, I got this error:
>>>>>>>>>>>>>> osd/ReplicatedPG.cc: In function 'int ReplicatedPG::find_object_context(const hobject_t&, ObjectContextRef*, bool, bool, hobject_t*)' thread 7f0586193700 time 2016-07-21 14:02:04.218448
>>>>>>>>>>>>>> osd/ReplicatedPG.cc: 9041: FAILED assert(oid.pool == static_cast<int64_t>(info.pgid.pool()))
>>>>>>>>>>>>>> ceph version 10.2.0-2562-g0793a28 (0793a2844baa38f6bcc5c1724a1ceb9f8f1bbd9c)
>>>>>>>>>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7f059fe6dd7b]
>>>>>>>>>>>>>> 2: (ReplicatedPG::find_object_context(hobject_t const&, std::shared_ptr<ObjectContext>*, bool, bool, hobject_t*)+0x1dbb) [0x7f059f9296fb]
>>>>>>>>>>>>>> 3: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x186e) [0x7f059f959d7e]
>>>>>>>>>>>>>> 4: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x73c) [0x7f059f916a0c]
>>>>>>>>>>>>>> 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f5) [0x7f059f7ced65]
>>>>>>>>>>>>>> 6: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest> const&)+0x5d) [0x7f059f7cef8d]
>>>>>>>>>>>>>> 7: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x86c) [0x7f059f7f003c]
>>>>>>>>>>>>>> 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x947) [0x7f059fe5e007]
>>>>>>>>>>>>>> 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f059fe60160]
>>>>>>>>>>>>>> 10: (()+0x8184) [0x7f059e2d2184]
>>>>>>>>>>>>>> 11: (clone()+0x6d) [0x7f059c1e337d]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And at the client side, I got a segmentation fault.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am wondering what could be the possible reason that caused the assert to fail?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sugang
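For anyone trying to reproduce this: the debug settings Sam suggests upthread would typically be raised on the OSD daemons, for example in the [osd] section of ceph.conf before restarting the OSDs (one possible form; the section placement is an assumption here, and the values can also be injected at runtime instead):

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1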