Re: librados write operation

Thanks for the information! I will try that.

On Tue, Jun 28, 2016 at 11:58 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> Using strace we can see that sendmsg is used for communications to the OSD.
>
> $ gdb -q --args bin/rados -p rbd put test /etc/hosts
> Reading symbols from bin/rados...done.
> (gdb) set follow-fork-mode child
> (gdb) catch syscall sendmsg
> Catchpoint 1 (syscall 'sendmsg' [46])
> (gdb) r
> Starting program: /home/brad/working/src/ceph/build/bin/rados -p rbd
> put test /etc/hosts
> Thread 11 "ms_pipe_write" hit Catchpoint 2 (call to syscall sendmsg),
> 0x00007fffe5aa12fd in sendmsg () at
> ../sysdeps/unix/syscall-template.S:84
> 84      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
> (gdb) bt
> #0  0x00007fffe5aa12fd in sendmsg () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00007fffef5c8603 in Pipe::do_sendmsg (this=this@entry=0x1067000,
> msg=msg@entry=0x7ffff7fa1730, len=len@entry=9, more=more@entry=false)
> at /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:2227
> #2  0x00007fffef5db03c in Pipe::connect (this=this@entry=0x1067000) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:951
> #3  0x00007fffef5dfa02 in Pipe::writer (this=0x1067000) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:1728
> #4  0x00007fffef5e26bd in Pipe::Writer::entry (this=<optimized out>)
> at /home/brad/working/src/ceph/src/msg/simple/Pipe.h:63
> #5  0x00007fffe5a985ca in start_thread (arg=0x7ffff7fa4700) at
> pthread_create.c:333
> #6  0x00007fffe20f6ead in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> So now we know we send via Pipe::connect -> Pipe::do_sendmsg -> sendmsg.
>
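For readers who have not used sendmsg() directly, here is a minimal standalone sketch of the call at the bottom of that chain. It is not Ceph code: the helper names send_two_parts and roundtrip_demo are made up for illustration, and it talks over a local socketpair rather than to an OSD.

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <string>

// Hypothetical helper (not from Ceph): gather two buffers into a single
// sendmsg() call. Returns the number of bytes sent, or -1 on error.
ssize_t send_two_parts(int fd, const std::string& header,
                       const std::string& payload) {
  iovec iov[2];
  iov[0].iov_base = const_cast<char*>(header.data());
  iov[0].iov_len = header.size();
  iov[1].iov_base = const_cast<char*>(payload.data());
  iov[1].iov_len = payload.size();

  msghdr msg{};          // zero-initialise: no address, no control data
  msg.msg_iov = iov;     // scatter/gather array
  msg.msg_iovlen = 2;
  // MSG_NOSIGNAL: report a closed peer as an error instead of SIGPIPE.
  return sendmsg(fd, &msg, MSG_NOSIGNAL);
}

// Hypothetical demo: send "hdr:" + "body" through a local socketpair and
// return whatever arrives on the other end.
std::string roundtrip_demo() {
  int sv[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return "";
  ssize_t sent = send_two_parts(sv[0], "hdr:", "body");
  char buf[64];
  ssize_t got = (sent > 0) ? read(sv[1], buf, sizeof(buf)) : 0;
  close(sv[0]);
  close(sv[1]);
  return got > 0 ? std::string(buf, got) : std::string();
}
```

Calling roundtrip_demo() should hand back the two buffers joined together, which is the same gather-and-send pattern a catchpoint on sendmsg would stop on.
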
> (gdb) f 2
> #2  0x00007fffef5db03c in Pipe::connect (this=this@entry=0x1067000) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:951
> 951       rc = do_sendmsg(&msg, msglen);
> (gdb) p ntohs(peer_addr.u.sin.sin_port)
> $5 = 6789
>
> 6789 is the default MON port, so this is a monitor connection; we
> need to find a connection to an OSD.
>
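The port check done at the gdb prompt can be sketched as a small standalone function. This assumes the default port layout (monitors on 6789, OSDs binding from 6800 up to 7300); a cluster can be configured differently, and other daemons may also bind in that upper range, so a match there is only a hint. The names classify_port and make_addr are hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: label a peer address by its port, assuming the
// default Ceph port layout (an assumption, not read from any config).
std::string classify_port(const sockaddr_in& addr) {
  // sin_port is stored in network byte order, hence the ntohs().
  uint16_t port = ntohs(addr.sin_port);
  if (port == 6789) return "mon";
  if (port >= 6800 && port <= 7300) return "osd-range";
  return "other";
}

// Hypothetical helper: build an IPv4 address holding the given port.
sockaddr_in make_addr(uint16_t host_port) {
  sockaddr_in a{};
  a.sin_family = AF_INET;
  a.sin_port = htons(host_port);  // convert to network byte order
  return a;
}
```

With these, classify_port(make_addr(6789)) gives "mon" and classify_port(make_addr(6800)) gives "osd-range", matching the two cases seen in the session.
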
> (gdb) del
> Delete all breakpoints? (y or n) y
> (gdb) b /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:951
> Breakpoint 3 at 0x7fffe614f9ff:
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:951. (2 locations)
> (gdb) c
> Continuing.
>
> Thread 12 "ms_pipe_write" hit Breakpoint 1, Pipe::connect
> (this=this@entry=0x7fffbc000dd0) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:951
> 951       rc = do_sendmsg(&msg, msglen);
> (gdb) p ntohs(peer_addr.u.sin.sin_port)
> $3 = 6789                <-------- 6789 is a MON so move on
> (gdb) c
> Continuing.
> Thread 16 "ms_pipe_write" hit Breakpoint 1, Pipe::connect
> (this=this@entry=0x1071750) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:951
> 951       rc = do_sendmsg(&msg, msglen);
> (gdb) p ntohs(peer_addr.u.sin.sin_port)
> $4 = 6800                <-------------------------------- 6800 is an OSD
> (gdb) bt
> #0  Pipe::connect (this=this@entry=0x10719a0) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:951
> #1  0x00007fffef5dfa02 in Pipe::writer (this=0x10719a0) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.cc:1728
> #2  0x00007fffef5e26bd in Pipe::Writer::entry (this=<optimized out>)
> at /home/brad/working/src/ceph/src/msg/simple/Pipe.h:63
> #3  0x00007fffe5a985ca in start_thread (arg=0x7fffccff9700) at
> pthread_create.c:333
> #4  0x00007fffe20f6ead in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Now in Pipe::writer we make the following call.
>
> 1798          // grab outgoing message
> 1799          Message *m = _get_next_outgoing();
>
> Remember that, it'll come in handy later :) So we know how the write
> actually happens in the writer thread but not (yet) how the queue gets
> populated.
>
> Now, here's the entire stack where we do the actual put (took a lot of
> reading of the code to work out where to put the breakpoint).
>
> (gdb) bt
> #0  Pipe::_send (m=0x10761e0, this=0x1073da0) at
> /home/brad/working/src/ceph/src/msg/simple/Pipe.h:304
> #1  SimpleMessenger::submit_message (this=this@entry=0x105ff60,
> m=m@entry=0x10761e0, con=con@entry=0x1073120, dest_addr=...,
> dest_type=4, already_locked=already_locked@entry=false)
>     at /home/brad/working/src/ceph/src/msg/simple/SimpleMessenger.cc:449
> #2  0x00007fffef48eb51 in SimpleMessenger::_send_message
> (this=0x105ff60, m=0x10761e0, con=0x1073120) at
> /home/brad/working/src/ceph/src/msg/simple/SimpleMessenger.cc:140
> #3  0x00007fffeef13182 in Objecter::_send_op
> (this=this@entry=0x1061ee0, op=op@entry=0x10733b0,
> m=m@entry=0x10761e0) at
> /home/brad/working/src/ceph/src/osdc/Objecter.cc:3104
> #4  0x00007fffeef14340 in Objecter::_op_submit
> (this=this@entry=0x1061ee0, op=op@entry=0x10733b0, sul=...,
> ptid=ptid@entry=0x7fffffffb878) at
> /home/brad/working/src/ceph/src/osdc/Objecter.cc:2364
> #5  0x00007fffeef18694 in Objecter::_op_submit_with_budget
> (this=this@entry=0x1061ee0, op=op@entry=0x10733b0, sul=...,
> ptid=ptid@entry=0x7fffffffb878, ctx_budget=ctx_budget@entry=0x0)
>     at /home/brad/working/src/ceph/src/osdc/Objecter.cc:2197
> #6  0x00007fffeef1892e in Objecter::op_submit (this=0x1061ee0,
> op=op@entry=0x10733b0, ptid=0x7fffffffb878, ptid@entry=0x0,
> ctx_budget=ctx_budget@entry=0x0) at
> /home/brad/working/src/ceph/src/osdc/Objecter.cc:2164
> #7  0x00007fffeeeddee6 in librados::IoCtxImpl::operate
> (this=this@entry=0x1072a50, oid=..., o=o@entry=0x7fffffffbb30,
> pmtime=pmtime@entry=0x0, flags=flags@entry=0) at
> /home/brad/working/src/ceph/src/librados/IoCtxImpl.cc:693
> #8  0x00007fffeeee3214 in librados::IoCtxImpl::write_full
> (this=0x1072a50, oid=..., bl=...) at
> /home/brad/working/src/ceph/src/librados/IoCtxImpl.cc:636
> #9  0x00007fffeeea7e6e in librados::IoCtx::write_full
> (this=this@entry=0x7fffffffbd90, oid="test", bl=...) at
> /home/brad/working/src/ceph/src/librados/librados.cc:1200
> #10 0x0000000000417121 in do_put (use_striper=false,
> op_size=<optimized out>, infile=0x7fffffffe44a "/etc/hosts",
> objname=<optimized out>, striper=..., io_ctx=..., this=<optimized
> out>)
>     at /home/brad/working/src/ceph/src/tools/rados/rados.cc:441
> #11 rados_tool_common (nargs=std::vector of length 3, capacity 8 =
> {...}, opts=std::map with 1 elements = {...}) at
> /home/brad/working/src/ceph/src/tools/rados/rados.cc:2079
> #12 main (argc=<optimized out>, argv=<optimized out>) at
> /home/brad/working/src/ceph/src/tools/rados/rados.cc:3433
> (gdb) l 308
> 303         void _send(Message *m) {
> 304           assert(pipe_lock.is_locked());
> 305           out_q[m->get_priority()].push_back(m);
> 306           cond.Signal();
> 307         }
>
> So that's where the message finally gets added to the queue.
>
> Now remember _get_next_outgoing()?
>
> 313     Message *_get_next_outgoing() {
> 314       assert(pipe_lock.is_locked());
> 315       Message *m = 0;
> 316       while (!m && !out_q.empty()) {
> 317         map<int, list<Message*> >::reverse_iterator p = out_q.rbegin();
> 318         if (!p->second.empty()) {
> 319           m = p->second.front();
> 320           p->second.pop_front();
> 321         }
> 322         if (p->second.empty())
> 323           out_q.erase(p->first);
> 324       }
> 325       return m;
> 326     }
>
> The writer thread is busy draining the queue we just populated.
>
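The two snippets above fit together as a producer/consumer pair. Below is a minimal standalone sketch of that pattern, using std::mutex and std::condition_variable in place of Ceph's own lock and Cond types. The Msg and OutQueue types are made up for illustration, and the sketch takes its own lock where the real functions assert that the caller already holds pipe_lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <iterator>
#include <list>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a message; only the priority matters here.
struct Msg {
  int priority;
  std::string payload;
};

// Hypothetical sketch of a per-priority outgoing queue.
class OutQueue {
 public:
  // Producer side: append to the list for this priority, wake the writer.
  void send(Msg* m) {
    std::lock_guard<std::mutex> l(lock_);
    out_q_[m->priority].push_back(m);
    cond_.notify_one();
  }

  // Consumer side, non-blocking: highest priority first, FIFO within a
  // priority. Returns nullptr when nothing is queued.
  Msg* get_next_outgoing() {
    std::lock_guard<std::mutex> l(lock_);
    return pop_locked();
  }

  // Consumer side, blocking: what a writer thread would sit in.
  Msg* wait_next_outgoing() {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [this] { return !out_q_.empty(); });
    return pop_locked();
  }

 private:
  // Caller must hold lock_. Empty lists are erased on the way out, so a
  // non-empty map always has a non-empty list at its highest key.
  Msg* pop_locked() {
    if (out_q_.empty()) return nullptr;
    auto p = std::prev(out_q_.end());  // highest priority
    Msg* m = p->second.front();
    p->second.pop_front();
    if (p->second.empty()) out_q_.erase(p);
    return m;
  }

  std::mutex lock_;
  std::condition_variable cond_;
  std::map<int, std::list<Msg*>> out_q_;
};
```

Queueing messages with priorities 63, 127, 63 in that order and then popping three times yields the 127 message first, followed by the two 63 messages in the order they were queued.
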
> HTH,
> Brad
>
> On Wed, Jun 29, 2016 at 7:06 AM, Sugang Li <sugangli@xxxxxxxxxxxxxxxxxx> wrote:
>> Hi everyone,
>>
>> I am looking into the Ceph source code to see how writes are
>> implemented. When I use rados bench to perform write operations,
>> I can see that "void write(uint64_t off, bufferlist& bl, uint64_t
>> truncate_size, uint32_t truncate_seq)" in Objecter.h is called for
>> every write, and that the operation is put into a queue named "ops".
>> My understanding is that another process/thread will dequeue this
>> operation, look up the CRUSH map to find a primary OSD, and push
>> the data to it (or ask it to pull the data). If this is the case,
>> which function exactly does this? If this is not the case, what
>> operation comes next?
>>
>> Thanks,
>>
>> Sugang
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Cheers,
> Brad