Why can one osd-op from a client get two osd-op-replies?

On Wed, Sep 10, 2014 at 8:29 PM, yuelongguang <fastsync at 163.com> wrote:
>
>
>
>
> As for ack and ondisk: Ceph uses size and min_size to decide how many
> replicas there are. If a client receives an ack or ondisk, does that mean
> at least min_size OSDs have completed the op?
>
> I am reading the source code; could you help me with two questions?
>
> 1.
>  On the OSD, where is the code that sends separate replies for ack and
> ondisk?
>  I checked the code, but it seems they are always replied together.

It depends on which journaling mode you're in, but generally they're
triggered separately (unless the write goes on disk first, in which case
it skips the ack; this is the mode used for non-btrfs filesystems). The
places where it actually replies are pretty clear about doing one or the
other, though...
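The behavior described above can be sketched as a small simulation. This is illustrative only, not Ceph's actual C++ code; the function name is made up, and the mode names loosely follow the FileStore journal modes (parallel on btrfs, writeahead elsewhere):

```python
# Hypothetical sketch of the ack/ondisk reply behavior described above.
# Not Ceph source code; names are illustrative.

def replies_for_write(journal_mode):
    """Return the sequence of replies a client would see for one write."""
    replies = []
    if journal_mode == "writeahead":   # non-btrfs: journal the op first
        replies.append("ondisk")       # durable in the journal...
        # ...the op is applied to the filestore afterwards, but since the
        # commit already went out, no separate "ack" is sent.
    elif journal_mode == "parallel":   # btrfs: journal and apply concurrently
        replies.append("ack")          # applied in memory, readable by clients
        replies.append("ondisk")       # durable in the journal
    return replies

print(replies_for_write("parallel"))    # two replies, as in the log below
print(replies_for_write("writeahead"))  # a single ondisk reply
```

This matches the log in the original message: the same write gets one reply ending in "ack = 0" and a second ending in "ondisk = 0", differing only in message seq.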

>
> 2.
>  I know how the client writes ops to the primary OSD, but inside the OSD
> cluster, how does it guarantee that min_size copies are reached?
> That is, when the primary OSD receives an op, how does it spread it to
> the others, and how does it process their replies?

That's not how it works. The primary for a PG will not go "active"
with it until it has at least min_size copies that it knows about.
Once the OSD is doing any processing of the PG, it requires all
participating members to respond before it sends any messages back to
the client.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
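A rough sketch of the rule stated above (illustrative only, not Ceph code; the function and argument names are invented): once the PG is active, the primary replies to the client only after every participating member has responded.

```python
# Hypothetical sketch: the primary collects responses from every
# participating replica before any reply goes back to the client.

def primary_may_reply(acting_set, responded):
    """acting_set: OSD ids participating in the PG (primary included).
    responded: set of OSD ids that have acknowledged the op."""
    pending = set(acting_set) - set(responded)
    # Only when no participant is still pending does the client hear back.
    return "reply_to_client" if not pending else "wait"

print(primary_may_reply([0, 1, 2], {0, 1, 2}))  # all replicas responded
print(primary_may_reply([0, 1, 2], {0, 1}))     # still waiting on osd.2
```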

>
>
> greg, thanks very much
>
>
>
>
>
> On 2014-09-11 01:36:39, "Gregory Farnum" <greg at inktank.com> wrote:
>
> The important bit there is actually near the end of the message output line,
> where the first says "ack" and the second says "ondisk".
>
> I assume you're using btrfs; the ack is returned after the write is applied
> in-memory and readable by clients. The ondisk (commit) message is returned
> after it's durable to the journal or the backing filesystem.
> -Greg
>
> On Wednesday, September 10, 2014, yuelongguang <fastsync at 163.com> wrote:
>>
>> hi, all
>> I was recently debugging Ceph RBD, and the log shows that one write to
>> an OSD can get two replies. The difference between them is the seq
>> number. Why?
>>
>> thanks
>> ---log---------
>> reader got message 6 0x7f58900010a0 osd_op_reply(15
>> rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304
>> write_size 4194304,write 0~3145728] v211'518 uv518 ack = 0) v6
>> 2014-09-10 08:47:32.348213 7f58bc16b700 20 -- 10.58.100.92:0/1047669 queue
>> 0x7f58900010a0 prio 127
>> 2014-09-10 08:47:32.348230 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).reader reading tag...
>> 2014-09-10 08:47:32.348245 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).reader got MSG
>> 2014-09-10 08:47:32.348257 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).reader got envelope type=43 src osd.1 front=247 data=0 off 0
>> 2014-09-10 08:47:32.348269 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).reader wants 247 from dispatch throttler 247/104857600
>> 2014-09-10 08:47:32.348286 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).reader got front 247
>> 2014-09-10 08:47:32.348303 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).aborted = 0
>> 2014-09-10 08:47:32.348312 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).reader got 247 + 0 + 0 byte message
>> 2014-09-10 08:47:32.348332 7f58bc16b700 10 check_message_signature: seq #
>> = 7 front_crc_ = 3699418201 middle_crc = 0 data_crc = 0
>> 2014-09-10 08:47:32.348369 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>> c=0xfae940).reader got message 7 0x7f5890003660 osd_op_reply(15
>> rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304
>> write_size 4194304,write 0~3145728] v211'518 uv518 ondisk = 0) v6
>>
>>
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
>

