Why can one osd_op from a client get two osd_op_reply messages?

As for the second question, could you tell me where that code is?

How does Ceph make size/min_size copies?

Thanks

At 2014-09-11 12:19:18, "Gregory Farnum" <greg at inktank.com> wrote:
>On Wed, Sep 10, 2014 at 8:29 PM, yuelongguang <fastsync at 163.com> wrote:
>>
>>
>>
>>
>> As for ack and ondisk: Ceph has size and min_size to decide how many
>> replicas there are.
>> If the client receives ack or ondisk, does that mean at least min_size
>> OSDs have completed the ops?
>>
>> I am reading the source code; could you help me with these two questions?
>>
>> 1.
>>  On the OSD, where is the code that replies to ops separately for ack and
>> ondisk?
>>  I checked the code, but it looked to me like they are always replied
>> together.
>
>It depends on what journaling mode you're in, but generally they're
>triggered separately (unless it goes on disk first, in which case it
>will skip the ack; this is the mode it uses for non-btrfs
>filesystems). The places where it actually replies are pretty clear
>about doing one or the other, though...
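
The two events are also visible from the client side through the asynchronous
librados API: "ack" roughly corresponds to wait_for_complete() and "ondisk" to
wait_for_safe(). Below is a minimal sketch, assuming a client id of "admin", a
pool named "rbd", and a throwaway object name; error handling is omitted.

    #include <rados/librados.hpp>
    #include <iostream>

    int main() {
      librados::Rados cluster;
      cluster.init("admin");                        // client id: assumption
      cluster.conf_read_file("/etc/ceph/ceph.conf");
      cluster.connect();

      librados::IoCtx io;
      cluster.ioctx_create("rbd", io);              // pool name: assumption

      librados::bufferlist bl;
      bl.append("hello");
      librados::AioCompletion *c = cluster.aio_create_completion();
      io.aio_write("test-object", c, bl, bl.length(), 0);

      c->wait_for_complete();  // returns at "ack": applied and readable
      std::cout << "ack" << std::endl;
      c->wait_for_safe();      // returns at "ondisk": durably committed
      std::cout << "ondisk" << std::endl;

      c->release();
      io.close();
      cluster.shutdown();
      return 0;
    }

Build with something like: g++ example.cc -lrados
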
>
>>
>> 2.
>>  So far I only know how the client writes ops to the primary OSD. Inside
>> the OSD cluster, how does it guarantee that min_size copies are reached?
>>  I mean, when the primary OSD receives ops, how does it spread them to the
>> others, and how does it process their replies?
>
>That's not how it works. The primary for a PG will not go "active"
>with it until it has at least min_size copies that it knows about.
>Once the OSD is doing any processing of the PG, it requires all
>participating members to respond before it sends any messages back to
>the client.
>-Greg
>Software Engineer #42 @ http://inktank.com | http://ceph.com
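
In other words, the primary keeps track of which OSDs in the acting set still
owe it a commit for a given write and only answers the client once that set is
empty. The sketch below is a simplified illustration of that idea, not the
actual Ceph code; if I remember correctly, the real logic lives around
ReplicatedPG::eval_repop() and the RepGather structure in ReplicatedPG.cc,
which is also the place to look for the answer to question 1.

    // Hypothetical, heavily simplified sketch of the idea (not Ceph source):
    // the primary records every OSD it sent the transaction to and only
    // replies to the client once all of them have reported a commit.
    #include <set>
    #include <functional>

    struct InFlightWrite {
      std::set<int> waiting_for_ondisk;      // OSD ids we still expect to hear from
      std::function<void()> reply_to_client; // sends the final osd_op_reply

      void start(const std::set<int>& acting_set) {
        waiting_for_ondisk = acting_set;     // primary + replicas for this PG
      }

      // Called when a replica (or the primary's own journal) reports commit.
      void on_commit(int osd_id) {
        waiting_for_ondisk.erase(osd_id);
        if (waiting_for_ondisk.empty())
          reply_to_client();                 // only now does the client see "ondisk"
      }
    };
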
>
>>
>>
>> Greg, thanks very much.
>>
>>
>>
>>
>>
>> At 2014-09-11 01:36:39, "Gregory Farnum" <greg at inktank.com> wrote:
>>
>> The important bit there is actually near the end of the message output line,
>> where the first says "ack" and the second says "ondisk".
>>
>> I assume you're using btrfs; the ack is returned after the write is applied
>> in-memory and readable by clients. The ondisk (commit) message is returned
>> after it's durable to the journal or the backing filesystem.
>> -Greg
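
The two osd_op_reply messages in the log below correspond to exactly these two
events. With librados you can also register a separate callback for each one
instead of blocking; a rough sketch (the object name and payload are made up):

    // Sketch: separate callbacks for the in-memory ack and the durable commit.
    #include <rados/librados.hpp>
    #include <cstdio>

    static void on_ack(librados::completion_t, void*)    { std::printf("got ack\n");    }
    static void on_commit(librados::completion_t, void*) { std::printf("got ondisk\n"); }

    void write_with_callbacks(librados::Rados& cluster, librados::IoCtx& io) {
      librados::bufferlist bl;
      bl.append("payload");                                   // payload: made up
      librados::AioCompletion *c =
          cluster.aio_create_completion(nullptr, on_ack, on_commit);
      io.aio_write("some-object", c, bl, bl.length(), 0);     // object name: made up
      c->wait_for_safe();   // block until both replies have arrived
      c->release();
    }
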
>>
>> On Wednesday, September 10, 2014, yuelongguang <fastsync at 163.com> wrote:
>>>
>>> Hi, all.
>>> I have recently been debugging Ceph RBD, and the log shows that one write
>>> to an OSD can get two replies.
>>> The only difference between them is the seq number.
>>> Why?
>>>
>>> thanks
>>> ---log---------
>>> reader got message 6 0x7f58900010a0 osd_op_reply(15
>>> rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304
>>> write_size 4194304,write 0~3145728] v211'518 uv518 ack = 0) v6
>>> 2014-09-10 08:47:32.348213 7f58bc16b700 20 -- 10.58.100.92:0/1047669 queue
>>> 0x7f58900010a0 prio 127
>>> 2014-09-10 08:47:32.348230 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader reading tag...
>>> 2014-09-10 08:47:32.348245 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got MSG
>>> 2014-09-10 08:47:32.348257 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got envelope type=43 src osd.1 front=247 data=0 off 0
>>> 2014-09-10 08:47:32.348269 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader wants 247 from dispatch throttler 247/104857600
>>> 2014-09-10 08:47:32.348286 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got front 247
>>> 2014-09-10 08:47:32.348303 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).aborted = 0
>>> 2014-09-10 08:47:32.348312 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got 247 + 0 + 0 byte message
>>> 2014-09-10 08:47:32.348332 7f58bc16b700 10 check_message_signature: seq #
>>> = 7 front_crc_ = 3699418201 middle_crc = 0 data_crc = 0
>>> 2014-09-10 08:47:32.348369 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got message 7 0x7f5890003660 osd_op_reply(15
>>> rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304
>>> write_size 4194304,write 0~3145728] v211'518 uv518 ondisk = 0) v6
>>>
>>>
>>
>>
>> --
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>>

