As for the second question, could you tell me where that code is? How does Ceph make size/min_size copies? Thanks.

At 2014-09-11 12:19:18, "Gregory Farnum" <greg at inktank.com> wrote:
>On Wed, Sep 10, 2014 at 8:29 PM, yuelongguang <fastsync at 163.com> wrote:
>>
>> Regarding ack and ondisk: ceph has size and min_size to decide how many
>> replicas there are.
>> If the client receives ack or ondisk, does that mean at least min_size
>> OSDs have completed the op?
>>
>> I am reading the source code; could you help me with two questions?
>>
>> 1.
>> On the OSD, where is the code that replies to ops separately with ack
>> or ondisk?
>> I checked the code, but it looked to me as if they are always replied
>> together.
>
>It depends on what journaling mode you're in, but generally they're
>triggered separately (unless it goes on disk first, in which case it
>will skip the ack; this is the mode it uses for non-btrfs
>filesystems). The places where it actually replies are pretty clear
>about doing one or the other, though...
>
>> 2.
>> So far I only know how the client writes ops to the primary OSD. Inside
>> the OSD cluster, how does it guarantee that min_size copies are reached?
>> I mean, when the primary OSD receives an op, how does it spread the op
>> to the other OSDs, and how does it process their replies?
>
>That's not how it works. The primary for a PG will not go "active"
>with it until it has at least min_size copies that it knows about.
>Once the OSD is doing any processing of the PG, it requires all
>participating members to respond before it sends any messages back to
>the client.
>-Greg
>Software Engineer #42 @ http://inktank.com | http://ceph.com
>
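(For my own understanding, here is a minimal sketch of the flow Greg describes: the primary tracks which participating OSDs still owe it an ack and a commit, and only replies to the client once each set drains. This is simplified standalone pseudocode, not actual Ceph code; the names RepGather and eval_repop are only loosely modeled on what I can find in src/osd/ReplicatedPG.cc.)

#include <iostream>
#include <set>

// One in-flight replicated write, loosely modeled on Ceph's RepGather.
struct RepGather {
    std::set<int> waitfor_ack;   // OSD ids that still owe an in-memory ack
    std::set<int> waitfor_disk;  // OSD ids that still owe a commit
    bool sent_ack = false;
    bool sent_disk = false;
};

// Called whenever a participant (including the primary itself) reports
// progress; decides whether a reply can go back to the client yet.
void eval_repop(RepGather& r) {
    if (r.waitfor_disk.empty() && !r.sent_disk) {
        std::cout << "reply to client: ondisk\n";  // durable everywhere
        r.sent_disk = true;
    } else if (r.waitfor_ack.empty() && !r.sent_ack && !r.sent_disk) {
        std::cout << "reply to client: ack\n";     // applied, not yet durable
        r.sent_ack = true;
    }
}

int main() {
    RepGather r;
    r.waitfor_ack  = {0, 1, 2};  // primary osd.0 plus two replicas
    r.waitfor_disk = {0, 1, 2};

    r.waitfor_ack.erase(1); eval_repop(r);  // still waiting on osd.0, osd.2
    r.waitfor_ack.erase(2); eval_repop(r);
    r.waitfor_ack.erase(0); eval_repop(r);  // all acked -> "ack" to client
    r.waitfor_disk.clear(); eval_repop(r);  // all committed -> "ondisk"
}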
>>
>> greg, thanks very much
>>
>> At 2014-09-11 01:36:39, "Gregory Farnum" <greg at inktank.com> wrote:
>>
>> The important bit there is actually near the end of the message output
>> line, where the first says "ack" and the second says "ondisk".
>>
>> I assume you're using btrfs; the ack is returned after the write is
>> applied in-memory and readable by clients. The ondisk (commit) message
>> is returned after it's durable to the journal or the backing filesystem.
>> -Greg
>>
>> On Wednesday, September 10, 2014, yuelongguang <fastsync at 163.com> wrote:
>>>
>>> hi all,
>>> I have recently been debugging ceph rbd; the log shows that one write
>>> to an OSD can get two replies. The difference between them is the seq.
>>> Why?
>>>
>>> thanks
>>> ---log---------
>>> reader got message 6 0x7f58900010a0 osd_op_reply(15
>>> rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304
>>> write_size 4194304,write 0~3145728] v211'518 uv518 ack = 0) v6
>>> 2014-09-10 08:47:32.348213 7f58bc16b700 20 -- 10.58.100.92:0/1047669 queue
>>> 0x7f58900010a0 prio 127
>>> 2014-09-10 08:47:32.348230 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader reading tag...
>>> 2014-09-10 08:47:32.348245 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got MSG
>>> 2014-09-10 08:47:32.348257 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got envelope type=43 src osd.1 front=247 data=0 off 0
>>> 2014-09-10 08:47:32.348269 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader wants 247 from dispatch throttler 247/104857600
>>> 2014-09-10 08:47:32.348286 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got front 247
>>> 2014-09-10 08:47:32.348303 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).aborted = 0
>>> 2014-09-10 08:47:32.348312 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got 247 + 0 + 0 byte message
>>> 2014-09-10 08:47:32.348332 7f58bc16b700 10 check_message_signature: seq #
>>> = 7 front_crc_ = 3699418201 middle_crc = 0 data_crc = 0
>>> 2014-09-10 08:47:32.348369 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >>
>>> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1
>>> c=0xfae940).reader got message 7 0x7f5890003660 osd_op_reply(15
>>> rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304
>>> write_size 4194304,write 0~3145728] v211'518 uv518 ondisk = 0) v6
>>>
>>
>> --
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
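For anyone reading this thread later: the two osd_op_reply messages in the log above are the two completion levels that librados exposes to clients. Here is a minimal client-side sketch against the librados C++ API of that era; the pool name "rbd", the "admin" client id, and the object name are assumptions for illustration, so check librados.hpp in your release for the exact signatures.

#include <rados/librados.hpp>
#include <iostream>

int main() {
    librados::Rados rados;
    rados.init("admin");             // assumes client.admin credentials
    rados.conf_read_file(nullptr);   // default ceph.conf search path
    if (rados.connect() < 0) {
        std::cerr << "connect failed\n";
        return 1;
    }

    librados::IoCtx io;
    rados.ioctx_create("rbd", io);   // assumes a pool named "rbd"

    librados::bufferlist bl;
    bl.append("hello");

    // One write, two milestones: "complete" corresponds to the ack reply
    // and "safe" to the ondisk (commit) reply seen in the log above.
    librados::AioCompletion *c = rados.aio_create_completion();
    io.aio_write("test-object", c, bl, bl.length(), 0);

    c->wait_for_complete();   // returns once the write is readable (ack)
    std::cout << "ack-level completion, r=" << c->get_return_value() << "\n";

    c->wait_for_safe();       // returns once the write is durable (ondisk)
    std::cout << "ondisk-level completion\n";

    c->release();
    rados.shutdown();
    return 0;
}

In terms of the log: wait_for_complete() returning corresponds to message 6 ("ack = 0") and wait_for_safe() to message 7 ("ondisk = 0"), which is why a single write shows two replies differing in seq.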