It's the recovery and backfill code. There's not one place; it's what most
of the OSD code is for.

On Thursday, September 11, 2014, yuelongguang <fastsync at 163.com> wrote:
> as for the second question, could you tell me where the code is.
> how ceph makes size/min_size copies?
>
> thanks
>
> At 2014-09-11 12:19:18, "Gregory Farnum" <greg at inktank.com> wrote:
> >On Wed, Sep 10, 2014 at 8:29 PM, yuelongguang <fastsync at 163.com> wrote:
> >>
> >> as for ack and ondisk, ceph has size and min_size to decide how many
> >> replicas there are.
> >> if the client receives ack or ondisk, does that mean at least min_size
> >> osds have done the ops?
> >>
> >> i am reading the source code, could you help me with the two questions.
> >>
> >> 1.
> >> on the osd, where is the code that replies to ops separately according
> >> to ack or ondisk?
> >> i checked the code, but i thought they are always replied together.
> >
> >It depends on what journaling mode you're in, but generally they're
> >triggered separately (unless it goes on disk first, in which case it
> >will skip the ack; this is the mode it uses for non-btrfs
> >filesystems). The places where it actually replies are pretty clear
> >about doing one or the other, though...
> >
> >>
> >> 2.
> >> now i just know how the client writes ops to the primary osd; inside
> >> the osd cluster, how does it promise min_size copies are reached?
> >> i mean when the primary osd receives ops, how does it spread ops to
> >> the others, and how does it process their replies?
> >
> >That's not how it works. The primary for a PG will not go "active"
> >with it until it has at least min_size copies that it knows about.
> >Once the OSD is doing any processing of the PG, it requires all
> >participating members to respond before it sends any messages back to
> >the client.
> >-Greg
> >Software Engineer #42 @ http://inktank.com | http://ceph.com
> >
> >>
> >> greg, thanks very much
> >>
> >> At 2014-09-11 01:36:39, "Gregory Farnum" <greg at inktank.com> wrote:
> >>
> >> The important bit there is actually near the end of the message output
> >> line, where the first says "ack" and the second says "ondisk".
> >>
> >> I assume you're using btrfs; the ack is returned after the write is
> >> applied in-memory and readable by clients. The ondisk (commit) message
> >> is returned after it's durable to the journal or the backing filesystem.
> >> -Greg
> >>
> >> On Wednesday, September 10, 2014, yuelongguang <fastsync at 163.com> wrote:
> >>>
> >>> hi, all
> >>> i recently debugged ceph rbd, and the log shows that one write to an
> >>> osd can get two replies.
> >>> the difference between them is the seq.
> >>> why?
> >>>
> >>> thanks
> >>> ---log---------
> >>> reader got message 6 0x7f58900010a0 osd_op_reply(15 rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~3145728] v211'518 uv518 ack = 0) v6
> >>> 2014-09-10 08:47:32.348213 7f58bc16b700 20 -- 10.58.100.92:0/1047669 queue 0x7f58900010a0 prio 127
> >>> 2014-09-10 08:47:32.348230 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader reading tag...
> >>> 2014-09-10 08:47:32.348245 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got MSG
> >>> 2014-09-10 08:47:32.348257 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got envelope type=43 src osd.1 front=247 data=0 off 0
> >>> 2014-09-10 08:47:32.348269 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader wants 247 from dispatch throttler 247/104857600
> >>> 2014-09-10 08:47:32.348286 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got front 247
> >>> 2014-09-10 08:47:32.348303 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).aborted = 0
> >>> 2014-09-10 08:47:32.348312 7f58bc16b700 20 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got 247 + 0 + 0 byte message
> >>> 2014-09-10 08:47:32.348332 7f58bc16b700 10 check_message_signature: seq # = 7 front_crc_ = 3699418201 middle_crc = 0 data_crc = 0
> >>> 2014-09-10 08:47:32.348369 7f58bc16b700 10 -- 10.58.100.92:0/1047669 >> 10.154.249.4:6800/2473 pipe(0xfae6d0 sd=6 :64407 s=2 pgs=133 cs=1 l=1 c=0xfae940).reader got message 7 0x7f5890003660 osd_op_reply(15 rbd_data.19d92ae8944a.0000000000000001 [set-alloc-hint object_size 4194304 write_size 4194304,write 0~3145728] v211'518 uv518 ondisk = 0) v6
> >>>
> >> --
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
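
For anyone following along from the client side, the ack/ondisk split
discussed in the thread maps onto the two callbacks of a librados async
write. Below is a minimal sketch using the python-rados binding; the
conffile path, pool name, and object name are invented for illustration.
The oncomplete callback corresponds to the "ack" reply (applied and
readable) and onsafe to the "ondisk" commit.

    #!/usr/bin/env python
    # Minimal sketch: observe the "ack" and "ondisk" replies from the client
    # side via librados async completions. Pool/object/conffile names are
    # assumptions for illustration only.
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    def on_ack(completion):
        # Fires when the write is applied and readable -- the "ack" reply.
        print("ack: write applied in memory")

    def on_commit(completion):
        # Fires when the write is durable on the acting set -- the "ondisk" reply.
        print("ondisk: write committed to journal/backing fs")

    comp = ioctx.aio_write('test_obj', b'hello', offset=0,
                           oncomplete=on_ack, onsafe=on_commit)
    comp.wait_for_safe()   # block until the ondisk reply has arrived

    ioctx.close()
    cluster.shutdown()

On a btrfs-backed OSD the two callbacks can fire at noticeably different
times; with writeahead journaling (non-btrfs), where the ack is skipped as
Greg notes, they tend to arrive together.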
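
On the size/min_size question: the pool-level knobs are set with
"ceph osd pool set <pool> size N" and "ceph osd pool set <pool> min_size M";
the replication itself lives in the PG machinery Greg points at. As a rough
illustration only -- a toy model, not the actual C++ OSD code -- the ordering
he describes is: the PG refuses to go active below min_size, and once active
the primary answers the client only after every member of the acting set has
acknowledged, not just min_size of them. All names below are invented.

    # toy_replication.py -- a toy model of the ordering described in the
    # thread, NOT the actual OSD implementation.

    class ToyOSD(object):
        def __init__(self, name):
            self.name = name
            self.store = {}

        def apply_write(self, obj, data):
            # Stands in for the sub-op apply/commit and its reply to the primary.
            self.store[obj] = data
            return "ack from %s" % self.name


    class ToyPrimary(object):
        def __init__(self, acting_set, min_size):
            self.acting_set = acting_set
            self.min_size = min_size

        def handle_client_write(self, obj, data):
            # Models "will not go active until it has at least min_size copies".
            if len(self.acting_set) < self.min_size:
                raise RuntimeError("PG not active: %d members < min_size=%d"
                                   % (len(self.acting_set), self.min_size))
            # Send the op to all participating members and wait for every
            # reply -- not just min_size of them -- before answering the client.
            acks = [osd.apply_write(obj, data) for osd in self.acting_set]
            assert len(acks) == len(self.acting_set)
            return "reply to client: %r written on %d osds" % (obj, len(acks))


    if __name__ == "__main__":
        pg = ToyPrimary([ToyOSD("osd.0"), ToyOSD("osd.1"), ToyOSD("osd.2")],
                        min_size=2)
        print(pg.handle_client_write("rbd_data.19d92ae8944a.0000000000000001",
                                     b"..."))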