Hi Yue,

On Thu, 25 Sep 2014, yue longguang wrote:
> ---------- Forwarded message ----------
> From: yue longguang <yuelongguang@xxxxxxxxx>
> Date: Tue, Sep 23, 2014 at 5:53 PM
> Subject: question about client's cluster aware
> To: ceph-devel@xxxxxxxxxxxxxxx
>
>
> hi,all
>
> my question is from my test.
> let's take an example. object1(4MB)--> pg 0.1 --> osd 1,2,3,p1
>
> when the client is writing object1, osd1 goes down during the write.
> let's suppose 2MB has been written.
> 1.
> when the connection to osd1 is down, what does the client do? ask the
> monitor for a new osdmap? or only the pg map?

For a client that is mostly idle and has only a single IO in progress to the failed machine, it will wait for N seconds before asking the monitor for an updated OSDMap. Usually, though, it will get that incremental map update/diff from another OSD in the cluster. Any time the client sends a request to any OSD, that OSD will share map incrementals/diffs if it has a newer map. So in a cluster with, say, 100 OSDs, the client will find out about the failure from another OSD something like 99% of the time.

> 2.
> now the client gets a newer map and continues the write; the primary
> osd should be osd2. the remaining 2MB is written out.

The client will resend any request that hasn't been acked to the new primary. If it was a single 2MB write, that means it will resend the whole write. If it was two 1MB writes, it will resend whichever portions haven't been acked (probably both, if the failure happened mid-write).

> now what does ceph do to integrate the two parts of the data? and to
> ensure that there are enough replicas?

If the new primary already has that write on disk (because it completed the write as a replica before the old primary crashed), it will reply immediately: operations have unique IDs and are idempotent. If it hasn't seen the write yet, it will do it then.

> 3.
> where is the code? please tell me where the code is.

osdc/Objecter.cc scan_requests() is where we decide what to resend (look specifically at where we call recalc_target).
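To make the failover sequence above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Ceph's code: the classes and functions (OSDMap, OSD, primary_handle_op, Client, note_map_from, send_unacked) are hypothetical stand-ins, even where a name overlaps with a real Ceph type. It assumes a request ID is a (client, tid) pair, models map sharing as adopting any newer epoch an OSD holds, and models dup detection as a per-OSD set of applied request IDs. It also simply resends every unacked op to the current primary, whereas the real Objecter only retargets ops whose target changed. The scenario scales object1 down to two 2-byte halves: osd1 dies after its replicas applied the second half but before acking it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OSDMap:
    # Hypothetical, heavily simplified map: an epoch plus the set of down OSDs.
    epoch: int
    down: frozenset = frozenset()

    def acting(self, pg_osds):
        """Up OSDs for a PG in placement order; the first is the primary."""
        return [o for o in pg_osds if o not in self.down]


@dataclass
class OSD:
    osd_id: int
    osdmap: OSDMap
    data: dict = field(default_factory=dict)   # object name -> bytes
    applied: set = field(default_factory=set)  # reqids already applied

    def apply(self, reqid, obj, offset, payload):
        """Apply a write at most once per reqid; return True if it was new."""
        if reqid in self.applied:
            return False
        buf = bytearray(self.data.get(obj, b""))
        end = offset + len(payload)
        if len(buf) < end:
            buf.extend(bytes(end - len(buf)))
        buf[offset:end] = payload
        self.data[obj] = bytes(buf)
        self.applied.add(reqid)
        return True


def primary_handle_op(osds, acting, reqid, obj, offset, payload):
    """Primary side: dup check first, then apply locally and on replicas."""
    primary = osds[acting[0]]
    if not primary.apply(reqid, obj, offset, payload):
        return False  # duplicate: reply immediately, nothing is re-applied
    for osd_id in acting[1:]:
        osds[osd_id].apply(reqid, obj, offset, payload)
    return True


class Client:
    def __init__(self, osdmap, osds, pg_osds):
        self.osdmap = osdmap
        self.osds = osds          # osd id -> OSD
        self.pg_osds = pg_osds    # placement for the PG, e.g. [1, 2, 3]
        self.next_tid = 0
        self.unacked = {}         # reqid -> (obj, offset, payload)
        self.log = []             # (reqid, primary id, newly applied?)

    def submit(self, obj, offset, payload):
        self.next_tid += 1
        reqid = ("client.1", self.next_tid)  # unique (client, tid) pair
        self.unacked[reqid] = (obj, offset, payload)
        return reqid

    def note_map_from(self, osd):
        """Map sharing: adopt a newer map held by any OSD we talk to."""
        if osd.osdmap.epoch > self.osdmap.epoch:
            self.osdmap = osd.osdmap

    def send_unacked(self):
        """(Re)send every unacked op to the primary in the current map."""
        acting = self.osdmap.acting(self.pg_osds)
        for reqid, (obj, offset, payload) in list(self.unacked.items()):
            new = primary_handle_op(self.osds, acting, reqid, obj, offset, payload)
            self.log.append((reqid, acting[0], new))
            del self.unacked[reqid]  # the ack arrived


# Scenario from the thread, scaled down: object1 written as two halves.
m1 = OSDMap(epoch=1)
osds = {i: OSD(i, m1) for i in (1, 2, 3)}
client = Client(m1, osds, [1, 2, 3])

r1 = client.submit("object1", 0, b"AA")
client.send_unacked()                      # first half acked via osd1

r2 = client.submit("object1", 2, b"BB")
osds[2].apply(r2, "object1", 2, b"BB")     # the replicas got the second half...
osds[3].apply(r2, "object1", 2, b"BB")     # ...then osd1 died before acking

m2 = OSDMap(epoch=2, down=frozenset({1}))
for i in (2, 3):
    osds[i].osdmap = m2                    # surviving OSDs know about the failure
client.note_map_from(osds[3])              # client learns of it from another OSD
client.send_unacked()                      # r2 resent to osd2, the new primary
print(client.log)
# [(('client.1', 1), 1, True), (('client.1', 2), 2, False)]
```

The last log entry shows the resent second half being acked by osd2 without being applied again, because osd2 had already recorded that request ID. A write osd2 had never seen would instead be applied on osd2 and replicated to osd3.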
You'll find the dup request check either in OSD.cc handle_op or in ReplicatedPG.cc do_request. The map sharing code is in OSD.cc in _share_map_incoming (or something like that).

Hope that helps!
sage