On Wed, Aug 28, 2013 at 1:22 PM, daniel pol <daniel_pol@xxxxxxxxxxx> wrote:
> Sorry, my bad. Only my second post and forgot the "reply all"
>
> Thanks for the info. I'm looking at the impact of pg number on performance.
> Just trying to learn more about how Ceph works.
> I didn't set pgp_num. It came by default with 2 in my case.

Did you start the pool with 2 PGs? If not, that's...odd. You can update it
with "ceph osd pool set"
(see http://ceph.com/docs/master/rados/operations/control/).
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> Have a nice day,
> Dani
>
>> Date: Wed, 28 Aug 2013 13:04:19 -0700
>> Subject: Re: Reading from replica
>> From: greg@xxxxxxxxxxx
>> To: daniel_pol@xxxxxxxxxxx; ceph-users@xxxxxxxxxxxxxx
>>
>> [ Please keep list discussions on the list. :) ]
>>
>> On Wed, Aug 28, 2013 at 12:54 PM, daniel pol <daniel_pol@xxxxxxxxxxx>
>> wrote:
>> > Hi!
>> >
>> > Any pointers to where I can find the contortions?
>>
>> You don't really want to — read-from-replica isn't safe except in very
>> specific circumstances.
>>
>> > I agree with you that we should be seeing reads from both OSDs. I'm new
>> > to Ceph, so I might have done something wrong. I created a pool with
>> > size=2 and pg_num 16. I used rados bench for testing with default
>> > values. Here's the info about that pool:
>> >
>> > osd dump:
>> > pool 8 'test5' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
>> > pg_num 16 pgp_num 2 last_change 28 owner 0
>>
>> That "pgp_num 2" is your problem — you are placing all the data as if
>> there were only two shards, and since placement is pseudorandom both
>> shards happened to end up with the same OSD as primary. You should set
>> pgp_num to the same value as your pg_num in most cases, and you should
>> probably have a lot more than 16. :) The rule of thumb is that if you
>> have only one pool in use, it should have roughly
>> 100 * [num OSDs] / [pool replication size] PGs.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
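For concreteness, the fix and a re-check might look something like the sketch
below. The pool name 'test5' and the two-OSD cluster come from the output
quoted in this thread; 128 is just the rule of thumb above rounded up to a
power of two, and exact options vary by Ceph release, so treat the values as
examples rather than a recipe.

    # ~100 * [num OSDs] / [replication size] = 100 * 2 / 2 = 100 PGs,
    # rounded up here to 128. pg_num can only be increased, and pgp_num
    # may not exceed pg_num, so raise pg_num first, then bring pgp_num
    # up to match it.
    ceph osd pool set test5 pg_num 128
    ceph osd pool set test5 pgp_num 128

    # Re-run the benchmark: write some objects (--no-cleanup keeps them
    # around for the read phase), then do the sequential-read pass. With
    # pgp_num matching pg_num, each OSD should be primary for roughly half
    # of the objects, so both OSDs should now see read IO.
    rados bench -p test5 60 write --no-cleanup
    rados bench -p test5 60 seq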
>>
>> > pg dump:
>> > 8.4 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.231407 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.964532 0'0 2013-08-28 13:16:22.964532
>> > 8.3 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.231596 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.964239 0'0 2013-08-28 13:16:22.964239
>> > 8.2 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.231922 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.963956 0'0 2013-08-28 13:16:22.963956
>> > 8.1 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.232564 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.963659 0'0 2013-08-28 13:16:22.963659
>> > 8.0 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.232604 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.963153 0'0 2013-08-28 13:16:22.963153
>> > 8.f 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.233342 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.967841 0'0 2013-08-28 13:16:22.967841
>> > 8.e 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.231966 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.967529 0'0 2013-08-28 13:16:22.967529
>> > 8.d 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.232289 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.967249 0'0 2013-08-28 13:16:22.967249
>> > 8.c 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.232694 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.966945 0'0 2013-08-28 13:16:22.966945
>> > 8.b 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.233098 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.966641 0'0 2013-08-28 13:16:22.966641
>> > 8.a 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.235592 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.966362 0'0 2013-08-28 13:16:22.966362
>> > 8.9 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.235616 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.966052 0'0 2013-08-28 13:16:22.966052
>> > 8.8 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.235950 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.965760 0'0 2013-08-28 13:16:22.965760
>> > 8.7 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.231703 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.965458 0'0 2013-08-28 13:16:22.965458
>> > 8.6 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.230886 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.965128 0'0 2013-08-28 13:16:22.965128
>> > 8.5 0 0 0 0 0 0 0 active+clean 2013-08-28 13:16:48.231136 0'0 29:15 [1,0] [1,0] 0'0 2013-08-28 13:16:22.964817 0'0 2013-08-28 13:16:22.964817
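A quick way to see what the dump above is saying is to count how many PGs
list each OSD first in their acting set (the acting primary). This is only a
shell sketch against the listing quoted here: it assumes pool id 8 and the
bracketed sets shown above, and the 'ceph pg dump' layout differs between
Ceph versions.

    # Take the first OSD of the last bracketed set on each pool-8 line
    # (the acting set), then count how often each OSD appears as primary.
    ceph pg dump 2>/dev/null | grep '^8\.' | sed 's/.*\[\([0-9]*\).*/\1/' | sort | uniq -c

On the listing above that prints "16 1": osd.1 is the acting primary for all
16 PGs, which matches only one OSD seeing read IO. Once pgp_num matches
pg_num, the count should split roughly evenly between osd.0 and osd.1.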
>> >
>> > crush map:
>> > # begin crush map
>> >
>> > # devices
>> > device 0 osd.0
>> > device 1 osd.1
>> >
>> > # types
>> > type 0 osd
>> > type 1 host
>> > type 2 rack
>> > type 3 row
>> > type 4 room
>> > type 5 datacenter
>> > type 6 root
>> >
>> > # buckets
>> > host DFS1 {
>> >         id -2           # do not change unnecessarily
>> >         # weight 0.800
>> >         alg straw
>> >         hash 0  # rjenkins1
>> >         item osd.0 weight 0.400
>> >         item osd.1 weight 0.400
>> > }
>> > root default {
>> >         id -1           # do not change unnecessarily
>> >         # weight 0.800
>> >         alg straw
>> >         hash 0  # rjenkins1
>> >         item DFS1 weight 0.800
>> > }
>> >
>> > # rules
>> > rule data {
>> >         ruleset 0
>> >         type replicated
>> >         min_size 1
>> >         max_size 10
>> >         step take default
>> >         step choose firstn 0 type osd
>> >         step emit
>> > }
>> > rule metadata {
>> >         ruleset 1
>> >         type replicated
>> >         min_size 1
>> >         max_size 10
>> >         step take default
>> >         step choose firstn 0 type osd
>> >         step emit
>> > }
>> > rule rbd {
>> >         ruleset 2
>> >         type replicated
>> >         min_size 1
>> >         max_size 10
>> >         step take default
>> >         step choose firstn 0 type osd
>> >         step emit
>> > }
>> >
>> > # end crush map
>> >
>> >
>> > Have a nice day,
>> > Dani
>> >
>> > ________________________________
>> > Date: Wed, 28 Aug 2013 12:09:41 -0700
>> > Subject: Re: Reading from replica
>> > From: greg@xxxxxxxxxxx
>> > To: daniel_pol@xxxxxxxxxxx
>> > CC: ceph-users@xxxxxxxxxxxxxx
>> >
>> > Read-from-replica does not happen unless you go through some contortions
>> > with config and developer setups. However, all n OSDs should be the
>> > primary for about 1/n of the data, so you should be seeing reads to both
>> > OSDs as long as you touch several objects at a time.
>> > -Greg
>> >
>> > On Wednesday, August 28, 2013, daniel pol wrote:
>> >
>> > Hi!
>> >
>> > I need a little help understanding reads from replicas. I've read a few
>> > conflicting messages, and the documentation is not very clear to me on
>> > this subject (maybe I didn't find the proper doc).
>> > Here's the question: with the default replication of 2 (size=2), when
>> > doing reads (big sequential reads in my case), are we expecting to see
>> > reads going to the "primary" object AND its replicas? (Similar to RAID1,
>> > where you read from both sides of the mirror.)
>> >
>> > I'm not seeing that on my setup right now. I have a pool with 16 PGs on
>> > 2 OSDs. When I do reads, only one OSD gets IO.
>> > If that's normal (the replica is involved in IO only when the primary is
>> > down) I'll take note; otherwise I'll have to find out why I don't get
>> > reads from the replicas.
>> >
>> > Have a nice day,
>> > Dani
>> >
>> >
>> >
>> > --
>> > Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com