Re: Crush - nuts and bolts

First off, on both write and read, the client talks to the primary OSD
directly. More specifically, the client first contacts a MON to get the
cluster map, then computes the object's placement and goes to the
primary OSD.

So there is no need to think of write and read differently as far as
placement goes. Basically, the only difference between the two
operations is that on write the primary OSD must wait for acks from the
replica OSD(s) before it sends the ack back to you.

How object locations are calculated is:

 #1 Hash the object name
 #2 Calculate the hash modulo the number of PGs
 #3 Get the pool id
 #4 Prepend the pool id to the result of #2 to get the PG
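The four steps above can be sketched in a few lines of Python. Note this
is a simplified illustration only: Ceph actually uses the rjenkins hash
and a "stable mod" (ceph_stable_mod), not CRC32 or a plain modulo, so
the values below will not match a real cluster.

```python
import zlib

def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    """Simplified object-name -> PG mapping (illustrative only)."""
    # 1 Hash the object name (CRC32 as a stand-in; Ceph uses rjenkins)
    h = zlib.crc32(object_name.encode())
    # 2 Take the hash modulo the number of PGs in the pool
    ps = h % pg_num
    # 3/4 Prepend the pool id to get the PG id, e.g. "0.1"
    return f"{pool_id}.{ps:x}"

# Example: pool id 0 with a hypothetical pg_num of 64
print(object_to_pg(0, "HOSTS", 64))
```

The PG, not the object, is then fed to CRUSH to pick the acting set of
OSDs, which is why placement stays stable as objects come and go.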

To find an object's location, what you need to do is:

 # ceph osd map <pool name> <object name>
  e.g.,
   ceph osd map rbd HOSTS
   osdmap e11 pool 'rbd' (0) object 'HOSTS' -> pg 0.bc5444d9 (0.1) ->
up ([2,0,1], p2) acting ([2,0,1], p2)

The actual location of the object on the OSD(s) is:
 e.g.,
 ls /var/lib/ceph/osd/ceph-2/current/0.1_head/
 __head_00000001__0  HOSTS__head_BC5444D9__0
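Tying the `ceph osd map` output to the steps above: `0.bc5444d9` is the
pool id plus the raw object-name hash, and `0.1` is the PG after the
modulo step. Assuming this pool had a power-of-two pg_num of 4 (an
assumption; check with `ceph osd pool get rbd pg_num`), for which
Ceph's stable mod reduces to a plain modulo, the arithmetic checks out:

```python
# Hypothetical pg_num for the example pool; for a power-of-two pg_num,
# Ceph's stable mod is the same as a plain modulo.
pg_num = 4
raw_hash = 0xBC5444D9  # from 'pg 0.bc5444d9' in the ceph osd map output
print(f"0.{raw_hash % pg_num:x}")  # prints "0.1", matching the output
```

That same PG id also names the on-disk directory, `0.1_head`, and the
hash reappears uppercased in the object file name `HOSTS__head_BC5444D9__0`.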

On Fri, Dec 30, 2016 at 8:55 AM, Ukko <ukkohakkarainen@xxxxxxxxx> wrote:
> Hi Shinobe,
>
> The documentation did not help me. I could not find the info on how the
> location for an object to be written gets selected, nor how the client
> calculates the location of an object to be read.
>
> So in an environment of 10 storage nodes, 10 OSDs in each, size 3, 2 pools,
> 10 PGs each: how do objectA (10 kB), objectB (10 MB), and objectC (10 GB) get
> placed on write? How are they located by the client on read? :)
>
> On Thu, Dec 29, 2016 at 2:01 PM, Ukko Hakkarainen
> <ukkohakkarainen@xxxxxxxxx> wrote:
>>
>> Shinobe,
>>
>> I'll re-check if the info I'm after is there, I recall not. I'll get back
>> to you later.
>>
>> Thanks!
>>
>> > Shinobu Kinjo <skinjo@xxxxxxxxxx> wrote on 29.12.2016 at 5:28:
>> >
>> > Please see the following:
>> >
>> > http://docs.ceph.com/docs/giant/architecture/
>> >
>> > Everything you would want to know about is there.
>> >
>> > Regards,
>> >
>> >> On Thu, Dec 29, 2016 at 8:27 AM, Ukko <ukkohakkarainen@xxxxxxxxx>
>> >> wrote:
>> >> I'd be interested in the CRUSH algorithm simplified into a series of
>> >> pictures. How does a storage node write and a client read, and
>> >> how do they calculate what they're after? What gets where? Where is it
>> >> found? Why?
>> >>
>> >> I suggest over-simplified storage system of e.g. 1 monitor, 4 storage
>> >> nodes,
>> >> 2 OSDs/node,
>> >> 3 PGs/OSD, 2 pools?
>> >>
>> >> http://ceph.com/papers/weil-crush-sc06.pdf did not solve this for me.
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> ceph-users mailing list
>> >> ceph-users@xxxxxxxxxxxxxx
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>
>


