Re: Question regarding CRUSH algorithm

girish kenkere <kngenius@xxxxxxxxx> · Thu, 16 Feb 2017 12:44:55 -0800

Thanks David,
It's not quite what I was looking for. Let me explain my question in more detail.

This is an excerpt from the CRUSH paper. It explains how the CRUSH algorithm, running on each client/OSD, maps a PG to an OSD during (let's assume) a write operation.

"Tree buckets are structured as a weighted binary search
tree with items at the leaves. Each interior node knows the
total weight of its left and right subtrees and is labeled according
to a fixed strategy (described below). In order to
select an item within a bucket, CRUSH starts at the root of
the tree and calculates the hash of the input key x, replica
number r, the bucket identifier, and the label at the current
tree node (initially the root). The result is compared to the
weight ratio of the left and right subtrees to decide which
child node to visit next. This process is repeated until a leaf
node is reached, at which point the associated item in the
bucket is chosen. Only log n hashes and node comparisons
are needed to locate an item."
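The descent described in the excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Node` class, the `unit_hash` helper (SHA-256 standing in for the hash CRUSH actually uses), and the node labels and weights in the example tree are all hypothetical and are not taken from the Ceph source.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: int                     # label of this tree node (illustrative values)
    weight: float = 0.0            # leaf weight; interior weights are derived
    item: Optional[str] = None     # set on leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def total(self) -> float:
        """Total weight of the subtree rooted at this node."""
        if self.item is not None:
            return self.weight
        return self.left.total() + self.right.total()


def unit_hash(*parts) -> float:
    """Stand-in hash (assumption): map the inputs to a float in [0, 1)."""
    digest = hashlib.sha256(repr(parts).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def tree_select(root: Node, x: int, r: int, bucket_id: int) -> str:
    """Walk from the root to a leaf, hashing (x, r, bucket id, node label)."""
    node = root
    while node.item is None:
        h = unit_hash(x, r, bucket_id, node.label)
        # Compare the hash against the left subtree's share of the weight.
        left_share = node.left.total() / node.total()
        node = node.left if h < left_share else node.right
    return node.item


def leaf(label: int, item: str, weight: float) -> Node:
    return Node(label=label, weight=weight, item=item)


# Example bucket with four items; osd.3 has weight 0 and is never chosen.
root = Node(4,
            left=Node(2, left=leaf(1, "osd.0", 1.0), right=leaf(3, "osd.1", 1.0)),
            right=Node(6, left=leaf(5, "osd.2", 2.0), right=leaf(7, "osd.3", 0.0)))

print(tree_select(root, x=42, r=0, bucket_id=-1))
```

With the same tree and the same inputs the function returns the same item every time; changing a weight changes the ratios at the affected nodes, so some inputs can land on a different leaf.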

My question is this: over time the tree structure changes, the weights of the nodes change, and some nodes even go away. In that case, how do future reads arrive at the same PG-to-OSD mapping? It's not cached anywhere, and the same algorithm runs for every future read. What I am missing is how it picks the same OSD (where the data resides) every time. With a modified CRUSH map, won't we end up at a different leaf node if we apply the same algorithm?

Thanks
Girish

On Thu, Feb 16, 2017 at 12:05 PM, David Turner <david.turner@xxxxxxxxxxxxxxxx> wrote:

As a piece of the puzzle, the client always has an up-to-date OSD map (which includes the CRUSH map). If it's out of date, then it has to get a new one before it can request to read or write to the cluster. That way the client will never have old information, and if you add or remove storage, the client will always have the most up-to-date map to know where the current copies of the files are.
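A minimal sketch of that idea, assuming a hypothetical `Monitor` that holds the latest map and a hypothetical `Client` that compares epochs before every lookup. The `place` function is a stand-in (highest hash score wins), not CRUSH itself, and the way a real client learns that its map is stale is simplified here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class OsdMap:
    epoch: int      # increases every time the map changes
    osds: tuple     # names of the OSDs currently in the map


def place(pg: int, osdmap: OsdMap) -> str:
    """Stand-in for CRUSH (assumption): the OSD with the highest score wins."""
    def score(osd: str) -> bytes:
        return hashlib.sha256(f"{pg}:{osd}".encode()).digest()
    return max(osdmap.osds, key=score)


class Monitor:
    """Hypothetical holder of the latest map."""
    def __init__(self, osdmap: OsdMap):
        self.latest = osdmap


class Client:
    """Hypothetical client that refreshes a stale map before any lookup."""
    def __init__(self, monitor: Monitor):
        self.monitor = monitor
        self.osdmap = monitor.latest

    def locate(self, pg: int) -> str:
        if self.osdmap.epoch < self.monitor.latest.epoch:
            self.osdmap = self.monitor.latest   # fetch the newer map first
        return place(pg, self.osdmap)


mon = Monitor(OsdMap(epoch=1, osds=("osd.0", "osd.1", "osd.2", "osd.3")))
client = Client(mon)
before = {pg: client.locate(pg) for pg in range(64)}

# osd.3 is removed, which produces a new map with a higher epoch.
mon.latest = OsdMap(epoch=2, osds=("osd.0", "osd.1", "osd.2"))
after = {pg: client.locate(pg) for pg in range(64)}

moved = [pg for pg in before if before[pg] != after[pg]]
print(f"{len(moved)} of {len(before)} PGs changed location")
```

In this stand-in, only the PGs that were on the removed OSD change location, because the placement is always recomputed against the current map. The sketch does not model how the data itself gets to the new location.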

This can cause slowdowns in your cluster's performance if you are updating your osdmap frequently, which can happen, for example, when deleting a lot of snapshots.

David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943


From: ceph-users [ceph-users-bounces@lists.ceph.com] on behalf of girish kenkere [kngenius@xxxxxxxxx]

Sent: Thursday, February 16, 2017 12:43 PM

To: ceph-users@xxxxxxxxxxxxxx

Subject:  Question regarding CRUSH algorithm

Hi, I have a question regarding the CRUSH algorithm; please let me know how this works. The CRUSH paper talks about how, given an object, we select an OSD via two mappings: first object to PG, and then PG to OSD.

This PG-to-OSD mapping is something I don't understand. It uses the PG number, the cluster map, and placement rules. How is it guaranteed to return the correct OSD for future reads after the cluster map/placement rules have changed due to nodes coming and going?
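The two mappings in question can be sketched like this. It is an illustration only: `object_to_pg`, `pg_to_osds`, and `locate` are hypothetical names, CRC32 and SHA-256 are stand-ins for the hashes Ceph actually uses, and ranking OSDs by hash score is a simplification that ignores the cluster hierarchy and placement rules.

```python
import hashlib
import zlib


def object_to_pg(name: str, pg_num: int) -> int:
    """Step 1 (stand-in): hash the object name into one of pg_num PGs."""
    return zlib.crc32(name.encode()) % pg_num


def pg_to_osds(pg: int, osds: list, replicas: int) -> list:
    """Step 2 (stand-in for CRUSH): rank OSDs by a per-PG hash score."""
    def score(osd: str) -> str:
        return hashlib.sha256(f"{pg}:{osd}".encode()).hexdigest()
    return sorted(osds, key=score, reverse=True)[:replicas]


def locate(name: str, pg_num: int, osds: list, replicas: int = 3):
    """Compose both steps: object name -> PG -> ordered list of OSDs."""
    pg = object_to_pg(name, pg_num)
    return pg, pg_to_osds(pg, osds, replicas)


osds = [f"osd.{i}" for i in range(6)]
pg, targets = locate("my-object", pg_num=128, osds=osds)
print(pg, targets)
```

Both steps are pure functions of their inputs, so nothing is cached: every caller with the same object name, PG count, and OSD list computes the same answer, and a different OSD list can give a different one.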

Thanks
Girish

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com