Re: Question regarding CRUSH algorithm


 



Thanks David,

It's not quite what I was looking for. Let me explain my question in more detail.

This is an excerpt from the CRUSH paper. It explains how the CRUSH algorithm, running on each client/OSD, maps a PG to an OSD during (let's assume) a write operation.

"Tree buckets are structured as a weighted binary search tree with items at the leaves. Each interior node knows the total weight of its left and right subtrees and is labeled according to a fixed strategy (described below). In order to select an item within a bucket, CRUSH starts at the root of the tree and calculates the hash of the input key x, replica number r, the bucket identifier, and the label at the current tree node (initially the root). The result is compared to the weight ratio of the left and right subtrees to decide which child node to visit next. This process is repeated until a leaf node is reached, at which point the associated item in the bucket is chosen. Only log n hashes and node comparisons are needed to locate an item."
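
To make the quoted description concrete, here is a minimal sketch of that descent in Python. It is not Ceph's implementation: the names Leaf, Node, unit_hash and tree_select are made up for illustration, SHA-256 stands in for the hash CRUSH actually uses, and the node labels are arbitrary integers rather than the labeling strategy from the paper.

```python
import hashlib
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    item: str      # e.g. an OSD name
    weight: float


@dataclass
class Node:
    label: int                      # arbitrary label (assumption, not the paper's scheme)
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]

    @property
    def weight(self) -> float:
        # An interior node's weight is the total weight of its subtrees.
        return self.left.weight + self.right.weight


def unit_hash(*parts) -> float:
    """Map the inputs to a repeatable value in [0, 1). SHA-256 is a stand-in."""
    data = ":".join(str(p) for p in parts).encode()
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def tree_select(root, x, r, bucket_id):
    """Walk from the root to a leaf, choosing a child at each interior node."""
    node = root
    while isinstance(node, Node):
        h = unit_hash(x, r, bucket_id, node.label)
        left_share = node.left.weight / node.weight
        # Go left with probability proportional to the left subtree's weight.
        node = node.left if h < left_share else node.right
    return node.item


# Hypothetical bucket: osd.2 carries as much weight as osd.0 and osd.1 together.
tree = Node(1, Node(2, Leaf("osd.0", 1.0), Leaf("osd.1", 1.0)), Leaf("osd.2", 2.0))
print(tree_select(tree, x=42, r=0, bucket_id=-1))
```

Nothing here is random at run time: the same x, r, bucket id and tree always walk the same path, and the walk can only differ once the weights or the tree itself differ.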

My question is: over time the tree structure changes, the weights of the nodes change, and some nodes even go away. In that case, how do future reads arrive at the same PG-to-OSD mapping? It's not cached anywhere; the same algorithm runs for every future read. What I am missing is how it picks the same OSD (where the data resides) every time. With a modified CRUSH map, won't we end up at a different leaf node if we apply the same algorithm?

Thanks
Girish

On Thu, Feb 16, 2017 at 12:05 PM, David Turner <david.turner@xxxxxxxxxxxxxxxx> wrote:

As a piece of the puzzle, the client always has an up-to-date OSD map (which includes the CRUSH map). If its map is out of date, it has to get a new one before it can read from or write to the cluster. That way the client never acts on old information, and if you add or remove storage, the client will always have the most up-to-date map to know where the current copies of the files are.

This can slow down your cluster if you are updating your osdmap frequently, which can be caused, for example, by deleting a lot of snapshots.
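
One way to picture this is to treat placement as a pure function of the PG id and the current map. The sketch below is a toy model only: it uses rendezvous (highest-random-weight) hashing as a stand-in for CRUSH, the "map" is just a set of OSD ids, and score and place are made-up names. It shows that the same map always gives the same answer, and that a changed map can give a different answer for some PGs. In a real cluster the data for those PGs is moved to the newly computed OSDs, which is why everyone has to compute against the current map.

```python
import hashlib


def score(pgid: str, osd: int) -> int:
    """Repeatable pseudo-random score for a (PG, OSD) pair. SHA-256 is a stand-in."""
    data = f"{pgid}:{osd}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place(pgid: str, osds: frozenset, replicas: int = 3) -> list:
    """Pick the `replicas` highest-scoring OSDs for this PG (NOT the CRUSH algorithm)."""
    ranked = sorted(osds, key=lambda osd: score(pgid, osd), reverse=True)
    return ranked[:replicas]


old_map = frozenset(range(6))   # hypothetical cluster with OSDs 0..5
new_map = old_map - {4}         # the same cluster after OSD 4 is removed

pgids = [f"1.{n:x}" for n in range(64)]
moved = [pg for pg in pgids if place(pg, old_map) != place(pg, new_map)]

print(f"{len(moved)} of {len(pgids)} PGs map to a different OSD set under the new map")
```

In this toy model, only the PGs that had a replica on the removed OSD get a new placement; every other PG computes to exactly the same OSDs as before.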


David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2760 | Mobile: 385.224.2943





From: ceph-users [ceph-users-bounces@lists.ceph.com] on behalf of girish kenkere [kngenius@xxxxxxxxx]
Sent: Thursday, February 16, 2017 12:43 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Question regarding CRUSH algorithm

Hi, I have a question regarding the CRUSH algorithm; please let me know how this works. The CRUSH paper talks about how, given an object, we select OSDs via two mappings: first object to PG, and then PG to OSD.

This PG-to-OSD mapping is something I don't understand. It uses the PG number, the cluster map, and the placement rules. How is it guaranteed to return the correct OSD for future reads after the cluster map or placement rules have changed due to nodes coming and going?
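
The first of those two mappings can be sketched roughly as below. This is a simplification under stated assumptions: object_to_pg is a made-up name, SHA-256 and a plain modulo stand in for the hash and modulo scheme Ceph really uses, and the "pool.hex" output only imitates the look of a PG id.

```python
import hashlib


def object_to_pg(pool_id: int, object_name: str, pg_num: int) -> str:
    """Hash the object name into one of the pool's pg_num placement groups."""
    digest = hashlib.sha256(object_name.encode()).digest()
    seed = int.from_bytes(digest[:4], "big") % pg_num
    return f"{pool_id}.{seed:x}"


# Hypothetical pool 1 with 128 PGs.
print(object_to_pg(1, "my-object", 128))
```

Note that this step looks only at the object name and the pool's PG count, so it does not change when OSDs come and go. Only the second step (PG to OSD) takes the cluster map and placement rules as input.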

Thanks
Girish


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
