(redirecting to gluster-devel as a more appropriate forum)

On 04/03/2013 05:03 PM, Jay Vyas wrote:
> Suppose I was going to serve a petabyte of data sharded over 10 files
> (1, 2, 3, ..., 10) over GlusterFS, on 3 servers (call them Server1,
> Server2, and Server3).
>
> The 3 servers would need access to the files such that:
>
> Server 1 will usually only access file 1.
> Server 2 will usually only access file 2.
> Server 3 will access all ten files (the whole data set).
>
> Is there a way to get Gluster to rebalance bricks over time based on
> access patterns ... or otherwise, what is the best way to increase the
> average locality of access to files in the cluster?

The flippant answer would be to move the computation to the data instead of vice versa, like Hadoop is designed to do. ;)

The less flippant answer is going to get a bit more complicated. There are three ways you can control the placement of a file, but none are really supported and all could get you into trouble.

The first method is to create the file (or a copy) with a special name of the form file@dht:subvol, where the parts have the following meanings:

* file = the file name you really want
* dht = the name of the DHT translator in your client-side volfile
* subvol = the name (from the same volfile) of the DHT subvolume where you want the file to go

This is reasonably safe, because it's part of how rebalance works.

To get even fancier than that, you need to know something about how the DHT translator uses "layouts" on directories to place files. There's a description here:

http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/

The problem is that the user has very little control over how these layouts are generated. One thing you can do that's fairly easy is swap the layout xattrs on two bricks, which (after a rebalance) will swap which files they contain.
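As a sketch of that first (special-name) method: all the names below are hypothetical — for a volume called "myvol", the client volfile's DHT translator is typically named "myvol-dht" and its subvolumes "myvol-client-0", "myvol-client-1", and so on, but check your actual client volfile (usually under /var/lib/glusterd/vols/<vol>/) for the real names.

```shell
# On a real cluster this would be your GlusterFS mount point; a scratch
# directory is used here only so the command can run anywhere.
MOUNT=$(mktemp -d)

# On a GlusterFS mount, creating this special name tells DHT to place
# "file1" on the subvolume "myvol-client-0" -- the same mechanism
# rebalance uses internally. (On a plain local directory, as here, it
# just creates a file with an odd name.)
touch "$MOUNT/file1@myvol-dht:myvol-client-0"
```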
For example, if your file is on brick2 and you want it to be on brick1, you swap the xattr values for that directory within brick1 and brick2.

The ultimate level of control is to calculate your own layouts. For this to be useful in a scenario like yours, you'd need to copy or reverse-engineer the code in the DHT translator that calculates the hash for a file. Knowing that, you could do something like this:

* assign a range for brick1 that contains the hash for file1
* assign a range for brick2 that contains the hash for file2
* assign the remaining range to brick3

I'm working on some mechanisms, and accompanying management/interface models, to provide this sort of control in a less hacker-ish form. Unfortunately, I'm tied down with about ten higher priorities, so I don't have any idea when that will be ready. In the meantime, please try these techniques *only with test data*, and caveat emptor.