On Sun, Feb 5, 2012 at 4:41 AM, Brian Candler <B.Candler at pobox.com> wrote:
> I reckon that to quickly copy one glusterfs volume to another, I will need a
> multi-threaded 'cp'. That is, something which will take the list of files
> from readdir() and copy batches of N of them in parallel. This is so I can
> keep all the component spindles busy.
>
> Question 1: does such a thing exist already in the open source world?

Not aware of one. Please post to this thread if you find one.

> Question 2: for a DHT volume, does readdir() return the files in a
> round-robin fashion, i.e. one from brick 1, one from brick 2, one from brick
> 3 etc? Or does it return all the results from one brick, followed by all the
> results from the second brick, and so on? Or something indeterminate?

It returns all entries from the first brick, then only the non-directory
entries from the second brick, and so on (sequentially).

> Alternatively: is it possible to determine for each file which brick it
> resides on?

Yes, there is the virtual extended attribute "trusted.glusterfs.pathinfo"
which gives you the location (hostname) of a file.

> (I don't think it's in an extended attribute; I tried 'getfattr -d' on a
> file, both on the GlusterFS mount and on the underlying brick, and couldn't
> see anything)
>
> Thanks,
>
> Brian.
>
> P.S. I did look in the source, and I couldn't figure out how dht_do_readdir
> works. But it does have a slightly disconcerting comment:
>
> /* TODO: do proper readdir */

That comment is only for corner cases when the backend filesystem is
inconsistent. It is not relevant to the algorithm you were enquiring about.

Avati
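
As a rough sketch of the kind of parallel copy discussed above (not an
existing tool, just an illustration built from GNU find, xargs and cp; the
mount points /mnt/srcvol and /mnt/dstvol are placeholders), something like
this keeps several copies running at once:

    cd /mnt/srcvol
    # -print0 / -0 handle awkward filenames; -P 8 runs up to 8 cp processes
    # in parallel; cp --parents recreates the directory tree under the target.
    find . -type f -print0 | xargs -0 -P 8 -I{} cp --parents {} /mnt/dstvol/

Forking one cp per file adds overhead when there are many small files;
splitting the work per directory (or per brick, using the pathinfo attribute
shown below) would be a reasonable refinement.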
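
To illustrate the pathinfo attribute mentioned above: being virtual, it is
not listed by 'getfattr -d' (which is why it did not show up), so it has to
be requested by name on the GlusterFS mount point. The path here is only an
example:

    getfattr -n trusted.glusterfs.pathinfo /mnt/glusterfs/some/file

The returned value names the brick (hostname and backend path) holding the
file, which is enough to group files by spindle before copying.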