On Sun, Feb 5, 2012 at 4:41 AM, Brian Candler <B.Candler at pobox.com> wrote:
> I reckon that to quickly copy one glusterfs volume to another, I will need a
> multi-threaded 'cp'. That is, something which will take the list of files
> from readdir() and copy batches of N of them in parallel. This is so I can
> keep all the component spindles busy.
>
> Question 1: does such a thing exist already in the open source world?

Not aware of one. Please post to this thread if you find one.

> Question 2: for a DHT volume, does readdir() return the files in a
> round-robin fashion, i.e. one from brick 1, one from brick 2, one from brick
> 3 etc? Or does it return all the results from one brick, followed by all the
> results from the second brick, and so on? Or something indeterminate?

It returns all entries from the first brick, then only the non-directory
entries from the second brick, and so on (sequentially).

> Alternatively: is it possible to determine for each file which brick it
> resides on?

Yes, there is the virtual extended attribute "trusted.glusterfs.pathinfo"
which gives you the location (hostname) of a file.

> (I don't think it's in an extended attribute; I tried 'getfattr -d' on a
> file, both on the GlusterFS mount and on the underlying brick, and couldn't
> see anything)
>
> Thanks,
>
> Brian.
>
> P.S. I did look in the source, and I couldn't figure out how dht_do_readdir
> works. But it does have a slightly disconcerting comment:
>
> /* TODO: do proper readdir */

That comment is only for corner cases when the backend filesystem is
inconsistent. It is not relevant to the algorithm you were enquiring about.

Avati
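
As a rough sketch of the kind of parallel copy discussed above (not an
existing tool, just an illustration built from GNU find, xargs and cp; the
mount points /mnt/srcvol and /mnt/dstvol are placeholders), something like
this keeps several copies running at once:

    cd /mnt/srcvol
    # -print0 / -0 handle awkward filenames; -P 8 runs up to 8 cp processes
    # in parallel; cp --parents recreates the directory tree under the target.
    find . -type f -print0 | xargs -0 -P 8 -I{} cp --parents {} /mnt/dstvol/

Forking one cp per file adds overhead when there are many small files;
splitting the work per directory (or per brick, using the pathinfo attribute
shown below) would be a reasonable refinement.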
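
To illustrate the pathinfo attribute mentioned above: being virtual, it is
not listed by 'getfattr -d' (which is why it did not show up), so it has to
be requested by name on the GlusterFS mount point. The path here is only an
example:

    getfattr -n trusted.glusterfs.pathinfo /mnt/glusterfs/some/file

The returned value names the brick (hostname and backend path) holding the
file, which is enough to group files by spindle before copying.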