Well, 3TB in 13 hrs is about 80 hours to sync 20TB, i.e. 3-4 days, and it could be a lot longer with a large number of small files (a good chunk, but not all, of our data is composed of hundreds of thousands of small .jpg image files of 100 Kbytes or so). Overall, there are millions of files that need to be transferred.

A good thing about rsyncing directly to the single server is that if we do have to stop the rsync for any reason, it will be very fast to restart later. Restarting an rsync part-way through a large transfer to a gluster volume can be incredibly slow, as it has to stat all the files that have already made it onto the gluster in order to work out where to restart. Just working out where to restart could take hours on glusterfs, whereas rsync direct to an xfs filesystem will tear through millions of stat operations and work out where to restart in a matter of minutes. So for these reasons, it seems like we should be able to save an enormous amount of time by rsyncing directly to the xfs bricks and adding the bricks to gluster later (a rough sketch of the commands I have in mind is at the bottom of this thread)…

Basically, our setup has 2 (soon to be 3) reasonably powerful nodes, set up like this:

1) Each node is a Supermicro chassis with 12 x 4TB Hitachi disks on an LSI 9280-4i4e RAID controller, with a large RAID6 array formatted as 4 XFS bricks of 9TB each, for a total of 36.5TB per node.
2) 10GbE connecting the nodes.
3) Xeon E3-1245 quad-core (8 HT) CPU @ 3.4 GHz, 16GB RAM.

These nodes definitely do not have the most powerful CPUs ever, nor do they have huge quantities of RAM, but the disk arrays should be capable of some good speed, and we hope they will be adequate for a gluster that is just a huge archive. We just want to move data onto it, and then access it when needed, or back up data from it (to tape).

From: Ryan Nix [mailto:ryan.nix@xxxxxxxxx]

Interesting. Still, I think it's better to let the Gluster client handle the syncing. What happens if, for some strange reason, the rsync process dies in the middle of the night? Gluster, on the other hand, will keep working to get the data onto the other bricks without human intervention. I recently used Gluster to sync 3 TB of data to another brick over a 1Gbps link in about 13 hours on decent hardware.

On Wed, Oct 15, 2014 at 9:04 PM, SINCOCK John <J.Sincock@xxxxxxxxx> wrote:

We have 20 terabytes to rsync onto a new server (which will have 32 TB capacity), and we then want to add that server to an existing 2-node gluster of 73TB (53 TB used, 20 TB free), to give a 3-node gluster with 105TB capacity, 73TB used.

The reason I want to do it this way, if possible, is that Gluster is slow on writes, especially for small files, and we have a LOT of small files, so I’m pretty sure it will be a LOT faster to rsync directly to the new server (which is the one that has free space anyway), and then add that server to the gluster – if it is possible to have gluster recognise those files.

From: Ryan Nix [mailto:ryan.nix@xxxxxxxxx]
So Gluster, at its core, uses rsync to copy the data to the other bricks. Why not let Gluster do the heavy lifting?

On Wed, Oct 15, 2014 at 7:35 PM, SINCOCK John <J.Sincock@xxxxxxxxx> wrote:
I've never added a brick with existing files, but I did start a new Gluster volume on disks that already contained data, and I was able to access the files without problems. Of course the files will be out of place, but the first time you access them, Gluster will add links to speed up future lookups.
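
For what it's worth, those links are visible on the bricks themselves: DHT drops a zero-byte, sticky-bit "linkto" file on the brick the filename hashes to, with an xattr pointing at the brick that really holds the data. A minimal way to poke at it (the mount point, volume and paths here are made up for illustration):

    # trigger a lookup through the gluster mount so DHT can create its link file
    stat /mnt/myvolume/images/photo0001.jpg

    # then, on a brick that does NOT physically hold the data, you may see a
    # zero-byte mode ---------T file carrying the linkto xattr:
    getfattr -n trusted.glusterfs.dht.linkto -e text /bricks/brick1/images/photo0001.jpg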
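
And to sketch the overall approach discussed at the top of this thread, assuming a plain distribute volume and hypothetical volume/host/path names (only one brick shown, and loading bricks behind Gluster's back is not the officially blessed path, so try it on scratch data first):

    # 1) rsync straight onto the new server's xfs brick, bypassing gluster
    #    (-a preserves perms/times, -H keeps hard links, --partial helps restarts)
    rsync -aH --partial /source/data/ newserver:/bricks/brick1/data/

    # 2) add the pre-loaded brick to the existing volume
    gluster volume add-brick myvolume newserver:/bricks/brick1

    # 3) rewrite the directory layouts so the new brick participates in lookups
    #    (fix-layout does not move any data)
    gluster volume rebalance myvolume fix-layout start
    gluster volume rebalance myvolume status

    # 4) optionally walk the mount once, so the link files described above are
    #    created up front rather than on first access
    find /mnt/myvolume -exec stat {} + > /dev/null

Whether step 1 is actually safe depends on the volume type and on how your gluster version handles pre-existing data on bricks, so treat this as a starting point rather than a recipe.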