Re: Is it ok to add a new brick with files already on it?

Aaaah :-) 
Ok, that's it then, I think I am a happy chappy.

Thanks very much Franco :-)


-----Original Message-----
From: Franco Broi [mailto:franco.broi@xxxxxxxxxx] 
Sent: Thursday, 16 October 2014 2:10 PM
To: SINCOCK John
Cc: gluster-users
Subject: Re:  Is it ok to add a new brick with files already on it?



Probably didn't make myself very clear.

There is nothing you need to do except have the files sitting on a brick for Gluster to make the files visible to clients. You do not have to worry about xattrs; Gluster creates them when you first access a file.
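
If first access is what makes Gluster set up a file, then one way to pre-warm everything would be to stat every entry through a client mount. The following is only a rough sketch under that assumption: the function name `lookup_all` is made up for illustration, and the path you pass must be the Gluster client mount point, not the brick directory itself.

```python
import os

def lookup_all(mount_point):
    """Stat every entry under mount_point and return how many were visited."""
    visited = 0
    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            try:
                # lstat forces a lookup of the entry without reading its data.
                os.lstat(os.path.join(dirpath, name))
                visited += 1
            except OSError:
                # Entry vanished or is unreadable; skip it and carry on.
                pass
    return visited
```

Calling `lookup_all("/mnt/glustervol")` (a hypothetical mount point) would walk the whole volume once; with a lot of small files this could still take a long time.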


On Thu, 2014-10-16 at 11:56 +0800, Franco Broi wrote: 
> On Thu, 2014-10-16 at 13:33 +1030, SINCOCK John wrote: 
> > Ah, apologies, sorry, yes, gluster can be good & fast on large file writes; it is the large number of small files that slows gluster down, and I know this is pretty much unavoidable.
> 
> > I'm still not clear, though, exactly how to ensure these files can be seen by gluster.
> > 
> > Ie, Franco, I'm not sure exactly what you mean when you say the files will be "out of place", and that a rebalance will take time. Directly filling the bricks up on our new server should actually bring the used/available space ratio on this server close to what it is on our other 2 nodes, so, when these files are found, somehow, by gluster, or during a rebalance, I don’t think gluster would need to shift much data between the nodes just to even out the free space.
> > 
> 
> This is the way I understand it, Gluster devs feel free to jump in at 
> any time...
> 
> Gluster uses a hash it calculates from the file name and the number of
> bricks to distribute files evenly across the bricks. If you add a
> brick and create new directories, any new file might get written to
> the new brick; it just depends on the hash.
> 
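To make the "depends on the hash" point concrete, here is a toy model of hash-based placement. It is a sketch only: `zlib.crc32` is a stand-in and not Gluster's real hash function, the even split of the hash space is an assumption, and the brick names `node1` to `node3` are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32

def make_layout(bricks):
    """Split the 32-bit hash space into one contiguous range per brick."""
    step = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * step
        # The last brick absorbs any remainder so the ranges cover everything.
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else (i + 1) * step - 1
        layout.append((start, end, brick))
    return layout

def name_hash(filename):
    """Stand-in hash of the file name (not Gluster's actual algorithm)."""
    return zlib.crc32(filename.encode("utf-8")) & 0xFFFFFFFF

def brick_for(filename, layout):
    """Return the brick whose hash range contains the file name's hash."""
    h = name_hash(filename)
    for start, end, brick in layout:
        if start <= h <= end:
            return brick
    raise ValueError("layout does not cover hash value")

old_layout = make_layout(["node1", "node2"])
new_layout = make_layout(["node1", "node2", "node3"])
names = ["file%04d.dat" % i for i in range(1000)]
moved = [n for n in names if brick_for(n, old_layout) != brick_for(n, new_layout)]
print(len(moved), "of", len(names), "names hash to a different brick")
```

Files counted in `moved` are the ones that would be "out of place" under the new layout in this toy model.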
> The brick layout is stored in an xattr attached to directories. For
> existing directories this will only get updated if you run a
> fix-layout; otherwise old directories retain the old brick layout, i.e.
> the one from before you added the new brick. Files added to those old
> directories will not go to the new brick unless you run fix-layout.
> 
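For anyone wanting to look at that directory xattr directly, a minimal sketch follows. Assumptions: the layout lives in the `trusted.glusterfs.dht` attribute on the directory as stored on the brick, you run this as root on the brick server (the `trusted.` namespace is not readable otherwise), and the helper name `read_layout_xattr` is made up here. It returns the raw bytes as hex and makes no attempt to decode them.

```python
import os

LAYOUT_XATTR = "trusted.glusterfs.dht"

def read_layout_xattr(brick_dir):
    """Return the raw layout xattr of a brick directory as hex, or None."""
    getxattr = getattr(os, "getxattr", None)  # os.getxattr is Linux-only
    if getxattr is None:
        return None
    try:
        return getxattr(brick_dir, LAYOUT_XATTR).hex()
    except OSError:
        # Attribute not set, not permitted, or xattrs unsupported here.
        return None
```

Comparing the value for the same directory before and after a fix-layout should show whether the layout was rewritten.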
> If you run a full rebalance after adding a brick, Gluster will move
> any files that are not currently on the brick pointed to by their new
> hash. It doesn't balance based on capacity, but if a brick is full,
> Gluster will put the file on another brick and will create a link (I
> guess this assumes there's space on the full brick to make a link??).
> 
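The difference between fix-layout and a full rebalance can be sketched as the CLI calls involved. This only builds and optionally runs the command lines; the volume name and brick path in the usage note are hypothetical, and the helper names are made up for illustration.

```python
import subprocess

def expand_volume_commands(volume, new_brick, full_rebalance=False):
    """Build the gluster CLI calls for adding a brick and fixing the layout."""
    if full_rebalance:
        # Full rebalance: also migrates files to the brick their hash points to.
        rebalance = ["gluster", "volume", "rebalance", volume, "start"]
    else:
        # fix-layout only rewrites directory layouts; no file data is moved.
        rebalance = ["gluster", "volume", "rebalance", volume, "fix-layout", "start"]
    return [
        ["gluster", "volume", "add-brick", volume, new_brick],
        rebalance,
        ["gluster", "volume", "rebalance", volume, "status"],
    ]

def run_all(cmds, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for argv in cmds:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

For example, `run_all(expand_volume_commands("myvol", "server3:/data/brick1"))` just prints the three commands, which is a safe way to review them before running anything against tens of terabytes.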
> So by "out of place" I mean the file might not be where the hash for
> the file says it is, but that doesn't mean that Gluster won't find it;
> it will just take a bit longer.
> 
> 
> > As I understand it, gluster at the very least requires xattrs to be set on every file, and, obviously these will be set if data is copied in via gluster, and gluster places files on the bricks. But, I'm not clear how/if/when files will become part of the gluster, if they are not explicitly copied onto a brick that is already part of a gluster volume, via a proper gluster-aware mount:
> > 
> > I guess I’m hoping for one of two things:
> > 1)	that if you add a brick with data already on it, to a gluster - that gluster will go through and set the xattrs on all the files, and make them available, as part of the process of adding the brick. Or, 
> > 2)	that there is some way to trigger gluster to re-scan a brick, to make itself aware of files that have been copied in “behind” the gluster.
> > 
> 
> No, Gluster doesn't scan unless you try to access a file. It's only
> then that it goes through the process of first using the hash, then
> searching directories. Once it has found the file it will add a link so
> that next time a lookup will be faster.
> 
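The lookup order described here (hash first, then search, then leave a link) can be modelled in a few lines. This is a toy simulation, not Gluster code: bricks are plain sets of file names, `links` is a dict standing in for the link entries, and the modulo placement rule is a stand-in for the real hash and layout.

```python
import zlib

def hashed_brick(filename, brick_names):
    """Stand-in placement rule, not Gluster's real hash or layout."""
    return brick_names[zlib.crc32(filename.encode("utf-8")) % len(brick_names)]

def lookup(filename, bricks, links):
    """Return (brick holding the file, how it was found)."""
    names = sorted(bricks)
    home = hashed_brick(filename, names)
    if filename in bricks[home]:
        return home, "hash"  # fast path: the file is where the hash says
    if (home, filename) in links:
        return links[(home, filename)], "link"  # a pointer left by an earlier search
    for name in names:
        if filename in bricks[name]:
            # Slow path: found by searching; remember it for next time.
            links[(home, filename)] = name
            return name, "search"
    return None, "missing"
```

In this model the first lookup of an out-of-place file reports "search" and every later one reports "link", which matches the "it will just take a bit longer" behaviour described earlier in the thread.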
> > I do apologise if I'm missing the point or making this seem a lot harder than it is - it's just, when dealing with large amounts of data, we have to be certain - I can't afford to waste 2 days copying data onto the server, and then find I can't add the files to the gluster without deleting it all and then spending 5 more days transferring all the files again via gluster.
> > 
> 
> No need to apologise, getting your head around this stuff is 
> difficult, even now (after nearly 2 years) I'm still not sure I'm 
> giving you accurate information.
> 
> > Thanks again, I really do appreciate any advice that can really nail this down and clarify the situation.
> > 
> 
> No problem.
> 
> > -----Original Message-----
> > From: Franco Broi [mailto:franco.broi@xxxxxxxxxx]
> > Sent: Thursday, 16 October 2014 12:21 PM
> > To: SINCOCK John; gluster-users
> > Subject: Re:  Is it ok to add a new brick with files already on it?
> > 
> > 
> > Gluster may be slow when creating lots of small files but it is not slow writing.
> > 
> > I don't see a problem with what you want to do as long as you realise that many of the files will be out of place and a future rebalance would take a very long time - if you decide to run one.
> > 
> > On Wed, 2014-10-15 at 21:12 -0500, Ryan Nix wrote: 
> > > Interesting.  Still, I think it's better to let the Gluster client
> > > handle the syncing.  What happens if, for some strange reason, the
> > > rsync process dies in the middle of the night?  Gluster, on the
> > > other hand, will keep working to get the data on the other bricks
> > > without human intervention.  I recently used Gluster to sync 3 TB
> > > of data to another brick over a 1Gbps link in about 13 hours on decent hardware.
> > > 
> > > On Wed, Oct 15, 2014 at 9:04 PM, SINCOCK John 
> > > <J.Sincock@xxxxxxxxx>
> > > wrote:
> > >          
> > >         
> > >         We have 20 Terabytes to rsync onto a new server (which will
> > >         have 32 TB capacity),
> > >         
> > >         And we then want to add that server to an existing 2-node
> > >         gluster of 73TB (53 TB used, 20 TB free), to give a 3-node
> > >         gluster with 105TB capacity, 73TB used.
> > >         
> > >          
> > >         
> > >         The reason I want to do it this way, if possible, is that
> > >         Gluster is slow on writes, especially for small files, and we
> > >         have a LOT of small files, so I’m pretty sure it will be a LOT
> > >         faster to rsync directly to the new server (which is the one
> > >         that has free space anyway), and then add that server to the
> > >         gluster – if it is possible to have gluster recognise those
> > >         files.
> > >         
> > >          
> > >         
> > >          
> > >         
> > >         From: Ryan Nix [mailto:ryan.nix@xxxxxxxxx] 
> > >         Sent: Thursday, 16 October 2014 11:58 AM
> > >         To: SINCOCK John
> > >         Cc: Franco Broi; gluster-users
> > >         
> > >         
> > >         Subject: Re:  Is it ok to add a new brick with
> > >         files already on it? 
> > >          
> > >         
> > >         So Gluster, at its core, uses rsync to copy the data to the
> > >         other bricks.  Why not let Gluster do the heavy lifting?
> > >         
> > >         
> > >          
> > >         
> > >         On Wed, Oct 15, 2014 at 7:35 PM, SINCOCK John
> > >         <J.Sincock@xxxxxxxxx> wrote:
> > >         
> > >         
> > >         In a related question... it seems, if it is possible to add
> > >         filesystems already containing data, as new bricks, then it
> > >         should also be possible to:
> > >         
> > >         1) create empty bricks
> > >         2) add them to the gluster volume while they are empty
> > >         3) rsync data directly onto the underlying empty bricks,
> > >         circumventing gluster, ie not through the gluster mountpoint
> > >         4) somehow get gluster to recognise the data that has been
> > >         copied into the bricks?
> > >         
> > >         How would you go about getting gluster to see the data you've
> > >         rsynced directly in?
> > >         My concern would be that all the data rsynced directly onto
> > >         the bricks will just sit there, invisible to glusterfs.
> > >         
> > >         Thanks again for any info!
> > >         
> > >         
> > >         -----Original Message-----
> > >         From: Franco Broi [mailto:franco.broi@xxxxxxxxxx]
> > >         Sent: Thursday, 16 October 2014 10:06 AM
> > >         To: SINCOCK John
> > >         Cc: gluster-users@xxxxxxxxxxx
> > >         Subject: Re:  Is it ok to add a new brick with
> > >         files already on it?
> > >         
> > >         
> > >         
> > >         I've never added a brick with existing files but I did start a
> > >         new Gluster volume on disks that already contained data and I
> > >         was able to access the files without problem. Of course the
> > >         files will be out of place but the first time you access them,
> > >         Gluster will add links to speed up future lookups.
> > >         
> > >         On Thu, 2014-10-16 at 09:57 +1030, SINCOCK John wrote:
> > >         > Hi Everyone,
> > >         >
> > >         >
> > >         >
> > >         > All the instructions I’ve been able to find on adding a brick to a
> > >         > gluster, seem to assume the brick is empty when it’s added.
> > >         >
> > >         > So my question is, is it possible for a new brick, loaded up with
> > >         > files, to be added to a gluster (and for all the files already on that
> > >         > brick, to be indexed and added into the gluster). Apologies if the
> > >         > question is answered elsewhere, but I couldn’t find anyone addressing
> > >         > this specific question, and certainty helps when you’re dealing with
> > >         > 10’s of terabytes of data... ;-)
> > >         >
> > >         >
> > >         >
> > >         > Thanks in advance for any info or tips!
> > >         >
> > >         >
> > >         >
> > >         >
> > >         > _______________________________________________
> > >         > Gluster-users mailing list
> > >         > Gluster-users@xxxxxxxxxxx
> > >         >
> > >         
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users






