I expected, and would prefer, that when I add new storage nodes to my
GlusterFS network, new files are stored on both the new and the old
servers, as long as they have available storage space, regardless of
which folder the file is stored in.

In my scenario, running a backup service, I have 999 folders at the root
of my GlusterFS mountpoint. Each user has their own folder in one of the
999 folders (we use a hash to figure out which folder the user belongs
to). Each user then has 999 subfolders. Files are placed in those
subfolders based on a hash of the filename.

So you see, my folder structure is already set: new files are stored in
existing folders, never in new folders.

This means that I can't scale my GlusterFS storage capacity by adding new
storage servers without processing all my files, and moving many of them,
using the scale-n-defrag script.

I have hundreds of TB of data and millions of files. Running the
scale-n-defrag script will take many weeks. This is a problem because
the system is live, in production. What happens if the script fails
mid-flight? I would have to start over again. It's just too messy. And
there is a real chance that I will have to add more storage servers
before the script is finished.

I hope this explains why this behavior is problematic.

Regards
Roland Rabben

2010/9/26 Craig Carl <craig at gluster.com>

> Roland -
> 3.1 works the same way, what type of behavior would you prefer|expect
> to see?
>
> Thanks.
>
> --
> Craig Carl
> Gluster, Inc.
> Cell - (408) 829-9953 (California, USA)
> Gtalk - craig.carl at gmail.com
>
>
> ------------------------------
> *From: *"Roland Rabben" <roland at jotta.no>
> *To: *"Craig Carl" <craig at gluster.com>
> *Cc: *gluster-users at gluster.org
> *Sent: *Friday, September 24, 2010 2:01:47 AM
>
> *Subject: *Re: Adding new storage nodes to existing GlusterFS network
>
> Oh no. This is a big problem for me. My folder structure is locked.
>
> I am running 3.0.5, so the messy symlink solution won't work for me, even
> if I wanted to use it.
>
> Doing nothing means I can't scale my GlusterFS system, which kind of
> defeats the purpose of a scalable distributed file system.
>
> Option 3 is to change file attributes for all folders and files on my
> system, and copy a large portion of my files over to the new servers. I
> have millions of files and folders and about 100 TB of data. This will
> take weeks. What if something fails during this process?
> And I have to do it all over again when I need to add more storage servers.
>
> Is this really it? It's not practically possible and it doesn't scale at
> all.
>
> Does 3.1 address these issues, or would you still need to use the
> scale-n-defrag script?
>
> Should I be using something different than the Distribute translator?
>
> Best Regards
>
> Roland Rabben
>
> 2010/9/24 Craig Carl <craig at gluster.com>
>
>> Roland -
>> The behavior you are seeing now is expected in your cluster's current
>> state. The elastic hash algorithm assigned each folder a hash range when
>> the folder was created, before you added the new servers. Unless you
>> update this range after adding the new storage servers you will continue
>> to see the current behavior.
>>
>> The scale-n-defrag.sh script (dependent on defrag.sh) does two things -
>> 1. Updates the hash range on each folder.
>> 2. Moves any file that needs to be moved to its 'correct' server.
>>
>> You can do 1 of 3 things at this point - (only options 1 or 3 with
>> Gluster 3.0.5)
>>
>> 1. Nothing.
>>     1a. New directories will be created across all the storage nodes and
>>         files in those directories will be distributed across all
>>         storage servers.
>>     1b. Files written to existing directories will not be distributed
>>         onto the new storage servers.
>>
>> 2. Update the hash ranges on each directory but DO NOT move any data.
>>    (NOT AN OPTION WITH GLUSTER 3.0.5!)
>>     2a. New directories will be created across all the storage nodes and
>>         files in those directories will be distributed across all
>>         storage servers.
>>     2b. New files written to existing directories will be distributed
>>         onto the new storage servers.
>>     2c. A link file will be created for every file that isn't on the
>>         'correct' server. The link file points Gluster to the server on
>>         which the file actually exists. Creating the link file takes
>>         time, the re-direct takes time and the additional network I/O
>>         takes time; this slows down your cluster. In a cluster with a
>>         lot of nodes, creating the link file takes longer.
>>     2d. Your cluster won't get any faster. If you redistribute the data
>>         you can take advantage of the new cache (memory), I/O (disks)
>>         and bandwidth (IP).
>>
>> 3. Update the hash ranges on each directory and move your files.
>>     3a. New directories will be created across all the storage nodes and
>>         files in those directories will be distributed across all
>>         storage servers.
>>     3b. New files written to existing directories will be distributed
>>         onto the new storage servers.
>>     3c. Your cluster will get faster.
>>     3d. You will take a performance hit while scale-n-defrag.sh is
>>         running.
>>
>> I don't recommend it, but if you want to use option 2 and you are NOT
>> RUNNING 3.0.5 you just need to comment out the last 6 lines of
>> scale-n-defrag.sh. If you are running 3.0.5 you must either do nothing or
>> run the full scale-n-defrag. To check the version run 'glusterfs
>> --version'. I hope this helps you understand the defrag process; please
>> let me know if you have any other questions.
>>
>> Thanks,
>>
>> Craig
>>
>> --
>> Craig Carl
>> Sales Engineer; Gluster, Inc.
>> Cell - (408) 829-9953 (California, USA)
>> Office - (408) 770-1884
>> Gtalk - craig.carl at gmail.com
>> Twitter - @gluster
>> Installing Gluster Storage Platform, the movie! <http://www.youtube.com/user/GlusterStorage>
>> http://rackerhacker.com/2010/08/11/one-month-with-glusterfs-in-production/
>>
>>
>> ------------------------------
>> *From: *"Roland Rabben" <roland at jotta.no>
>> *To: *"Craig Carl" <craig at gluster.com>
>> *Cc: *gluster-users at gluster.org
>> *Sent: *Thursday, September 23, 2010 5:03:03 AM
>>
>> *Subject: *Re: Adding new storage nodes to existing GlusterFS network
>>
>> Hi Craig
>> After unmounting the client, modifying my client vol file to include the
>> new storage servers and mounting the volume on my client, it does not seem
>> that new files written to existing folders are stored on the new servers.
>> They only end up on the old servers. Is this expected?
>>
>> If I create a new folder and store new files there, the files are stored
>> on both new and old servers.
>>
>> Is this where the scale-n-defrag script comes into play?
>> Can you please describe what it does?
>> Does it move any files, or does it just update metadata to include the
>> new servers?
>>
>> All new files are written to existing folders, so I need a solution to
>> this. I also have many TB of data and millions of files, so moving files
>> around will take a long time.
>>
>> Thanks
>>
>> Roland Rabben
>>
>> 2010/9/22 Craig Carl <craig at gluster.com>
>>
>>> Roland -
>>> You can find the *scale-n-defrag* script here -
>>> http://ftp.gluster.com/pub/gluster/glusterfs/misc/defrag/, be sure to
>>> edit the script first, instructions are inline.
>>>
>>> Please let us know if you have any other questions.
>>>
>>> Thanks,
>>>
>>> Craig
>>>
>>> --
>>> Craig Carl
>>> Sales Engineer; Gluster, Inc.
>>>
>>> ------------------------------
>>> *From: *"Roland Rabben" <roland at jotta.no>
>>> *To: *gluster-users at gluster.org
>>> *Sent: *Wednesday, September 22, 2010 7:48:23 AM
>>>
>>> *Subject: *Re: Adding new storage nodes to existing GlusterFS network
>>>
>>> Hi James,
>>> thanks for the answer. I will try this tomorrow. I am adding the new
>>> servers for a capacity increase.
>>>
>>> Do you have any idea how to rebalance old files?
>>>
>>> Regards
>>> Roland Rabben
>>>
>>> 2010/9/22 Burnash, James <jburnash at knight.com>
>>>
>>> > Hi Roland.
>>> >
>>> > The short answer is - I'm not sure, because I'm in the midst of doing
>>> > this myself, but my setup is just Replicated.
>>> >
>>> > I believe that you can do what you said with the clients because they
>>> > are the only entities with knowledge of what servers are in the
>>> > backend, so adding new servers to their configs and restarting those
>>> > clients should work just fine, assuming you get the
>>> > replication/distributed part of their configs correct.
>>> >
>>> > The one thing is, from what I understand, no rebalancing of old files
>>> > will take place on the new servers automatically - that's a manual
>>> > procedure - but any new files written by the clients will be hashed
>>> > out to all servers, including the new ones.
>>> >
>>> > Just for my information - did you add the extra servers for capacity /
>>> > redundancy / performance increases?
>>> >
>>> > James Burnash, Unix Engineering
>>> >
>>> > -----Original Message-----
>>> > From: gluster-users-bounces at gluster.org [mailto:
>>> > gluster-users-bounces at gluster.org] On Behalf Of Roland Rabben
>>> > Sent: Wednesday, September 22, 2010 7:37 AM
>>> > To: gluster-users at gluster.org
>>> > Subject: Re: Adding new storage nodes to existing GlusterFS network
>>> >
>>> > Does anyone know this?
>>> >
>>> > Regards
>>> > Roland Rabben
>>> >
>>> > 2010/9/21 Roland Rabben <roland at jotta.no>
>>> >
>>> > > Hi, this is probably a newbie question, but here goes.
>>> > >
>>> > > I am adding two new servers to my existing GlusterFS network and I am
>>> > > wondering what the correct procedure is.
>>> > > I am using a Distributed / Replicated setup. My existing network has
>>> > > two servers.
>>> > >
>>> > > Do I need to take down the whole network with client and servers?
>>> > > Can I just unmount the client, update the client config file and then
>>> > > mount the client again with the new servers?
>>> > > Will new files be written to the new servers only, or both the new
>>> > > and old?
>>> > >
>>> > > Best regards
>>> > >
>>> > > Roland Rabben
>>> > > Founder & CEO Jotta AS
>>> > > Cell: +47 90 85 85 39
>>> > > Phone: +47 21 04 29 00
>>> > > Email: roland at jotta.no
>>> > >
>>> >
>>> > _______________________________________________
>>> > Gluster-users mailing list
>>> > Gluster-users at gluster.org
>>> > http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>> >
>>>

--
Roland Rabben
Founder & CEO Jotta AS
Cell: +47 90 85 85 39
Phone: +47 21 04 29 00
Email: roland at jotta.no
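
For readers following the thread, here is a minimal sketch of the behavior Craig describes: each directory keeps the hash ranges it was given when it was created, so files written to an existing directory never map to servers added later. This is a toy model under stated assumptions, not GlusterFS code. The server names are made up, `zlib.crc32` stands in for the real elastic hash, ranges are split evenly, and replication is ignored.

```python
import zlib
from bisect import bisect_right
from collections import Counter

HASH_SPACE = 2**32  # assumption: file names hash into a 32-bit space


def make_layout(servers):
    """Split the hash space into equal contiguous ranges, one per server.

    Returns a list of (range_start, server) pairs sorted by range_start.
    """
    step = HASH_SPACE // len(servers)
    return [(i * step, server) for i, server in enumerate(servers)]


def locate(layout, filename):
    """Return the server whose range contains the hash of filename."""
    h = zlib.crc32(filename.encode("utf-8"))  # stand-in for the real hash
    starts = [start for start, _ in layout]
    return layout[bisect_right(starts, h) - 1][1]


def placement(layout, filenames):
    """Count how many of the given files land on each server."""
    return Counter(locate(layout, name) for name in filenames)


old_servers = ["old-1", "old-2"]                # hypothetical names
all_servers = old_servers + ["new-1", "new-2"]  # after adding two servers

# A directory created before the expansion keeps its original ranges.
existing_dir_layout = make_layout(old_servers)
# A directory created after the expansion, or one whose ranges were updated.
updated_layout = make_layout(all_servers)

files = [f"file-{i}.dat" for i in range(1000)]

in_existing_dir = placement(existing_dir_layout, files)
in_updated_dir = placement(updated_layout, files)

# Files whose 'correct' server changes once the ranges are updated.
misplaced = sum(
    1
    for name in files
    if locate(existing_dir_layout, name) != locate(updated_layout, name)
)

print("existing directory:", dict(in_existing_dir))
print("updated directory: ", dict(in_updated_dir))
print("files on the 'wrong' server after a range update:", misplaced)
```

In this model the first count only ever lists the old servers, which corresponds to point 1b in Craig's list. The last number is the set of files that would need a link file under option 2 or would have to be moved under option 3. The even split used here is a simplification, so the share of affected files should not be read as a prediction for a real cluster.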