On 29/09/11 12:28, Dan Bretherton wrote:
>
> On 08/09/11 23:51, Dan Bretherton wrote:
>>
>>> On Wed, Sep 7, 2011 at 4:27 PM, Dan Bretherton <d.a.bretherton at reading.ac.uk> wrote:
>>>
>>> On 17/08/11 16:19, Dan Bretherton wrote:
>>>
>>> Dan Bretherton wrote:
>>>
>>> On 15/08/11 20:00, gluster-users-request at gluster.org wrote:
>>>
>>> Message: 1
>>> Date: Sun, 14 Aug 2011 23:24:46 +0300
>>> From: "Deyan Chepishev - SuperHosting.BG" <dchepishev at superhosting.bg>
>>> Subject: cluster.min-free-disk separate for each brick
>>> To: gluster-users at gluster.org
>>> Message-ID: <4E482F0E.3030604 at superhosting.bg>
>>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>>
>>> Hello,
>>>
>>> I have a Gluster setup with very different brick sizes:
>>>
>>> brick1: 9T
>>> brick2: 9T
>>> brick3: 37T
>>>
>>> With this configuration, if I set the parameter cluster.min-free-disk to 10% it applies to all bricks, which is quite awkward with these brick sizes, because 10% of the small bricks is ~1T but for the big brick it is ~3.7T. What happens in the end is that if all bricks reach 90% usage and I continue writing, the small ones eventually fill up to 100% while the big one still has plenty of free space.
>>>
>>> My question is: is there a way to set cluster.min-free-disk per brick instead of setting it for the entire volume, or any other way to work around this problem?
>>>
>>> Thank you in advance.
>>>
>>> Regards,
>>> Deyan
>>>
>>> Hello Deyan,
>>>
>>> I have exactly the same problem and I have asked about it before - see the links below.
>>>
>>> http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
>>> http://gluster.org/pipermail/gluster-users/2011-May/007788.html
>>>
>>> My understanding is that the patch referred to in Amar's reply in the May thread prevents a "migrate-data" rebalance operation from failing by running out of space on smaller bricks, but that doesn't solve the problem we are having. Being able to set min-free-disk for each brick separately would be useful, as would being able to set this value as a number of bytes rather than a percentage. However, even if these features were present we would still have a problem when the amount of free space becomes less than min-free-disk, because this just results in a warning message in the logs and doesn't actually prevent more files from being written. In other words, min-free-disk is a soft limit rather than a hard limit. When a volume is more than 90% full there may still be hundreds of gigabytes of free space spread over the large bricks, but the small bricks may each have only a few gigabytes left, or even less. Users do "df", see lots of free space in the volume, and continue writing files. However, when GlusterFS chooses to write a file to a small brick, the write fails with "device full" errors if the file grows too large, which is often the case here because some applications typically produce files several gigabytes in size.
>>>
>>> I would really like to know if there is a way to make min-free-disk a hard limit.
>>> Ideally, GlusterFS would choose the brick on which to write a file based on how much free space it has left, rather than choosing a brick at random (or however it is done now). That would solve the problem of non-uniform brick sizes without the need for a hard min-free-disk limit.
>>>
>>> Amar's comment in the May thread about QA testing being done only on volumes with uniform brick sizes prompted me to start standardising on a uniform brick size for each volume in my cluster. My impression is that implementing the features needed for users with non-uniform brick sizes is not a priority for Gluster, and that users are all expected to use uniform brick sizes. I really think this should be stated clearly in the GlusterFS documentation, for example in the sections on creating volumes in the Administration Guide. That would stop other users from going down the path that I did initially, which has given me a real headache because I am now having to move tens of terabytes of data off bricks that are larger than the new standard size.
>>>
>>> Regards
>>> Dan.
>>>
>>> Hello,
>>>
>>> This is really bad news, because I have already migrated my data and I have just realized that I am stuck, because Gluster simply does not take brick sizes into account. It is impossible for me to move to uniform brick sizes.
>>>
>>> Currently we use 2TB HDDs, but disks keep growing and soon we will probably use 3TB HDDs or whatever larger sizes appear on the market. So if we choose to use RAID5 with some level of redundancy (for example six HDDs in RAID5, whatever their size), this will sooner or later lead us to non-uniform bricks, which is a problem; it is not reasonable to expect that we always can, or want to, provide uniformly sized bricks.
>>>
>>> By that way of thinking, if we currently have 10T from 6x2T in RAID5, then at some point, when a single disk holds 10T, we will have to use no RAID at all just because Gluster cannot handle non-uniform bricks.
>>>
>>> Regards,
>>> Deyan
>>>
>>> I think Amar might have provided the answer in his posting to the thread yesterday, which has just appeared in my autospam folder.
>>>
>>> http://gluster.org/pipermail/gluster-users/2011-August/008579.html
>>>
>>> "With size option, you can have a hardbound on min-free-disk"
>>>
>>> This means that you can set a hard limit on min-free-disk, and set a value in GB that is bigger than the biggest file that is ever likely to be written. This looks likely to solve our problem and make non-uniform brick sizes a practical proposition. I wish I had known about this back in May when I embarked on my cluster restructuring exercise; the issue was discussed in this thread in May as well:
>>> http://gluster.org/pipermail/gluster-users/2011-May/007794.html
>>>
>>> Once I have moved all the data off the large bricks and standardised on a uniform brick size, it will be relatively easy to stick to it, because I use LVM and create logical volumes for new bricks whenever a volume needs extending. The only problem with this approach is what happens when the amount of free space left on a server is less than the size of the brick you want to create. The only option then would be to use new servers, potentially wasting several TB of free space on existing servers.
>>> The standard brick size for most of my volumes is 3TB, which allows me to use a mixture of small and large servers in a volume and limits the amount of free space that would be wasted if there wasn't quite enough room on a server to create another brick. Another consequence of having 3TB bricks is that a single server typically has two or more bricks belonging to the same volume, although I do my best to distribute the volumes across different servers in order to spread the load. I am not aware of any problems associated with exporting multiple bricks from a single server, and it has not caused me any trouble so far.
>>>
>>> -Dan.
>>>
>>> Hello Deyan,
>>>
>>> Have you tried giving min-free-disk a value in gigabytes, and if so, does it prevent new files being written to your bricks when they are nearly full? I recently tried it myself and found that min-free-disk had no effect at all. I deliberately filled my test/backup volume and most of the bricks became 100% full. I had set min-free-disk to "20GB", as reported in the "gluster volume ... info" output below.
>>>
>>> cluster.min-free-disk: 20GB
>>>
>>> Unless I am doing something wrong, it seems as though we cannot "have a hardbound on min-free-disk" after all, and uniform brick size is therefore an essential requirement. It still doesn't say that in the documentation, at least not in the volume creation sections.
>>>
>>> -Dan.
>>>
>>> On 08/09/11 06:35, Raghavendra Bhat wrote:
>>> > This is how it is supposed to work.
>>> >
>>> > Suppose a distribute volume is created with 2 bricks: the 1st brick has 25GB of free space and the 2nd has 35GB. If one sets a minimum-free-disk of 30GB through volume set (gluster volume set <volname> min-free-disk 30GB), then whenever a file is created and it hashes to the 1st brick (which has only 25GB free), the actual file will be created on the 2nd brick and a linkfile will be created on the 1st brick; the linkfile points to the actual file. A warning message indicating that the minimum free disk limit has been crossed, and suggesting that more nodes be added, will be printed in the glusterfs log file. So any file which hashes to the 1st brick will be created on the 2nd brick.
>>> >
>>> > Once the free space on the 2nd brick also drops below 30GB, files will be created on their respective hashed bricks only. There will be a warning message in the log file about the 2nd brick also crossing the minimum free disk limit.
>>> >
>>> > Regards,
>>> > Raghavendra Bhat
>>>
>> Dear Raghavendra,
>> Thanks for explaining this to me. This mechanism should allow a volume to function correctly with non-uniform brick sizes even though min-free-disk is not a hard limit. I can understand now why I had so many problems with the default value of 10% for min-free-disk. 10% of a large brick can be very large compared to 10% of a small brick, so when the bricks carried on filling up at the same rate after all of them had less than 10% free space, the small bricks usually filled up long before the large ones, giving "device full" errors even when df still showed a lot of free space in the volume. At least now we can minimise this effect by setting min-free-disk to a value in GB.
>>
>> -Dan.
>>
> Dear Raghavendra,
> Unfortunately I am still having problems with some bricks filling up completely, despite having "cluster.min-free-disk: 20GB".
> In one case I am still seeing warnings in the client logs about bricks being nearly full in percentage terms, so I am wondering if the volume is still using cluster.min-free-disk: 10% and ignoring the 20GB setting I changed it to. When I changed cluster.min-free-disk, should it have taken effect immediately, or is there something else I should have done to activate the change?
>
> In your example above, suppose there are 9 bricks instead of 2 (as in my volume), and they all have less than 30GB of free space except for one, which is nearly empty. Is GlusterFS clever enough to find that nearly empty brick every time it creates new files? I expected all new files to be created on my nearly empty brick, but that has not happened. Some files have gone in there, but most have gone to nearly full bricks, one of which has now filled up completely. I have done rebalance...fix-layout a number of times. What can I do to fix this problem? The volumes with one or more full bricks are unusable, because users get "device full" errors for some writes even though both volumes show several TB of free space.
>
> Regards
> -Dan Bretherton.

Dear All,
If anyone is interested, I managed to produce the expected behaviour by setting min-free-disk to 300GB rather than 30GB. 300GB is approximately 10% of the size of most of the bricks in the volume. I don't understand why setting min-free-disk to 30GB (about 1% of the brick) didn't work; maybe it is too close to the limit for some reason. I wonder if the default value of min-free-disk=10% is significant. It seems that for non-uniform brick sizes, the correct approach is to set min-free-disk to a value in GB that is approximately 10% of the brick size in each case.

-Dan
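
For anyone hitting the same issue, the workaround described above amounts to roughly the following command sequence. This is only a sketch: the volume name "myvol" and the brick mount point are placeholders, the 300GB figure assumes bricks of roughly 3TB (about 10% of the brick size), and the exact syntax may vary between GlusterFS 3.x releases.

    # Check how much space is left on each brick filesystem
    # ("/mnt/brick1" is a placeholder path; run on each server).
    df -h /mnt/brick1

    # Set min-free-disk as an absolute size (~10% of the brick size)
    # instead of relying on the default 10% percentage value.
    gluster volume set myvol cluster.min-free-disk 300GB

    # Confirm that the new value has been applied to the volume.
    gluster volume info myvol | grep min-free-disk

    # After adding bricks, refresh the layout so that new files can be
    # hashed to the emptier bricks.
    gluster volume rebalance myvol fix-layout start

With the threshold set comfortably above the largest file likely to be written, a file that hashes to a nearly full brick should be redirected to a brick with more free space, leaving only a linkfile on the full one, as Raghavendra describes above.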