On 17/08/11 16:19, Dan Bretherton wrote:
>
>> Dan Bretherton wrote:
>>>
>>> On 15/08/11 20:00, gluster-users-request at gluster.org wrote:
>>>> Message: 1
>>>> Date: Sun, 14 Aug 2011 23:24:46 +0300
>>>> From: "Deyan Chepishev - SuperHosting.BG" <dchepishev at superhosting.bg>
>>>> Subject: cluster.min-free-disk separate for each brick
>>>> To: gluster-users at gluster.org
>>>> Message-ID: <4E482F0E.3030604 at superhosting.bg>
>>>> Content-Type: text/plain; charset=UTF-8; format=flowed
>>>>
>>>> Hello,
>>>>
>>>> I have a Gluster setup with very different brick sizes:
>>>>
>>>> brick1: 9T
>>>> brick2: 9T
>>>> brick3: 37T
>>>>
>>>> With this configuration, if I set the parameter cluster.min-free-disk
>>>> to 10% it applies to all bricks, which is quite awkward with these
>>>> brick sizes: 10% of a small brick is ~1T, but 10% of the big brick is
>>>> ~3.7T. What happens in the end is that if all bricks reach 90% usage
>>>> and I continue writing, the small ones eventually fill up to 100%
>>>> while the big one still has plenty of free space.
>>>>
>>>> My question is: is there a way to set cluster.min-free-disk per brick
>>>> instead of setting it for the entire volume, or any other way to work
>>>> around this problem?
>>>>
>>>> Thank you in advance.
>>>>
>>>> Regards,
>>>> Deyan
>>>>
>>> Hello Deyan,
>>>
>>> I have exactly the same problem and I have asked about it before - see
>>> the links below.
>>>
>>> http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/
>>>
>>> http://gluster.org/pipermail/gluster-users/2011-May/007788.html
>>>
>>> My understanding is that the patch referred to in Amar's reply in the
>>> May thread prevents a "migrate-data" rebalance operation from failing
>>> by running out of space on smaller bricks, but that doesn't solve the
>>> problem we are having. Being able to set min-free-disk for each brick
>>> separately would be useful, as would being able to set the value as a
>>> number of bytes rather than a percentage. However, even if these
>>> features were present we would still have a problem when the amount of
>>> free space falls below min-free-disk, because that just produces a
>>> warning message in the logs and doesn't actually prevent more files
>>> from being written. In other words, min-free-disk is a soft limit
>>> rather than a hard limit. When a volume is more than 90% full there
>>> may still be hundreds of gigabytes of free space spread over the large
>>> bricks, but the small bricks may each have only a few gigabytes left,
>>> or even less. Users do "df", see lots of free space in the volume, and
>>> continue writing files. However, when GlusterFS chooses to write a
>>> file to a small brick, the write fails with "device full" errors if
>>> the file grows too large - which happens often here, because some
>>> applications typically produce files several gigabytes in size.
>>>
>>> I would really like to know if there is a way to make min-free-disk a
>>> hard limit. Ideally, GlusterFS would choose the brick on which to
>>> write a file based on how much free space it has left, rather than
>>> choosing a brick at random (or however it is done now). That would
>>> solve the problem of non-uniform brick sizes without the need for a
>>> hard min-free-disk limit.
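To be explicit about what we are both describing: as far as I know
cluster.min-free-disk can only be applied to a whole volume, not to
individual bricks, and it is set with the ordinary "volume set" command,
along these lines ("datavol" is just a placeholder volume name for this
sketch):

    # volume-wide soft limit; 10% is evaluated against each brick's own capacity
    gluster volume set datavol cluster.min-free-disk 10%
    gluster volume info datavol | grep min-free-disk

It is exactly because the option is volume-wide that 10% means very
different absolute amounts on a 9T brick and a 37T brick.
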
>>>
>>> Amar's comment in the May thread about QA testing being done only on
>>> volumes with uniform brick sizes prompted me to start standardising on
>>> a uniform brick size for each volume in my cluster. My impression is
>>> that implementing the features needed by users with non-uniform brick
>>> sizes is not a priority for Gluster, and that users are all expected
>>> to use uniform brick sizes. I really think this should be stated
>>> clearly in the GlusterFS documentation, for example in the sections on
>>> creating volumes in the Administration Guide. That would stop other
>>> users from going down the path that I did initially, which has given
>>> me a real headache because I am now having to move tens of terabytes
>>> of data off bricks that are larger than the new standard size.
>>>
>>> Regards
>>> Dan.
>>>
>> Hello,
>>
>> This is really bad news, because I have already migrated my data and I
>> have just realised that I am stuck: Gluster simply does not take brick
>> sizes into account, and it is impossible for me to move to uniform
>> brick sizes.
>>
>> Currently we use 2TB HDDs, but disks keep growing and soon we will
>> probably use 3TB HDDs or whatever larger sizes appear on the market. So
>> if we build bricks from RAID5 arrays with some level of redundancy (for
>> example 6 HDDs in RAID5, whatever their size), sooner or later this
>> leads to non-uniform bricks, which is a problem; it is not reasonable
>> to expect that we always can, or want to, provide bricks of uniform
>> size.
>>
>> Following that logic, if we currently have 10T from 6x2T in RAID5, then
>> at some point, when a single disk holds 10T, we would have to use no
>> RAID at all just because Gluster cannot handle non-uniform bricks.
>>
>> Regards,
>> Deyan
>>
>
> I think Amar might have provided the answer in his posting to the
> thread yesterday, which has just appeared in my autospam folder.
>
> http://gluster.org/pipermail/gluster-users/2011-August/008579.html
>
>> With size option, you can have a hardbound on min-free-disk
>
> This means that you can set a hard limit on min-free-disk, and set a
> value in GB that is bigger than the biggest file that is ever likely to
> be written. This looks likely to solve our problem and make non-uniform
> brick sizes a practical proposition. I wish I had known about this back
> in May when I embarked on my cluster restructuring exercise; the issue
> was discussed in this thread in May as well:
> http://gluster.org/pipermail/gluster-users/2011-May/007794.html
>
> Once I have moved all the data off the large bricks and standardised on
> a uniform brick size, it will be relatively easy to stick to it, because
> I use LVM and create logical volumes for new bricks whenever a volume
> needs extending. The only problem with this approach is what happens
> when the amount of free space left on a server is less than the size of
> the brick you want to create; the only option then would be to use new
> servers, potentially wasting several TB of free space on existing
> servers. The standard brick size for most of my volumes is 3TB, which
> allows me to use a mixture of small servers and large servers in a
> volume and limits the amount of free space that would be wasted if
> there wasn't quite enough room on a server to create another brick. A
> consequence of having 3TB bricks is that a single server typically has
> two or more bricks belonging to the same volume, although I do my best
> to distribute the volumes across different servers in order to spread
> the load.
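For concreteness, the routine I have in mind when a volume needs extending
is roughly the following (the volume group, brick and volume names here are
just placeholders for this sketch):

    # carve a new 3TB brick out of free space on an existing server
    lvcreate -L 3T -n brick05 vg_bricks
    mkfs.xfs /dev/vg_bricks/brick05
    mkdir -p /mnt/brick05
    mount /dev/vg_bricks/brick05 /mnt/brick05

    # add the brick to the volume, then set the size-based limit that Amar
    # described, larger than the biggest file we ever expect to write
    gluster volume add-brick datavol server3:/mnt/brick05
    gluster volume set datavol cluster.min-free-disk 20GB

Whether that last setting really behaves as a hard limit is the question I
come back to below.
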
> I am not aware of any problems associated with exporting multiple
> bricks from a single server, and doing so has not caused me any trouble
> so far.
>
> -Dan.
>

Hello Deyan,

Have you tried giving min-free-disk a value in gigabytes, and if so does it
prevent new files being written to your bricks when they are nearly full? I
recently tried it myself and found that min-free-disk had no effect at all.
I deliberately filled my test/backup volume and most of the bricks became
100% full, even though I had set min-free-disk to "20GB", as reported by
"gluster volume ... info" below.

cluster.min-free-disk: 20GB

Unless I am doing something wrong, it seems as though we cannot "have a
hardbound on min-free-disk" after all, and a uniform brick size is
therefore an essential requirement. It still doesn't say that in the
documentation, at least not in the volume creation sections.

-Dan.

--
Mr. D.A. Bretherton
Computer System Manager
Environmental Systems Science Centre
Harry Pitt Building
3 Earley Gate
University of Reading
Reading, RG6 6AL
UK
Tel. +44 118 378 5205
Fax: +44 118 378 6413