On 11/06/2013 11:52 AM, Justin Dossey wrote:
> Shawn,
>
> I had a very similar experience with a rebalance on 3.3.1, and it took
> weeks to get everything straightened out. I would be happy to share the
> scripts I wrote to correct the permissions issues if you wish, though
> I'm not sure it would be appropriate to share them directly on this
> list. Perhaps I should just create a project on Github that is devoted
> to collecting scripts people use to fix their GlusterFS environments!
>
> After that (awful) experience, I am loath to run further rebalances.
> I've even spent days evaluating alternatives to GlusterFS, as my
> experience with this list over the last six months indicates that
> support for community users is minimal, even in the face of major bugs
> such as the one with rebalancing and the continuing "gfid different on
> subvolume" bugs with 3.3.2.

I'm one of the oldest GlusterFS users around here and one of the biggest
proponents, and even I have been loath to rebalance until 3.4.1.

There are no open bugs for gfid mismatches that I could find. The last
time someone mentioned that error in IRC it was 2 am, I was at a
convention, and I told the user how to solve the problem
(http://irclog.perlgeek.de/gluster/2013-06-14#i_7196149). It was caused
by split-brain.

If you have a bug, it would be more productive to file it than to make
negative comments about a community of people who have no obligation to
help anybody, but who do so anyway just because they're nice people.
This is going to sound snarky because it's in text, but I mean it
sincerely: if community support is not sufficient, you might consider
purchasing support from a company that provides it professionally.

> Let me know what you think of the Github thing and I'll proceed
> appropriately.
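For the 3.3-era gfid-mismatch split-brain mentioned above, the usual
manual cleanup was to remove the stale copy of the file from the
offending brick along with its hardlink under that brick's .glusterfs
tree: each file on a brick has a hardlink at
.glusterfs/<first two hex digits of the gfid>/<next two>/<full gfid>.
As a sketch only (the brick path and gfid below are made up, and this is
not quoted from the linked IRC log), the hardlink path can be computed
like this:

```python
import os

def gfid_link_path(brick_root, gfid):
    """Path of the .glusterfs hardlink for a given gfid on a brick.

    Bricks keep one hardlink per file under .glusterfs/, bucketed by
    the first two pairs of hex digits of the gfid. Removing a bad copy
    of a file therefore means removing BOTH the file itself and this
    hardlink, or self-heal can resurrect the stale data.
    """
    return os.path.join(brick_root, ".glusterfs", gfid[:2], gfid[2:4], gfid)

# Hypothetical brick and gfid, for illustration only:
print(gfid_link_path("/bricks/myvol", "0dcdc107-1adb-4644-8a15-3dae0b1d1e12"))
# -> /bricks/myvol/.glusterfs/0d/cd/0dcdc107-1adb-4644-8a15-3dae0b1d1e12
```

After removing both paths on the bad brick, stat'ing the file through a
fuse mount (or running a heal) lets self-heal recreate it from the good
replica.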
Even better, put them up on http://forge.gluster.org

> On Tue, Nov 5, 2013 at 9:05 PM, Shawn Heisey <gluster at elyograg.org
> <mailto:gluster at elyograg.org>> wrote:
>
> We recently added storage servers to our gluster install, running
> 3.3.1 on CentOS 6. It went from 40TB usable (8x2 distribute-replicate)
> to 80TB usable (16x2). There was a little over 20TB of used space on
> the volume.
>
> The add-brick went through without incident, but the rebalance failed
> after moving 1.5TB of the approximately 10TB that needed to be moved.
> A side issue is that it took four days for that 1.5TB to move. I'm
> aware that gluster has overhead, and that there's only so much speed
> you can get out of gigabit, but a 100Mb/s half-duplex link could have
> copied the data faster if it had been a straight copy.
>
> After I discovered that the rebalance had failed, I noticed that there
> were other problems. There are a small number of completely lost files
> (91 that I know about so far), a huge number of permission issues
> (over 800,000 files changed to 000), and about 32,000 files that throw
> read errors via the fuse/nfs mount but seem to be available directly
> on bricks. That last category of problem file has the sticky bit set,
> with almost all of them having ---------T permissions. The good files
> on bricks typically have the same permissions, but are readable by
> root. I haven't worked out the scripting necessary to automate all the
> fixing that needs to happen yet.
>
> We really need to know what happened. We do plan to upgrade to 3.4.1,
> but there were some reasons we didn't want to upgrade before adding
> storage:
>
> * Upgrading will result in a service interruption for our clients,
>   which mount via NFS. It would likely be just a hiccup, with quick
>   failover, but it's still a service interruption.
> * We have a pacemaker cluster providing the shared IP address for NFS
>   mounting. It's running CentOS 6.3. A "yum upgrade" to upgrade
>   gluster will also upgrade to CentOS 6.4. The pacemaker in 6.4 is
>   incompatible with the pacemaker in 6.3, which will likely result in
>   longer-than-expected downtime for the shared IP address.
> * We didn't want to risk potential problems with running gluster 3.3.1
>   on the existing servers and 3.4.1 on the new servers.
> * We needed the new storage added right away, before we could schedule
>   maintenance to deal with the upgrade issues.
>
> Something that would be extremely helpful would be obtaining the
> services of an expert-level gluster consultant who can look over
> everything we've done, see whether there is anything we've done wrong,
> and advise how we might avoid problems in the future. I don't know how
> much the company can authorize for this, but we obviously want it to
> be as cheap as possible. We are in Salt Lake City, UT, USA. It would
> be preferable to have the consultant physically present at our
> location.
>
> I'm working on redacting one bit of identifying info from our
> rebalance log, then I can put it up on Dropbox for everyone to
> examine.
>
> Thanks,
> Shawn
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

> --
> Justin Dossey
> CTO, PodOmatic
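Shawn notes he hasn't yet worked out the scripting to automate the
cleanup. Purely as an illustrative sketch (nothing below comes from the
thread, and the heuristics are assumptions), a read-only scan for the
two damage patterns he describes — files clobbered to mode 000, and
non-empty ---------T files, where a genuine DHT linkfile is normally a
zero-length pointer — could look like this:

```python
import os
import stat
import sys

def scan_brick(brick_root):
    """Yield (path, reason) for regular files matching either damage
    pattern described in the post. Read-only: nothing is modified."""
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Don't descend into gluster's internal metadata tree.
        dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            perms = stat.S_IMODE(st.st_mode)
            if perms == 0:
                yield path, "mode 000 (permissions clobbered?)"
            elif perms == 0o1000 and st.st_size > 0:
                # A zero-length ---------T file is a normal DHT link
                # file; one with data suggests an interrupted migration.
                yield path, "non-empty ---------T (stuck migration?)"

if __name__ == "__main__" and len(sys.argv) > 1:
    for path, reason in scan_brick(sys.argv[1]):
        print(path, "-", reason)
```

Run it against each brick's root and diff the reports before deciding
what to chmod or heal; actually fixing the permissions would still need
the original modes from a backup or from the good replica.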