Joe,

You're right -- I probably should have dialed it back a bit! It's frustrating
sometimes when I post about such a major issue and never see any reply.

In my case, I run into gfid bugs regularly, almost always in situations where
I have copied an entire directory tree into a GlusterFS mount. There have
been no connectivity issues between nodes, no node restarts, etc., for
months, but once in a while I get a gfid mismatch and must manually correct
the situation.

I would certainly purchase GlusterFS support if I had any option other than
Red Hat -- they only support Red Hat Storage, and that isn't a good fit for
my environment at this time. If GlusterFS becomes as successful as it could
be, there will definitely be an opportunity for a firm to support it on
non-Red Hat platforms.

FWIW, I've created a GitHub repo to store my scripts for navigating GlusterFS
issues; two trimmed-down examples follow below. If they remain relevant and
the repo gets activity, I'll move them to the Gluster Forge.

https://github.com/justindossey/gluster-scripts
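To give a flavor of the gfid-mismatch fix, here is a minimal sketch (a hedged
illustration, not the repo script verbatim: the script name, the 3.3-era
.glusterfs brick layout, and the "run it against the brick holding the bad
copy" workflow are assumptions to verify on your own setup, and it deletes
the bad copy, so test carefully):

    #!/usr/bin/env python3
    # gfid_unlink.py -- minimal sketch of a gfid-mismatch repair, NOT the
    # exact repo script. Run on the server whose brick holds the BAD copy:
    # it unlinks the file and its .glusterfs hardlink so that a later stat
    # through a client mount lets self-heal recreate it from the good
    # replica. Assumes the 3.3-era layout <brick>/.glusterfs/aa/bb/<gfid>.
    import binascii
    import os
    import sys

    def gfid_of(path):
        # every file on a brick carries its gfid in the trusted.gfid xattr
        raw = os.getxattr(path, "trusted.gfid")
        h = binascii.hexlify(raw).decode()
        return "-".join([h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]])

    def gfid_link(brick, gfid):
        return os.path.join(brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid)

    if __name__ == "__main__":
        brick, relpath = sys.argv[1], sys.argv[2]  # e.g. /bricks/b1 dir/file
        target = os.path.join(brick, relpath)
        link = gfid_link(brick, gfid_of(target))
        for p in (link, target):
            if os.path.lexists(p):
                print("unlinking", p)
                os.unlink(p)
        # afterwards, stat the file through a client mount to trigger
        # self-heal from the surviving replica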
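And the permissions cleanup I mention in the quoted thread below came down to
a sweep along these lines (again a sketch rather than the repo script: the
0644 default, the linkfile test, and the script name are my assumptions here;
run it without --fix first and sanity-check the report against a good
replica):

    #!/usr/bin/env python3
    # perm_sweep.py -- minimal sketch, NOT the exact repo script. Walks a
    # brick and reports (or, with --fix, repairs) two kinds of damage left
    # by a bad rebalance: files with mode 000, and files that look like
    # DHT linkfiles (---------T) but actually hold data.
    import os
    import stat
    import sys

    def is_linkfile(path, st):
        # genuine DHT linkfiles are empty and carry a dht.linkto xattr
        if st.st_size != 0:
            return False
        try:
            os.getxattr(path, "trusted.glusterfs.dht.linkto")
            return True
        except OSError:
            return False

    def sweep(brick, fix=False):
        for root, dirs, files in os.walk(brick):
            if ".glusterfs" in dirs:
                dirs.remove(".glusterfs")  # skip gluster's internal tree
            for name in files:
                path = os.path.join(root, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue
                perms = stat.S_IMODE(st.st_mode)
                if perms == 0:
                    print("000 perms:", path)
                    if fix:
                        os.chmod(path, 0o644)  # assumed sane default
                elif perms == 0o1000 and not is_linkfile(path, st):
                    print("fake linkfile:", path)
                    if fix:
                        os.chmod(path, 0o644)

    if __name__ == "__main__":
        sweep(sys.argv[1], fix="--fix" in sys.argv[2:])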
On Wed, Nov 6, 2013 at 12:15 PM, Joe Julian <joe at julianfamily.org> wrote:

> On 11/06/2013 11:52 AM, Justin Dossey wrote:
>
> Shawn,
>
> I had a very similar experience with a rebalance on 3.3.1, and it took
> weeks to get everything straightened out. I would be happy to share the
> scripts I wrote to correct the permissions issues if you wish, though I'm
> not sure it would be appropriate to share them directly on this list.
> Perhaps I should just create a project on GitHub devoted to collecting
> scripts people use to fix their GlusterFS environments!
>
> After that (awful) experience, I am loath to run further rebalances. I've
> even spent days evaluating alternatives to GlusterFS, as my experience
> with this list over the last six months indicates that support for
> community users is minimal, even in the face of major bugs such as the
> one with rebalancing and the continuing "gfid different on subvolume"
> bugs with 3.3.2.
>
> I'm one of the oldest GlusterFS users around here and one of the biggest
> proponents, and even I have been loath to rebalance until 3.4.1.
>
> There are no open bugs for gfid mismatches that I could find. The last
> time someone mentioned that error in IRC it was 2am, I was at a
> convention, and I told the user how to solve the problem
> (http://irclog.perlgeek.de/gluster/2013-06-14#i_7196149). It was caused
> by split-brain. If you have a bug, it would be more productive to file it
> than to make negative comments about a community of people who have no
> obligation to help anybody, but do it anyway just because they're nice
> people.
>
> This is going to sound snarky because it's in text, but I mean it
> sincerely: if community support is not sufficient, you might consider
> purchasing support from a company that provides it professionally.
>
> Let me know what you think of the GitHub thing and I'll proceed
> appropriately.
>
> Even better, put them up on http://forge.gluster.org
>
> On Tue, Nov 5, 2013 at 9:05 PM, Shawn Heisey <gluster at elyograg.org> wrote:
>
>> We recently added storage servers to our gluster install, running 3.3.1
>> on CentOS 6. It went from 40TB usable (8x2 distribute-replicate) to 80TB
>> usable (16x2). There was a little over 20TB of used space on the volume.
>>
>> The add-brick went through without incident, but the rebalance failed
>> after moving 1.5TB of the approximately 10TB that needed to be moved. A
>> side issue is that it took four days for that 1.5TB to move. I'm aware
>> that gluster has overhead, and that there's only so much speed you can
>> get out of gigabit, but a 100Mb/s half-duplex link could have copied the
>> data faster if it had been a straight copy.
>>
>> After I discovered that the rebalance had failed, I noticed that there
>> were other problems. There are a small number of completely lost files
>> (91 that I know about so far), a huge number of permission issues (over
>> 800,000 files changed to mode 000), and about 32,000 files that throw
>> read errors via the FUSE/NFS mount but seem to be available directly on
>> the bricks. That last category of problem file has the sticky bit set,
>> with almost all of them having ---------T permissions. The good files on
>> bricks typically have the same permissions, but are readable by root. I
>> haven't yet worked out the scripting necessary to automate all the
>> fixing that needs to happen.
>>
>> We really need to know what happened. We do plan to upgrade to 3.4.1,
>> but there were some reasons we didn't want to upgrade before adding
>> storage:
>>
>> * Upgrading will result in a service interruption for our clients, which
>> mount via NFS. It would likely be just a hiccup, with quick failover,
>> but it's still a service interruption.
>> * We have a pacemaker cluster providing the shared IP address for NFS
>> mounting. It's running CentOS 6.3. A "yum upgrade" to upgrade gluster
>> will also upgrade to CentOS 6.4. The pacemaker in 6.4 is incompatible
>> with the pacemaker in 6.3, which would likely mean longer-than-expected
>> downtime for the shared IP address.
>> * We didn't want to risk potential problems from running gluster 3.3.1
>> on the existing servers and 3.4.1 on the new servers.
>> * We needed the new storage added right away, before we could schedule
>> maintenance to deal with the upgrade issues.
>>
>> Something that would be extremely helpful would be the services of an
>> expert-level gluster consultant who could look over everything we've
>> done, tell us whether we've done anything wrong, and suggest how we
>> might avoid problems in the future. I don't know how much the company
>> can authorize for this, but we obviously want it to be as cheap as
>> possible. We are in Salt Lake City, UT, USA. It would be preferable to
>> have the consultant physically present at our location.
>>
>> I'm working on redacting one bit of identifying info from our rebalance
>> log, then I can put it up on Dropbox for everyone to examine.
>>
>> Thanks,
>> Shawn

--
Justin Dossey
CTO, PodOmatic