Re: complete f......p thanks to glusterfs...applause, you crashed weeks of work


 



My first suggestion, not phrased in the carefully chosen words I usually use to make sure I come across as "nice", would be to stop screwing up the system design by randomly doing things to the bricks that you clearly don't understand, and instead let the software do what it was designed to do.

I phrase it harshly because you started your email with hostility, blaming developers who demonstrate more talent before breakfast than you show in the description below of what is, apparently, your three-year-long experience (based on the version history).

Most of the problems you describe can be caused directly by your "fixes" and cannot be caused by resizing XFS.
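
For the record, an online grow done the normal way only adds space and updates filesystem metadata; it does not rewrite files or their xattrs. A minimal sketch, with device and mount names that are placeholders for your own layout:

    lvextend -L +1T /dev/vgdata/brick1   # grow the logical volume under the brick
    xfs_growfs /bricks/brick1            # grow the mounted XFS to fill the new space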

More inline

On September 1, 2014 8:49:46 PM PDT, Bencho Naut <all@xxxxxxxxxxxx> wrote:
>until a few days ago my opinion about glusterfs was "working but
>stable", now i would just call it a versatile data and time
>blackhole.
>
>Though i don't even feel like the dev's read the gluster-users list, i
>suggest you shot yourself and just do it like truecrypt ( big
>disclaimer: this software is insecure, use another product, NO
>PRODUCTION USE).
>
>It started with the usual issues: not syncing (3.2), shd
>fails (3.2-3.3), peer doesn't reconnect (3.3), ssl keys have to be
>2048-bit fixed size and all keys have to be everywhere (all
>versions.... which noob programmed that??), only the control connection
>is encrypted, etc. etc. i kept calm, resynced, recreated, already gave
>up.. VERY OFTEN..
>
>At a certain point it also used tons of diskspace due to not deleting
>files in the ".glusterfs" directory , (but still being connected and up
>serving volumes)
>
>IT WAS A LONG AND PAINFUL SYNCING PROCESS until i thought i was happy
>;)
>
>But now the master-fail happened:
>(and i already know you can't pop out a simple solution, but yeah come,
>write your mess.. i'll describe it for you)
>
>Due to an online lvm/XFS resize under glusterfs (i watch the logs
>nearly all the time) i discovered "mismatching disk layouts", realizing
>also that

Mismatching layouts come from having two fully populated servers with no xattrs, which is exactly the state you describe creating further down.
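
For what it's worth, you can check what is actually on the bricks yourself. Something like this, where the brick directory path is just a placeholder for your own:

    # run against a directory on the brick itself, not through the client mount
    getfattr -d -m . -e hex /bricks/brick1/some/directory
    # a healthy brick directory carries trusted.gfid and trusted.glusterfs.dht;
    # if you stripped the xattrs those are simply gone, and clients will keep
    # reporting mismatching layouts until something (a fix-layout rebalance,
    # for instance) writes them back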

>
>server1 was up and happy when you mount from it, but server2 spews
>input/output errors on several directories (for now just in that
>volume),

Illogical, as there should be no difference in operation regardless of which server provides the client configuration. Did you somehow get the vols tree out of sync between the servers?
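
Easy enough to check: glusterd keeps a checksum of every volume definition on disk. Roughly like this, using the default working directory, with VOLNAME standing in for your volume:

    # run on both servers and compare the output
    cat /var/lib/glusterd/vols/VOLNAME/cksum
    # if the checksums differ, the peers disagree about the volume config;
    # "gluster volume sync" can copy the definition from the peer that has
    # the good copy -- check the man page on your version for the exact
    # syntax and direction before running it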

>
>i tried to rename one directory, it created a recursive loop inside XFS
>(e.g. BIGGEST FILE-SYSTEM FAIL: TWO INODES linking to one dir,
>ideally containing another)
>i got at least the XFS loop solved.
>
>Then the second-to-last resort option came up.. deleted the volumes,
>cleaned all xattrs on that ~2T ... and recreated the volumes, since shd
>seems to work somehow since 3.4
>guess what happened?? i/o errors on server2 on and on. before, i could
>mount on server1 from server2 without i/o errors.. not now..
>
>Really i would like to love this project, but right now i'm in the mood
>for a killswitch (for the whole project), the aim is good, the way

I'm sure you can get a refund for the software. 

>glusterfs tries to achieve this is just poor..tons of senseless logs,

Perhaps you're getting "tons" because you've already broken it. I don't get too much in my logs.

>really , even your worst *insertBigCorp* DB server will spit less logs,

Absolutely. In fact I think the bigger the Corp, the smaller and more obfuscated the logs. I guess that's a good thing? 

>glusterfs in the default setting is just eating your diskspace with
>logs, there is no option to rate-limit, every time you start a volume
>it logs the volume config... sometimes i feel like git would be the way
>to go, not only for the logs (git-annex ;) ).
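
There is at least a knob for the verbosity, by the way. From memory (verify the exact option names with "gluster volume set help" on your version), these bring per-volume logging down to warnings and above, with VOLNAME as a placeholder:

    gluster volume set VOLNAME diagnostics.client-log-level WARNING
    gluster volume set VOLNAME diagnostics.brick-log-level WARNING
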
>
>now i realized through "ls -R 1>/dev/null" that this happened on ALL
>volumes in the cluster, a known problem: "can't stat folders".
>
>Maybe anyone has a suggestion, except "create a new clean volume and
>move all your TBs".

BTW... You never backed up your accusation that there is a "data black hole".

This can be solved. It's a complete mess, though, by this account, and it won't be easy. Your gfids and dht mappings may be mismatched, you may have directories and files with the same path on different bricks, and who knows what state your .glusterfs directory is in. Hang out in #gluster on IRC and I'll help you as I can. I'm just getting back from vacation, so I'll have a bunch of my own work to catch up on, too.
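
Before you jump on IRC, it helps to know how bad the gfid situation actually is. A quick spot check, with brick paths that are placeholders for your own, run on both servers:

    # the gfid of a file or directory must be identical on every brick that
    # holds a copy of it
    getfattr -n trusted.gfid -e hex /bricks/brick1/path/to/something
    # each gfid also has an entry under the brick's .glusterfs directory at
    # .glusterfs/<first two hex chars>/<next two>/<full gfid>
    # (a hard link for files, a symlink for directories); missing or
    # mismatched entries there are a classic source of the input/output
    # errors you're describing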

>
>Regards

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users



