Re: Sharding?

Cedric Lemarchand <yipikai7@xxxxxxxxx> · Fri, 10 Mar 2017 14:20:59 +0100

On 10 Mar 2017, at 12:05, Krutika Dhananjay <kdhananj@xxxxxxxxxx> wrote:

On Fri, Mar 10, 2017 at 4:09 PM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:

> On 10 Mar 2017, at 10:33, Alessandro Briosi <ab1@xxxxxxxxxxx> wrote:
>
> On 10/03/2017 10:28, Kevin Lemonnier wrote:
>>> I haven't done any tests yet, but I was under the impression that
>>> the sharding feature isn't so stable/mature yet.
>>> In the back of my mind I remember reading something about a
>>> bug/situation which caused data corruption.
>>> Can someone confirm that sharding is stable enough to be used in
>>> production and won't cause any data loss?
>> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
>> later versions) it works well as long as you don't try to add new bricks
>> to your volumes (we use it in production for HA virtual machine disks).
>> Apparently that bug was fixed recently, so latest versions should be
>> pretty stable yeah.
>
> I'm using 3.8.9, so I suppose all known bugs have been fixed there (including the one with adding bricks).
>
> I'll then proceed with some tests before going to production.

I am still asking myself how such a bug could happen in clustered storage software like Gluster, where adding bricks is a basic feature of a scalable solution. Or is it that STM releases are really under-tested compared to LTM ones? Could we state that STM releases are really not made for production, or are at least really risky?

Not entirely true. The same bug existed in the LTM release too.

I did try reproducing the bug as soon as Lindsay, Kevin and others started reporting it, but it was never reproducible on my setup.
The absence of proper logging in libgfapi upon failures only made it harder to debug, even when users successfully recreated the issue and shared
their logs. It was only after Satheesaran recreated it successfully with a FUSE mount that the real debugging could begin, because the fuse-bridge translator
logged the exact error code for the failure.

Indeed, an unreproducible bug is pretty hard to fix … thanks for the feedback. What would be the best way to find critical bugs in different Gluster releases? Maybe browsing https://review.gluster.org/ or https://bugzilla.redhat.com? Any advice?

Cheers

Cédric

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users