Re: 3.7.16 with sharding corrupts VMDK files when adding and removing bricks


 



Features and stability are not mutually exclusive.

Sometimes instability is cured by adding a feature.

A bug is not necessarily fixed faster by having more developers work on it.

Sometimes fixing one bug exposes a problem elsewhere.

Using free open source community projects with your own hardware and system design shifts more of the responsibility to test onto you. If that's not a risk you can afford, you might consider contracting with a third party that has "certified" installation parameters. IMHO.

On November 14, 2016 8:29:00 AM PST, Gandalf Corvotempesta <gandalf.corvotempesta@xxxxxxxxx> wrote:
2016-11-14 17:01 GMT+01:00 Vijay Bellur <vbellur@xxxxxxxxxx>:
Accessing sharded data after disabling sharding is something that we
did not visualize as a valid use case at any point in time. Also, you
could access the contents by enabling sharding again. Given these
factors I think this particular problem has not been prioritized by
us.

That's not true.
If you have VMs running on a sharded volume and you disable sharding
while the VMs are still running, everything crashes and could lead to data
loss: the VMs will be unable to find their filesystems, qemu corrupts the
image, and so on.

If I write to a file that was sharded (for example a log file) after
sharding has been disabled, the application will write to the existing
file (the one that was the first shard).
If you re-enable sharding, you lose some data.

Example:

A 128MB file with the shard size set to 64MB. You have 2 chunks: shard1+shard2.

Now you are writing to the file:

AAAA
BBBB
CCCC
DDDD

AAAA+BBBB are placed on shard1, CCCC+DDDD are placed on shard2

If you disable sharding and write some extra data, EEEE, then EEEE
would be placed after BBBB in shard1 (which grows beyond 64MB)
and not in shard3.

If you re-enable sharding, EEEE is lost, as gluster would expect it to be
in shard3, and I think gluster will read only the first 64MB from shard1.
If gluster read the whole file instead, you'd get something like this:

AAAA
BBBB
EEEE
CCCC
DDDD

In a text file this is bad; in a VM image, this means data
loss/corruption that is almost impossible to fix.
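
To make the failure mode above concrete, here is a minimal sketch (plain
Python, not Gluster code) of the offset-to-piece mapping the example
describes. The shard size and the two-piece layout come straight from the
example; the behaviour with sharding disabled (writes simply growing the
base file) is my reading of the scenario, not a statement about the shard
translator's internals.

# Illustrative only; SHARD_SIZE and the 0-based piece index are assumptions
# for this example (piece 0 = shard1, piece 1 = shard2, piece 2 = shard3
# in the mail's naming).
SHARD_SIZE = 64 * 1024 * 1024   # 64MB shard block size, as in the example

def piece_for_offset(offset, sharding_enabled):
    """Return which piece a write at `offset` lands in."""
    if not sharding_enabled:
        return 0                 # no shard logic: everything hits the base file
    return offset // SHARD_SIZE

eof_sharded_view = 128 * 1024 * 1024   # logical size: shard1 + shard2
eof_plain_view   = 64 * 1024 * 1024    # sharding off: client only sees shard1

# Sharding on: appending EEEE at the 128MB mark maps to a new piece, shard3.
print(piece_for_offset(eof_sharded_view, True))   # -> 2 (shard3)

# Sharding off: the client sees only the 64MB base file, so EEEE is
# appended there and shard1 grows beyond 64MB.
print(piece_for_offset(eof_plain_view, False))    # -> 0 (shard1)

# After re-enabling sharding, bytes past 128MB are expected in shard3,
# which was never written, so EEEE is invisible from the sharded view.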


As with many other projects, we are in a stage today where the number
of users and testers far outweigh the number of developers
contributing code. With this state it becomes hard to prioritize
problems from a long todo list for developers. If valuable community
members like you feel strongly about a bug or feature that need
attention of developers, please call such issues out on the mailing
list. We will be more than happy to help.

That's why I've asked for fewer features and more stability.
If you have to prioritize, please choose all bugs that could lead to
data corruption or similar.



--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

