Re: Replicated striped data loss

Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> · Sun, 13 Mar 2016 21:19:11 +0300

I couldn't find anything related to caching in the HBAs.
Which logs are useful in my case? I only see the brick logs, which contain
nothing during the failure.

###
[2016-03-13 18:05:19.728614] E [MSGID: 113022] [posix.c:1232:posix_mknod] 0-vmware-posix: mknod on /bricks/b003/vmware/.shard/17d75e20-16f1-405e-9fa5-99ee7b1bd7f1.511 failed [File exists]
[2016-03-13 18:07:23.337086] E [MSGID: 113022] [posix.c:1232:posix_mknod] 0-vmware-posix: mknod on /bricks/b003/vmware/.shard/eef2d538-8eee-4e58-bc88-fbf7dc03b263.4095 failed [File exists]
[2016-03-13 18:07:55.027600] W [trash.c:1922:trash_rmdir] 0-vmware-trash: rmdir issued on /.trashcan/, which is not permitted
[2016-03-13 18:07:55.027635] I [MSGID: 115056] [server-rpc-fops.c:459:server_rmdir_cbk] 0-vmware-server: 41987: RMDIR /.trashcan/internal_op (00000000-0000-0000-0000-000000000005/internal_op) ==> (Operation not permitted) [Operation not permitted]
[2016-03-13 18:11:34.353441] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
[2016-03-13 18:11:34.353463] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client from gfs002-2727-2016/03/13-20:17:43:613597-vmware-client-4-0-0 (version: 3.7.8)
[2016-03-13 18:11:34.591139] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
[2016-03-13 18:11:34.591173] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client from gfs002-2719-2016/03/13-20:17:42:609388-vmware-client-4-0-0 (version: 3.7.8)
###
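On the question of which logs help: the excerpt above does contain two error-level (E) mknod failures on .shard files. As a minimal sketch (not a Gluster tool), the Python below pulls the E entries out of a brick log and, for any .shard/<gfid>.<N> path, reports where that shard starts in the original file. It assumes the entry layout seen in the excerpt, that shard N starts at byte N * shard-block-size, and that the features.shard-block-size of 16MB shown in the volume info in this thread means 16 MiB; the script name is hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from typing import List, Optional

# Assumed entry header, as in the excerpt: "[YYYY-MM-DD HH:MM:SS.ffffff] X "
# where X is a one-letter severity (E = error, W = warning, I = info, ...).
HEADER = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\] ([A-Z]) ")
# Shard files are named <gfid>.<index> under the brick's .shard directory.
SHARD = re.compile(r"/\.shard/([0-9a-f-]{36})\.(\d+)")
# Assumption: "features.shard-block-size: 16MB" means 16 MiB.
SHARD_BLOCK_SIZE = 16 * 1024 * 1024


@dataclass
class Entry:
    timestamp: str
    level: str
    message: str
    shard_gfid: Optional[str] = None
    shard_index: Optional[int] = None

    @property
    def shard_offset(self) -> Optional[int]:
        """First byte of the original file held by this shard (assumed layout)."""
        if self.shard_index is None:
            return None
        return self.shard_index * SHARD_BLOCK_SIZE


def parse_entries(text: str) -> List[Entry]:
    """Split a log excerpt (one entry per line, or wrapped by a mailer) into entries."""
    headers = list(HEADER.finditer(text))
    entries = []
    for i, match in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        # Collapse line breaks and repeated spaces inside one entry.
        message = " ".join(text[match.end():end].split())
        entry = Entry(match.group(1), match.group(2), message)
        shard = SHARD.search(message)
        if shard:
            entry.shard_gfid = shard.group(1)
            entry.shard_index = int(shard.group(2))
        entries.append(entry)
    return entries


def errors(text: str) -> List[Entry]:
    """Keep only error-level (E) entries."""
    return [e for e in parse_entries(text) if e.level == "E"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python brick_errors.py /var/log/glusterfs/bricks/<brick>.log
    with open(sys.argv[1], errors="replace") as f:
        for e in errors(f.read()):
            where = ""
            if e.shard_index is not None:
                where = f" (shard {e.shard_index} of {e.shard_gfid}, from byte {e.shard_offset})"
            print(f"{e.timestamp}{where}: {e.message}")
```

Under those assumptions, shard 511 in the first entry would begin 511 * 16 MiB into the file, roughly 8 GiB in, which gives a concrete region of the VM disk to compare between the two copies.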

ESXi just keeps telling me "Cannot clone T: The virtual disk is either
corrupted or not a supported format."
(error | 3/13/2016 9:06:20 PM | Clone virtual machine | T | VCENTER.LOCAL\Administrator)

My setup is 2 servers with a floating IP controlled by CTDB, and my ESXi
server mounts the NFS export via the floating IP.

On 03/13/2016 08:40 PM, pkoelle wrote:
On 13.03.2016 at 18:22, David Gossage wrote:
On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:

My HBAs are LSISAS1068E, and the filesystem is XFS.
I tried EXT4 and it did not help.
I created a striped volume on one server with two bricks: same issue.
I also tried a replicated volume with just sharding enabled: same issue.
As soon as I disable sharding it works just fine; neither sharding nor
striping works for me.
I followed some of the threads on the mailing list and tried some of
the fixes that worked for others, but none worked for me. :(

Is it possible the LSI has write-cache enabled?
Why is that relevant? Even the backing filesystem has no idea if there
is a RAID or write cache or whatever. There are blocks and sync(), end
of story.
If you lose power and screw up your recovery, OR do funky stuff with
SAS multipathing, that might be an issue with a controller cache. AFAIK
that's not what we are talking about.

I'm afraid that unless the OP has some logs from the server, a
reproducible test case, or a backtrace from the client or server, this
isn't getting us anywhere.
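One way to build the kind of reproducible test case Paul asks for is to write a deterministic pattern through the mount, fsync it, and read it back chunk by chunk. This is a minimal sketch under stated assumptions: the target directory is a mount of the affected volume (the path in the comment is hypothetical; without an argument it uses a local temp directory), the file name is hypothetical, and the 40 MiB size is chosen to cross the 16 MiB shard boundaries implied by the 16MB features.shard-block-size shown in this thread.

```python
import hashlib
import os
import sys
import tempfile
from typing import Optional

CHUNK = 1024 * 1024   # 1 MiB per write
TOTAL = 40 * CHUNK    # 40 MiB: crosses two 16 MiB shard boundaries (assumed block size)


def pattern(i: int) -> bytes:
    """Deterministic 1 MiB block that differs for every chunk index."""
    # A SHA-256 digest is 32 bytes, so repeating it CHUNK // 32 times gives exactly 1 MiB.
    return hashlib.sha256(str(i).encode()).digest() * (CHUNK // 32)


def first_bad_chunk(directory: str, total: int = TOTAL) -> Optional[int]:
    """Write `total` bytes, fsync, read back; return the first bad chunk index or None."""
    path = os.path.join(directory, "readback-test.bin")  # hypothetical file name
    chunks = total // CHUNK
    try:
        with open(path, "wb") as f:
            for i in range(chunks):
                f.write(pattern(i))
            f.flush()
            os.fsync(f.fileno())  # ask the client to push the data to the bricks
        with open(path, "rb") as f:
            for i in range(chunks):
                if f.read(CHUNK) != pattern(i):
                    return i
            if f.read(1):  # data past the expected end also counts as a mismatch
                return chunks
        return None
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    # Pass the mount point (e.g. a hypothetical /mnt/vmware-test); defaults to a temp dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    bad = first_bad_chunk(target)
    if bad is None:
        print("read-back OK")
    else:
        # With 1 MiB chunks and assumed 16 MiB shards, chunk i lies in shard i // 16.
        print(f"mismatch at byte offset {bad * CHUNK} (shard {bad // 16})")
```

A read-back on the same client may be served from its page cache, so unmounting and remounting between the write and the read (or reading from a second client) is a stronger check. If the mount hangs like the dd run described elsewhere in the thread, this script will hang the same way, which is itself a reproducible symptom.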

cheers
Paul

On 03/13/2016 06:54 PM, David Gossage wrote:

On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:

Okay, so I enabled sharding in my test volume and it did not help.
Stupidly enough, I also enabled it on a production "Distributed-Replicate"
volume and it corrupted half of my VMs.
I have updated Gluster to the latest version and nothing seems to have
changed in my situation.
Below is the info for my volume:

I was pointing at the settings in that email as an example of fixing
corruption. I wouldn't recommend enabling sharding if you haven't gotten
the base setup working on that cluster yet. What HBAs are you using, and
what is the filesystem layout for the bricks?

Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/b001/vmware
Brick2: gfs002:/bricks/b004/vmware
Brick3: gfs001:/bricks/b002/vmware
Brick4: gfs002:/bricks/b005/vmware
Brick5: gfs001:/bricks/b003/vmware
Brick6: gfs002:/bricks/b006/vmware
Options Reconfigured:
performance.strict-write-ordering: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
performance.stat-prefetch: disable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
cluster.eager-lock: enable
features.shard-block-size: 16MB
features.shard: on
performance.readdir-ahead: off
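With "Number of Bricks: 3 x 2 = 6", Gluster forms each replica pair from consecutive bricks in the order listed, so it is worth confirming that every pair spans both servers. A minimal sketch, assuming the "BrickN: host:/path" layout shown above; the helper names are hypothetical, not part of any Gluster API.

```python
import re
from typing import List

# "BrickN: host:/path" lines as printed by `gluster volume info`.
BRICK = re.compile(r"^[ \t]*Brick\d+:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def parse_bricks(volume_info: str) -> List[str]:
    """Bricks in the order volume info lists them."""
    return BRICK.findall(volume_info)


def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Group consecutive bricks into replica sets of `replica` bricks each."""
    if replica < 1 or len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def host(brick: str) -> str:
    return brick.split(":", 1)[0]


def same_host_sets(sets: List[List[str]]) -> List[List[str]]:
    """Replica sets that keep two or more copies on the same server."""
    return [s for s in sets if len({host(b) for b in s}) < len(s)]


if __name__ == "__main__":
    # Brick list copied from the volume info above.
    info = "\n".join([
        "Brick1: gfs001:/bricks/b001/vmware",
        "Brick2: gfs002:/bricks/b004/vmware",
        "Brick3: gfs001:/bricks/b002/vmware",
        "Brick4: gfs002:/bricks/b005/vmware",
        "Brick5: gfs001:/bricks/b003/vmware",
        "Brick6: gfs002:/bricks/b006/vmware",
    ])
    sets = replica_sets(parse_bricks(info), replica=2)
    for n, s in enumerate(sets):
        print(f"replica set {n}: {' + '.join(s)}")
    print("sets with copies on one host:", same_host_sets(sets))
```

Assuming the brick log quoted at the top of the thread comes from this volume, /bricks/b003/vmware is on gfs001 and its replica partner would be gfs002:/bricks/b006/vmware, so that pair's .shard directories are the ones to compare.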

On 03/12/2016 08:11 PM, David Gossage wrote:

On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:

Both servers have HBAs, no RAID, and I can set up a replicated or
dispersed volume without any issues.
The logs are clean: when I tried to migrate a VM and got the error,
nothing showed up in the logs.
I tried mounting the volume on my laptop and it mounted fine, but if I
use dd to create a data file it just hangs, and I can't cancel it,
unmount it, or anything else; I just have to reboot.
The same servers have another distributed-replicated volume on other
bricks, and it works fine.
I have even tried the same setup in a virtual environment (created two
VMs, installed Gluster and created a replicated striped volume), and
again the same thing: data corruption.

I'd look through the mail archives for a topic called "Shard in
Production", I think. The shard portion may not be relevant, but it does
discuss certain settings that had to be applied to avoid corruption with
VMs. You may also want to try disabling performance.readdir-ahead.

On 03/12/2016 07:02 PM, David Gossage wrote:

On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:

Thanks David,

My settings are all defaults; I just created the pool and started it.
I have applied the settings you recommended and it seems to be the
same issue:

Type: Striped-Replicate
Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/t1/s
Brick2: gfs002:/bricks/t1/s
Brick3: gfs001:/bricks/t2/s
Brick4: gfs002:/bricks/t2/s
Options Reconfigured:
performance.stat-prefetch: off
network.remote-dio: on
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on

Is there a RAID controller perhaps doing any caching?

Are any errors being reported in the Gluster logs during the migration
process?
Since they aren't in use yet, have you tested making just mirrored bricks
using different pairings of servers, two at a time, to see if the problem
follows a certain machine or network port?

On 03/12/2016 03:25 PM, David Gossage wrote:

On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:

Dear all,

I have created a replicated striped volume with two bricks and two
servers, but I can't use it: when I mount it in ESXi and try to migrate
a VM to it, the data gets corrupted.
Does anyone have any idea why this is happening?

Dell 2950 x2
Seagate 15k 600GB
CentOS 7.2
Gluster 3.7.8

Appreciate your help.

Most reports of this I have seen end up being settings-related. Post the
output of gluster volume info. Below are what I have seen as the most
commonly recommended settings.
I'd hazard a guess that you may have some of the read-ahead, caching or
prefetch options on.

quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=on
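The list above uses short names; in gluster volume info they appear under "Options Reconfigured" with prefixes (performance.quick-read, cluster.eager-lock, network.remote-dio, and so on, as in the volume info posted in this thread). A minimal sketch that compares a saved volume info against the recommendations, treating on/enable and off/disable as equivalent because both spellings appear in this thread. The readdir-ahead entry is added from the later suggestion in the thread; the script name and VOLNAME are placeholders.

```python
import sys
from typing import Dict, List, Optional, Tuple

# Recommended values from the list above, keyed by the full option names that
# `gluster volume info` prints under "Options Reconfigured".
RECOMMENDED: Dict[str, bool] = {
    "performance.quick-read": False,
    "performance.read-ahead": False,
    "performance.io-cache": False,
    "performance.stat-prefetch": False,
    "cluster.eager-lock": True,
    "network.remote-dio": True,
    # Added from the later suggestion in this thread to disable readdir-ahead.
    "performance.readdir-ahead": False,
}

TRUE_WORDS = {"on", "enable", "yes", "true", "1"}
FALSE_WORDS = {"off", "disable", "no", "false", "0"}


def as_bool(value: str) -> Optional[bool]:
    v = value.strip().lower()
    if v in TRUE_WORDS:
        return True
    if v in FALSE_WORDS:
        return False
    return None  # not a boolean spelling we recognize


def parse_options(volume_info: str) -> Dict[str, str]:
    """Collect the `key: value` lines that follow 'Options Reconfigured:'."""
    options: Dict[str, str] = {}
    in_options = False
    for raw in volume_info.splitlines():
        line = raw.strip()
        if line == "Options Reconfigured:":
            in_options = True
        elif in_options and not line:
            break  # a blank line ends the block
        elif in_options and ":" in line:
            key, value = line.split(":", 1)
            options[key.strip()] = value.strip()
    return options


def deviations(options: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """(option, current value, recommended value) for every unmet recommendation."""
    result = []
    for key, want in RECOMMENDED.items():
        current = options.get(key)
        if current is None or as_bool(current) != want:
            result.append((key, current or "unset (default)", "on" if want else "off"))
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: gluster volume info VOLNAME > info.txt; python check_options.py info.txt
    with open(sys.argv[1]) as f:
        for key, current, want in deviations(parse_options(f.read())):
            # VOLNAME is a placeholder for the real volume name.
            print(f"{key} is {current}; try: gluster volume set VOLNAME {key} {want}")
```

Fed the Striped-Replicate volume info posted in this thread, only performance.readdir-ahead (on) would be flagged; the Distributed-Replicate volume info already matches every entry.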

Mahdi Adnan
System Admin

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
