Hmm ok. Could you share the nfs.log content?
-Krutika
On Tue, Mar 15, 2016 at 1:45 PM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Okay, here's what I did;
Volume Name: v
Type: Distributed-Replicate
Volume ID: b348fd8e-b117-469d-bcc0-56a56bdfc930
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/b001/v
Brick2: gfs001:/bricks/b002/v
Brick3: gfs001:/bricks/b003/v
Brick4: gfs002:/bricks/b004/v
Brick5: gfs002:/bricks/b005/v
Brick6: gfs002:/bricks/b006/v
Options Reconfigured:
features.shard-block-size: 128MB
features.shard: enable
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
Same error, and mounting using glusterfs still works just fine.
On 03/15/2016 11:04 AM, Krutika Dhananjay wrote:
OK but what if you use it with replication? Do you still see the error? I think not.
Could you give it a try and tell me what you find?
-Krutika
On Tue, Mar 15, 2016 at 1:23 PM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Hi,
I have created the following volume;
Volume Name: v
Type: Distribute
Volume ID: 90de6430-7f83-4eda-a98f-ad1fabcf1043
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/b001/v
Brick2: gfs001:/bricks/b002/v
Brick3: gfs001:/bricks/b003/v
Options Reconfigured:
features.shard-block-size: 128MB
features.shard: enable
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
After mounting it in ESXi and trying to clone a VM to it, I got the same error.
Respectfully
Mahdi A. Mahdi
On 03/15/2016 10:44 AM, Krutika Dhananjay wrote:
Hi,
Do not use sharding and stripe together in the same volume because
a) It is not recommended and there is no point in using both. Using sharding alone on your volume should work fine.
b) Nobody tested it.
c) Like Niels said, stripe feature is virtually deprecated.
I would suggest that you create an nx3 volume where n is the number of distribute subvols you prefer, enable group virt options on it, and enable sharding on it,
set the shard-block-size that you feel appropriate and then just start off with VM image creation etc.
If you run into any issues even after you do this, let us know and we'll help you out.
-Krutika
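A minimal sketch of the setup suggested above, assuming a 1x3 replica layout. It only builds the command lines so they can be reviewed before being run by hand. The host gfs003 and the brick paths are hypothetical (a replica 3 volume would normally want three servers), and 64MB is just one of the block sizes used elsewhere in this thread.

```python
import shlex


def build_shard_volume_commands(volume, bricks, shard_block_size="64MB"):
    """Return gluster CLI command lines for an n x 3 sharded VM volume.

    Nothing is executed here; the caller decides what to do with the strings.
    """
    if len(bricks) == 0 or len(bricks) % 3 != 0:
        raise ValueError("an n x 3 volume needs a non-zero multiple of 3 bricks")
    vol = shlex.quote(volume)
    brick_args = " ".join(shlex.quote(b) for b in bricks)
    size = shlex.quote(shard_block_size)
    return [
        f"gluster volume create {vol} replica 3 {brick_args}",
        f"gluster volume set {vol} group virt",
        f"gluster volume set {vol} features.shard on",
        f"gluster volume set {vol} features.shard-block-size {size}",
        f"gluster volume start {vol}",
    ]


if __name__ == "__main__":
    # Hypothetical bricks: one per server, three servers.
    example_bricks = [
        "gfs001:/bricks/b001/v",
        "gfs002:/bricks/b001/v",
        "gfs003:/bricks/b001/v",  # gfs003 is an assumption, not from the thread
    ]
    for command in build_shard_volume_commands("v", example_bricks):
        print(command)
```

Passing six or nine bricks gives a 2x3 or 3x3 layout; the brick order determines which bricks end up in the same replica set.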
On Tue, Mar 15, 2016 at 1:07 PM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Thanks Krutika,
I have deleted the volume and created a new one.
I found that it may be an issue with NFS itself. I created a new striped volume, enabled sharding, and mounted it via glusterfs, and it worked just fine. If I mount it with NFS, it fails and gives me the same errors.
Respectfully
Mahdi A. Mahdi
On 03/15/2016 06:24 AM, Krutika Dhananjay wrote:
Hi,
So could you share the xattrs associated with the file at <BRICK_PATH>/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
Here's what you need to execute:
# getfattr -d -m . -e hex /mnt/b1/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c on the first node and
# getfattr -d -m . -e hex /mnt/b2/v/.glusterfs/c3/e8/c3e88cc1-7e0a-4d46-9685-2d12131a5e1c on the second.
Also, it is normally advised to use a replica 3 volume as opposed to replica 2 volume to guard against split-brains.
-Krutika
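The paths in the commands above follow a visible pattern: the first two and the next two hex characters of the GFID become directory levels under .glusterfs on the brick. A small sketch, assuming that pattern holds for other GFIDs too, that rebuilds the path and the getfattr command line for a given brick (the helper names are made up for illustration):

```python
import posixpath
import uuid


def gfid_backend_path(brick_path, gfid):
    """Return <brick>/.glusterfs/<aa>/<bb>/<gfid> for a GFID string."""
    # uuid.UUID validates the string and normalises it to lowercase with hyphens.
    canonical = str(uuid.UUID(gfid))
    return posixpath.join(
        brick_path, ".glusterfs", canonical[0:2], canonical[2:4], canonical
    )


def getfattr_command(brick_path, gfid):
    """Build the getfattr invocation used in this thread for one brick."""
    return "getfattr -d -m . -e hex " + gfid_backend_path(brick_path, gfid)


if __name__ == "__main__":
    gfid = "c3e88cc1-7e0a-4d46-9685-2d12131a5e1c"
    for brick in ("/mnt/b1/v", "/mnt/b2/v"):
        print(getfattr_command(brick, gfid))
```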
On Mon, Mar 14, 2016 at 3:17 PM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Sorry for the serial posting, but I got new logs that might help.
These messages appear during the migration:
/var/log/glusterfs/nfs.log
[2016-03-14 09:45:04.573765] I [MSGID: 109036] [dht-common.c:8043:dht_log_new_layout_for_dir_selfheal] 0-testv-dht: Setting layout of /New Virtual Machine_1 with [Subvol_name: testv-stripe-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
[2016-03-14 09:45:04.957499] E [shard.c:369:shard_modify_size_and_block_count] (-->/usr/lib64/glusterfs/3.7.8/xlator/cluster/distribute.so(dht_file_setattr_cbk+0x14f) [0x7f27a13c067f] -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_common_setattr_cbk+0xcc) [0x7f27a116681c] -->/usr/lib64/glusterfs/3.7.8/xlator/features/shard.so(shard_modify_size_and_block_count+0xdd) [0x7f27a116584d] ) 0-testv-shard: Failed to get trusted.glusterfs.shard.file-size for c3e88cc1-7e0a-4d46-9685-2d12131a5e1c
[2016-03-14 09:45:04.957577] W [MSGID: 112199] [nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: /New Virtual Machine_1/New Virtual Machine-flat.vmdk => (XID: 3fec5a26, SETATTR: NFS: 22(Invalid argument for operation), POSIX: 22(Invalid argument)) [Invalid argument]
[2016-03-14 09:45:05.079657] E [MSGID: 112069] [nfs3.c:3649:nfs3_rmdir_resume] 0-nfs-nfsv3: No such file or directory: (192.168.221.52:826) testv : 00000000-0000-0000-0000-000000000001
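For longer logs, a rough sketch like the following can pull out the GFIDs for which the shard translator reports a missing trusted.glusterfs.shard.file-size. The regular expressions are inferred only from the lines quoted above, so treat the line format as an assumption; the backtrace in the first sample line is abbreviated to "(...)".

```python
import re

# Layout inferred from the nfs.log lines in this thread:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<source>[^\]]+)\] (?P<message>.*)$"
)
SHARD_SIZE_ERROR = re.compile(
    r"Failed to get trusted\.glusterfs\.shard\.file-size for "
    r"(?P<gfid>[0-9a-f-]{36})"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def shard_size_failures(lines):
    """Return GFIDs from error-level lines reporting a missing shard file-size."""
    gfids = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["level"] != "E":
            continue
        hit = SHARD_SIZE_ERROR.search(entry["message"])
        if hit:
            gfids.append(hit.group("gfid"))
    return gfids


SAMPLE = [
    # Backtrace shortened to "(...)" for readability.
    "[2016-03-14 09:45:04.957499] E [shard.c:369:shard_modify_size_and_block_count] "
    "(...) 0-testv-shard: Failed to get trusted.glusterfs.shard.file-size for "
    "c3e88cc1-7e0a-4d46-9685-2d12131a5e1c",
    "[2016-03-14 09:45:04.957577] W [MSGID: 112199] "
    "[nfs3-helpers.c:3418:nfs3_log_common_res] 0-nfs-nfsv3: "
    "/New Virtual Machine_1/New Virtual Machine-flat.vmdk => (XID: 3fec5a26, "
    "SETATTR: NFS: 22(Invalid argument for operation), "
    "POSIX: 22(Invalid argument)) [Invalid argument]",
]

if __name__ == "__main__":
    print(shard_size_failures(SAMPLE))
```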
Respectfully
Mahdi A. Mahdi
On 03/14/2016 11:14 AM, Mahdi Adnan wrote:
So I have deployed a new server, a "Cisco UCS C220M4", and created a new volume;
Volume Name: testv
Type: Stripe
Volume ID: 55cdac79-fe87-4f1f-90c0-15c9100fe00b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.0.250:/mnt/b1/v
Brick2: 10.70.0.250:/mnt/b2/v
Options Reconfigured:
nfs.disable: off
features.shard-block-size: 64MB
features.shard: enable
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: off
Same error.
Can anyone share with me the info of a working striped volume?
On 03/14/2016 09:02 AM, Mahdi Adnan wrote:
I have a pool of two bricks in the same server;
Volume Name: k
Type: Stripe
Volume ID: 1e9281ce-2a8b-44e8-a0c6-e3ebf7416b2b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/t1/k
Brick2: gfs001:/bricks/t2/k
Options Reconfigured:
features.shard-block-size: 64MB
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: off
same issue ...
glusterfs 3.7.8 built on Mar 10 2016 20:20:45.
Respectfully
Mahdi A. Mahdi
Systems Administrator
IT. Department
Earthlink Telecommunications
Cell: 07903316180
Work: 3352
Skype: mahdi.adnan@xxxxxxxxxxx
On 03/14/2016 08:11 AM, Niels de Vos wrote:
On Mon, Mar 14, 2016 at 08:12:27AM +0530, Krutika Dhananjay wrote:
It would be better to use sharding over stripe for your vm use case. It offers better distribution and utilisation of bricks and better heal performance. And it is well tested.
Basically the "striping" feature is deprecated, "sharding" is its improved replacement. I expect to see "striping" completely dropped in the next major release.
Niels
Couple of things to note before you do that:
1. Most of the bug fixes in sharding have gone into 3.7.8. So it is advised that you use 3.7.8 or above.
2. When you enable sharding on a volume, already existing files in the volume do not get sharded. Only the files that are newly created from the time sharding is enabled will. If you do want to shard the existing files, then you would need to cp them to a temp name within the volume, and then rename them back to the original file name.
HTH,
Krutika
On Sun, Mar 13, 2016 at 11:49 PM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
I couldn't find anything related to cache in the HBAs.
what logs are useful in my case ? i see only bricks logs which contains nothing during the failure.
###
[2016-03-13 18:05:19.728614] E [MSGID: 113022] [posix.c:1232:posix_mknod] 0-vmware-posix: mknod on /bricks/b003/vmware/.shard/17d75e20-16f1-405e-9fa5-99ee7b1bd7f1.511 failed [File exists]
[2016-03-13 18:07:23.337086] E [MSGID: 113022] [posix.c:1232:posix_mknod] 0-vmware-posix: mknod on /bricks/b003/vmware/.shard/eef2d538-8eee-4e58-bc88-fbf7dc03b263.4095 failed [File exists]
[2016-03-13 18:07:55.027600] W [trash.c:1922:trash_rmdir] 0-vmware-trash: rmdir issued on /.trashcan/, which is not permitted
[2016-03-13 18:07:55.027635] I [MSGID: 115056] [server-rpc-fops.c:459:server_rmdir_cbk] 0-vmware-server: 41987: RMDIR /.trashcan/internal_op (00000000-0000-0000-0000-000000000005/internal_op) ==> (Operation not permitted) [Operation not permitted]
[2016-03-13 18:11:34.353441] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
[2016-03-13 18:11:34.353463] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client from gfs002-2727-2016/03/13-20:17:43:613597-vmware-client-4-0-0 (version: 3.7.8)
[2016-03-13 18:11:34.591139] I [login.c:81:gf_auth] 0-auth/login: allowed user names: c0c72c37-477a-49a5-a305-3372c1c2f2b4
[2016-03-13 18:11:34.591173] I [MSGID: 115029] [server-handshake.c:612:server_setvolume] 0-vmware-server: accepted client from gfs002-2719-2016/03/13-20:17:42:609388-vmware-client-4-0-0 (version: 3.7.8)
###
ESXi just keeps telling me "Cannot clone T: The virtual disk is either corrupted or not a supported format. error 3/13/2016 9:06:20 PM Clone virtual machine T VCENTER.LOCAL\Administrator "
My setup is 2 servers with a floating IP controlled by CTDB, and my ESXi server mounts the NFS export via the floating IP.
On 03/13/2016 08:40 PM, pkoelle wrote:
On 13.03.2016 at 18:22, David Gossage wrote:
On Sun, Mar 13, 2016 at 11:07 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
My HBAs are LSISAS1068E, and the filesystem is XFS.
I tried EXT4 and it did not help.
I have created a striped volume in one server with two bricks, same issue.
I also tried a replicated volume with just sharding enabled, same issue. As soon as I disable sharding it works just fine; neither sharding nor striping works for me.
I did follow up on some of the threads in the mailing list and tried some of the fixes that worked for others; none worked for me. :(
Is it possible the LSI has write-cache enabled?
Why is that relevant? Even the backing filesystem has no idea if there is a RAID or write cache or whatever. There are blocks and sync(), end of story.
If you lose power and screw up your recovery OR do funky stuff with SAS multipathing, that might be an issue with a controller cache. AFAIK that's not what we are talking about.
I'm afraid that unless the OP has some logs from the server, a reproducible testcase, or a backtrace from client or server, this isn't getting us anywhere.
cheers
Paul
On 03/13/2016 06:54 PM, David Gossage wrote:
On Sun, Mar 13, 2016 at 8:16 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Okay so I have enabled shard in my test volume and it did not help. Stupidly enough, I have enabled it in a production "Distributed-Replicate" volume and it corrupted half of my VMs.
I have updated Gluster to the latest and nothing seems to have changed in my situation.
Below is the info of my volume;
I was pointing at the settings in that email as an example for corruption fixing. I wouldn't recommend enabling sharding if you haven't gotten the base working yet on that cluster. What HBAs are you using and what is the layout of the filesystem for the bricks?
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/b001/vmware
Brick2: gfs002:/bricks/b004/vmware
Brick3: gfs001:/bricks/b002/vmware
Brick4: gfs002:/bricks/b005/vmware
Brick5: gfs001:/bricks/b003/vmware
Brick6: gfs002:/bricks/b006/vmware
Options Reconfigured:
performance.strict-write-ordering: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: enable
performance.stat-prefetch: disable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
cluster.eager-lock: enable
features.shard-block-size: 16MB
features.shard: on
performance.readdir-ahead: off
On 03/12/2016 08:11 PM, David Gossage wrote:
On Sat, Mar 12, 2016 at 10:21 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Both servers have HBAs, no RAID, and I can set up a replicated or dispersed volume without any issues.
Logs are clean, and when I tried to migrate a VM and got the error, nothing showed up in the logs.
I tried mounting the volume on my laptop and it mounted fine, but if I use dd to create a data file it just hangs and I can't cancel it, and I can't unmount it or anything; I just have to reboot.
The same servers have another volume on other bricks in a distributed-replicated setup, which works fine.
I have even tried the same setup in a virtual environment (created two VMs, installed Gluster, and created a replicated striped volume) and again the same thing: data corruption.
I'd look through mail archives for a topic "Shard in Production" I think it's called. The shard portion may not be relevant but it does discuss certain settings that had to be applied with regards to avoiding corruption with VMs. You may want to try and disable performance.readdir-ahead also.
On 03/12/2016 07:02 PM, David Gossage wrote:
On Sat, Mar 12, 2016 at 9:51 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Thanks David,
My settings are all defaults; I have just created the pool and started it.
I have set the settings as you recommended and it seems to be the same issue;
Type: Striped-Replicate
Volume ID: 44adfd8c-2ed1-4aa5-b256-d12b64f7fc14
Status: Started
Number of Bricks: 1 x 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gfs001:/bricks/t1/s
Brick2: gfs002:/bricks/t1/s
Brick3: gfs001:/bricks/t2/s
Brick4: gfs002:/bricks/t2/s
Options Reconfigured:
performance.stat-prefetch: off
network.remote-dio: on
cluster.eager-lock: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
performance.readdir-ahead: on
Is there a RAID controller perhaps doing any caching?
In the gluster logs, are any errors being reported during the migration process?
Since they aren't in use yet, have you tested making just mirrored bricks using different pairings of servers, two at a time, to see if the problem follows a certain machine or network port?
On 03/12/2016 03:25 PM, David Gossage wrote:
On Sat, Mar 12, 2016 at 1:55 AM, Mahdi Adnan <mahdi.adnan@xxxxxxxxxxxxxxxxx> wrote:
Dears,
I have created a replicated striped volume with two bricks and two servers but I can't use it because when I mount it in ESXi and try to migrate a VM to it, the data gets corrupted.
Does anyone have any idea why this is happening?
Dell 2950 x2
Seagate 15k 600GB
CentOS 7.2
Gluster 3.7.8
Appreciate your help.
Most reports of this I have seen end up being settings related. Post gluster volume info. Below is what I have seen as the most common recommended settings.
I'd hazard a guess you may have some of the read-ahead cache or prefetch on.
quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=on
Mahdi Adnan
System Admin
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users