hi,
    Could you please elaborate on replication causing data loss. Please let us know the test case which led you to this.

Pranith.

On 09/07/2011 04:01 PM, Jürgen Winkler wrote:
> Hi Phil,
>
> we had the same problem; try to compile with debug options.
> Yes, this sounds strange, but it helps when you are using SLES: the
> glusterd works OK and you can start to work with it.
>
> Just put
>
> export CFLAGS='-g3 -O0'
>
> between %build and %configure in the glusterfs spec file.
>
> But be warned: don't use it with important data, especially when you are
> planning to use the replication feature, as this will cause data loss
> sooner or later.
>
> Cheers!
>
> On 07.09.2011 11:21, gluster-users-request at gluster.org wrote:
>> Send Gluster-users mailing list submissions to
>>     gluster-users at gluster.org
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>     http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>> or, via email, send a message with subject or body 'help' to
>>     gluster-users-request at gluster.org
>>
>> You can reach the person managing the list at
>>     gluster-users-owner at gluster.org
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Gluster-users digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Re: Reading directly from brick (Reinis Rozitis)
>>    2. Re: NFS secondary groups not working. (Di Pe)
>>    3. Inconsistent md5sum of replicated file (Anthony Delviscio)
>>    4. Re: Inconsistent md5sum of replicated file (Pranith Kumar K)
>>    5. Problems with SLES 11 (Phil Bayfield)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Tue, 6 Sep 2011 23:24:24 +0300
>> From: "Reinis Rozitis" <r at roze.lv>
>> Subject: Re: Reading directly from brick
>> To: <gluster-users at gluster.org>
>> Message-ID: <F7DAC991835C44889BCDB281F977B692 at NeiRoze>
>> Content-Type: text/plain; format=flowed; charset="utf-8"; reply-type=original
>>
>>> Simple answer - no, it's not ever safe to do writes to an active
>>> Gluster backend.
>> The question was about reads, though, and then the answer is that it is
>> perfectly fine (and faster) to do reads directly from the filesystem (in
>> replicated setups), if you keep in mind that by doing so you lose
>> Gluster's autoheal feature. E.g. if one of the gluster nodes goes down
>> and a file is written meanwhile, then when the server comes up and you
>> access the file directly it won't show up, while it would when accessing
>> it via the gluster mount point (you can work around this by manually
>> triggering the self-heal).
>>
>>> I've heard that reads from glusterfs are around 20 times slower than
>>> from ext3:
>> "20 times" might be fetched out of thin air, but of course there is
>> significant overhead in serving a file from gluster, which basically
>> involves network operations and additional metadata checks, versus
>> fetching the file directly from iron.
>>
>>
>> rr
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Tue, 6 Sep 2011 14:46:28 -0700
>> From: Di Pe <dipeit at gmail.com>
>> Subject: Re: NFS secondary groups not working.
>> To: gluster-users at gluster.org
>> Message-ID: <CAB9T+o+fAb+YasVxMsUsVmMw0Scp3BLSqc0Y_grusRmV11qejg at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Anand, has this issue been confirmed by gluster, and is it in the pipe
>> to get fixed, or do you need additional information?
>> We are no gluster experts, but we are happy to help if we know whom to
>> provide additional debugging info to.
>>
>> On Mon, Aug 29, 2011 at 9:44 AM, Mike Hanby <mhanby at uab.edu> wrote:
>>> I just noticed the problem happening on one client in our
>>> environment (clients and servers running 3.2.2); other clients work
>>> fine.
>>>
>>> The clients and servers are all CentOS 5.6 x86_64.
>>>
>>> I get the same permission denied using Gluster FUSE and Gluster NFS
>>> mounts on this client.
>>>
>>> I'm not mounting it with ACL.
>>>
>>> The volume is a simple distributed volume with two servers.
>>>
>>>> -----Original Message-----
>>>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Hubert-Jan Schaminee
>>>> Sent: Saturday, August 27, 2011 10:10 AM
>>>> To: Anand Avati
>>>> Cc: gluster-users at gluster.org
>>>> Subject: Re: NFS secondary groups not working.
>>>>
>>>> On Saturday 13-08-2011 at 20:22 [timezone +0530], Anand Avati wrote:
>>>>>
>>>>> On Sat, Aug 13, 2011 at 5:29 PM, Dipeit <dipeit at gmail.com> wrote:
>>>>>         We noticed this bug too using the gluster client. I'm
>>>>>         surprised that not more people noticed this lack of POSIX
>>>>>         compliance. This makes gluster really unusable in multiuser
>>>>>         environments. Is that because gluster is mostly used in large
>>>>>         web farms like Pandora?
>>>>>
>>>>> GlusterFS is POSIX compliant w.r.t. user groups. We have not seen this
>>>>> issue in our testing. Can you give more info about your setup? Have
>>>>> you mounted with -o acl or without? Anything unusual in the logs?
>>>>>
>>>>> Avati
>>>> I'm having the same problem here.
>>>>
>>>> I use the latest version (3.2.3, built on Aug 23 2011 19:54:51, from
>>>> the download site) on CentOS 5.6 for the gluster servers, and Debian
>>>> squeeze (same version) as client.
>>>> I'm refused access to files and directories despite having correct
>>>> group permissions.
>>>>
>>>> So I installed a clean CentOS client (also latest version) for a test,
>>>> and everything is working perfectly... ?
>>>>
>>>> The Debian (squeeze) and CentOS installs used are 64-bit (repository
>>>> from gluster.com).
>>>> Using Debian testing (64- and 32-bit) and gluster from the Debian
>>>> repository also denies me access, in both the 64- and 32-bit versions.
>>>>
>>>> I assume the mixed environment explains why this bug is rare.
>>>>
>>>> The gluster installation used is a basic replicated setup with two
>>>> servers, as described in the Gluster docs.
>>>>
>>>>
>>>> Hubert-Jan Schaminée
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 6 Sep 2011 18:52:52 -0400
>> From: Anthony Delviscio <adelviscio at gmail.com>
>> Subject: Inconsistent md5sum of replicated file
>> To: gluster-users at gluster.org
>> Message-ID: <CAKE0inQy3Tjf3TB11kc+F_F-P7kN2CJ+eG+2FaRUxOe4tnzgwQ at mail.gmail.com>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> I was wondering if anyone would be able to shed some light on how a file
>> could end up with inconsistent md5sums on Gluster backend storage.
>>
>> Our configuration is running Gluster v3.1.5 in a distribute-replicate
>> setup consisting of 8 bricks.
>>
>> Our OS is Red Hat 5.6 x86_64. Backend storage is an ext3 RAID 5.
>>
>> The 8 bricks are in RR DNS and are mounted for reading/writing via NFS
>> automounts.
>>
>> When comparing md5sums of the file from two different NFS clients,
>> they were different.
>>
>> The extended attributes of the files on backend storage are identical.
>> The file size and permissions are identical. The stat data (excluding
>> inode on the backend storage file system) is identical.
>>
>> However, running md5sum on the two files results in two different
>> md5sums.
>>
>> Copying both files to another location/server and running md5sum again
>> also results in no change; they're still different.
>>
>> Gluster logs do not show anything related to the filename in question.
>> Triggering a self-healing operation didn't seem to do anything, and that
>> may have to do with the fact that the extended attributes are identical.
>>
>> If more information is required, let me know and I will try to
>> accommodate.
>>
>> Thank you
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://gluster.org/pipermail/gluster-users/attachments/20110906/4628faa2/attachment-0001.htm>
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Wed, 7 Sep 2011 14:13:56 +0530
>> From: Pranith Kumar K <pranithk at gluster.com>
>> Subject: Re: Inconsistent md5sum of replicated file
>> To: Anthony Delviscio <adelviscio at gmail.com>
>> Cc: gluster-users at gluster.org
>> Message-ID: <4E672ECC.7050703 at gluster.com>
>> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>>
>> hi Anthony,
>>       Could you send the output of "getfattr -d -m . -e hex <filepath>"
>> on both the bricks, and also the stat output on both the backends. Give
>> the outputs for its parent directory also.
>>
>> Pranith.
>>
>> On 09/07/2011 04:22 AM, Anthony Delviscio wrote:
>>> I was wondering if anyone would be able to shed some light on how a
>>> file could end up with inconsistent md5sums on Gluster backend storage.
>>>
>>> Our configuration is running Gluster v3.1.5 in a
>>> distribute-replicate setup consisting of 8 bricks.
>>>
>>> Our OS is Red Hat 5.6 x86_64. Backend storage is an ext3 RAID 5.
>>>
>>> The 8 bricks are in RR DNS and are mounted for reading/writing via NFS
>>> automounts.
>>>
>>> When comparing md5sums of the file from two different NFS clients,
>>> they were different.
>>>
>>> The extended attributes of the files on backend storage are
>>> identical. The file size and permissions are identical. The stat data
>>> (excluding inode on the backend storage file system) is identical.
>>>
>>> However, running md5sum on the two files results in two different
>>> md5sums.
>>>
>>> Copying both files to another location/server and running md5sum again
>>> also results in no change; they're still different.
>>>
>>> Gluster logs do not show anything related to the filename in
>>> question. Triggering a self-healing operation didn't seem to do
>>> anything, and that may have to do with the fact that the extended
>>> attributes are identical.
>>>
>>> If more information is required, let me know and I will try to
>>> accommodate.
>>>
>>> Thank you
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://gluster.org/pipermail/gluster-users/attachments/20110907/86d14cab/attachment-0001.htm>
>>
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Wed, 7 Sep 2011 10:15:43 +0100
>> From: Phil Bayfield <phil at techlightenment.com>
>> Subject: Problems with SLES 11
>> To: gluster-users at gluster.org
>> Message-ID: <CAFXH-fW0DBE9YomJzAtvdFAWaf5Zpq-TfbfTPb+K7gBu-R+06Q at mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi there,
>>
>> I compiled and installed the latest version of Gluster on a couple of
>> SLES 11 SP1 boxes; everything up to this point seemed OK.
>>
>> I start the daemon on both boxes, and both are listening on 24007.
>>
>> I issue a "gluster peer probe" command on one of the boxes and the
>> daemon instantly dies. I restart it and it shows:
>>
>> # gluster peer status
>> Number of Peers: 1
>>
>> Hostname: mckalcpap02
>> Uuid: 00000000-0000-0000-0000-000000000000
>> State: Establishing Connection (Connected)
>>
>> I attempted to run the probe on the other box, and the daemon crashed;
>> now, as I start the daemon on each box, the daemon just crashes on the
>> other box.
>>
>> The log output immediately prior to the crash is as follows:
>>
>> [2011-06-07 08:05:10.700710] I [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req mckalcpap02 24007
>> [2011-06-07 08:05:10.701058] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: mckalcpap02
>> [2011-06-07 08:05:10.701086] I [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: mckalcpap02 (24007)
>> [2011-06-07 08:05:10.702832] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
>> [2011-06-07 08:05:10.703110] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)
>>
>> If I use the IP address, the same thing happens:
>>
>> [2011-06-07 08:07:12.873075] I [glusterd-handler.c:623:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.9.54.2 24007
>> [2011-06-07 08:07:12.873410] I [glusterd-handler.c:391:glusterd_friend_find] 0-glusterd: Unable to find hostname: 10.9.54.2
>> [2011-06-07 08:07:12.873438] I [glusterd-handler.c:3422:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.9.54.2 (24007)
>> [2011-06-07 08:07:12.875046] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
>> [2011-06-07 08:07:12.875280] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)
>>
>> There is no firewall issue:
>>
>> # telnet mckalcpap02 24007
>> Trying 10.9.54.2...
>> Connected to mckalcpap02.
>> Escape character is '^]'.
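The telnet check shown above can also be scripted when it needs to be repeated across many peers; a minimal sketch (standalone Python, not part of the thread, with the peer hostname and glusterd port below as placeholders taken from Phil's setup) of the same TCP reachability test:

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection handles DNS resolution and the connect() call.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and resolution failures.
        return False


# Hypothetical usage, mirroring "telnet mckalcpap02 24007" from the thread:
# print(port_reachable("mckalcpap02", 24007))
```

Note this only proves the listener accepts connections, exactly like the telnet test; it says nothing about whether glusterd survives the handshake that follows.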
>>
>> Following a restart (which crashes the other node), the log output is as
>> follows:
>>
>> [2011-06-07 08:10:09.616486] I [glusterd.c:564:init] 0-management: Using /etc/glusterd as working directory
>> [2011-06-07 08:10:09.617619] C [rdma.c:3933:rdma_init] 0-rpc-transport/rdma: Failed to get IB devices
>> [2011-06-07 08:10:09.617676] E [rdma.c:4812:init] 0-rdma.management: Failed to initialize IB Device
>> [2011-06-07 08:10:09.617700] E [rpc-transport.c:741:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
>> [2011-06-07 08:10:09.617724] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
>> [2011-06-07 08:10:09.617830] I [glusterd.c:88:glusterd_uuid_init] 0-glusterd: retrieved UUID: 1e344f5d-6904-4d14-9be2-8f0f44b97dd7
>> [2011-06-07 08:10:11.258098] I [glusterd-handler.c:3404:glusterd_friend_add] 0-glusterd: connect returned 0
>> Given volfile:
>> +------------------------------------------------------------------------------+
>>   1: volume management
>>   2:     type mgmt/glusterd
>>   3:     option working-directory /etc/glusterd
>>   4:     option transport-type socket,rdma
>>   5:     option transport.socket.keepalive-time 10
>>   6:     option transport.socket.keepalive-interval 2
>>   7: end-volume
>>   8:
>> +------------------------------------------------------------------------------+
>>
>> [2011-06-07 08:10:11.258431] I [glusterd-handshake.c:317:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd clnt mgmt, Num (1238433), Version (1)
>> [2011-06-07 08:10:11.280533] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.9.54.2:1023)
>> [2011-06-07 08:10:11.280595] W [socket.c:1494:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.9.54.2:24007)
>> [2011-06-07 08:10:17.256235] E [socket.c:1685:socket_connect_finish] 0-management: connection to 10.9.54.2:24007 failed (Connection refused)
>>
>> There are no logs on the node which crashes.
>>
>> I've tried various possible solutions from searching the net but am not
>> getting anywhere; can anyone advise how to proceed?
>>
>> Thanks,
>> Phil.
>>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
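The inconsistent-md5sum discussion in messages 3 and 4 above comes down to comparing the per-brick copies of a replicated file directly: same size and permissions, yet different checksums. A minimal sketch of that comparison (standalone Python, not from the thread; the brick paths in the usage comment are hypothetical):

```python
import hashlib
import os


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's contents, read in chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_copies(path_a: str, path_b: str) -> dict:
    """Compare two backend copies of a replicated file.

    Returns per-check booleans; the situation in the thread was size and
    mode agreeing while the md5 check fails.
    """
    st_a, st_b = os.stat(path_a), os.stat(path_b)
    return {
        "size": st_a.st_size == st_b.st_size,
        "mode": st_a.st_mode == st_b.st_mode,
        "md5": file_md5(path_a) == file_md5(path_b),
    }


# Hypothetical brick paths for the same replicated file:
# compare_copies("/export/brick1/data/file", "/export/brick2/data/file")
```

This covers only stat metadata and content; the extended-attribute comparison Pranith asks for (`getfattr -d -m . -e hex <filepath>` on each brick) still has to be done on the bricks themselves, since xattrs are what AFR uses to decide whether self-heal is needed.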