I am seeing some oddities with the replication/distribute translators, however. I have three partitions on each gluster server, each exporting a brick, and we have two servers. The gluster clients replicate each brick between the two servers, and then I have a distribute translator over the three replicated pairs - basically gluster raid10.
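To picture it, the stack looks like this (names taken from the volfiles below):

  bricks1 (cluster/replicate) = brick1a@server1 + brick2a@server2
  bricks2 (cluster/replicate) = brick1b@server1 + brick2b@server2
  bricks3 (cluster/replicate) = brick1c@server1 + brick2c@server2
  distribute (cluster/distribute) over bricks1 bricks2 bricks3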
There are a handful of files which were copied into the gluster volume but have since disappeared; the physical files, however, still exist on both bricks.
(from a client)
[root@client1 049891002526]# pwd
/intstore/data/tracks/tmg/2008_02_05/049891002526
[root@client1 049891002526]# ls -al 049891002526_01_09.wma.sigKey01.k
ls: 049891002526_01_09.wma.sigKey01.k: No such file or directory
[root@client1 049891002526]# head 049891002526_01_09.wma.sigKey01.k
head: cannot open `049891002526_01_09.wma.sigKey01.k' for reading: No such file or directory
[root@client1 049891002526]#
(from a server brick)
[root@server1 049891002526]# pwd
/intstore/intstore01c/gcdata/data/tracks/tmg/2008_02_05/049891002526
[root@server1 049891002526]# ls -al 049891002526_01_09.wma.sigKey01.k
-rw-rw-rw- 1 10015 root 19377712 Feb 6 2008 049891002526_01_09.wma.sigKey01.k
[root@server1 049891002526]# attr -l 049891002526_01_09.wma.sigKey01.k
Attribute "glusterfs.createtime" has a 10 byte value for 049891002526_01_09.wma.sigKey01.k
Attribute "glusterfs.version" has a 1 byte value for 049891002526_01_09.wma.sigKey01.k
Attribute "selinux" has a 24 byte value for 049891002526_01_09.wma.sigKey01.k
[root@server1 049891002526]# attr -l .
Attribute "glusterfs.createtime" has a 10 byte value for .
Attribute "glusterfs.version" has a 1 byte value for .
Attribute "glusterfs.dht" has a 16 byte value for .
Attribute "selinux" has a 24 byte value for .
Nothing shows up in either the client or server logs. I've tried all the normal replication checks and self-heal triggers, such as ls -alR. If I copy the file back from one of the bricks into the volume it shows up again, but it only has a 1/3 chance of being written to the file's original location; when it isn't, I end up with two identical files on two different bricks.
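For what it's worth, the kind of crawl I mean is something like this - stat() every entry under the affected subtree through the client mount, which (as I understand it) forces the lookup that should trigger self-heal:

(from a client)
find /intstore/data/tracks/tmg/2008_02_05 -noleaf -print0 | xargs -0 stat > /dev/null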
This volume has over 40 million files and directories, so it can be very tedious to find anomalies. I wrote a quick perl script to search 1/25 of our total files in the volume for missing files and md5 checksum differences; as of now it is about 15% complete (138,500 files) and has found ~7000 missing files and 0 md5 checksum differences.
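The real script is perl and only samples the tree, but the core of the check is roughly the following (a hypothetical shell sketch; it assumes a host that can see both a brick and the client mount, and relies on the brick path /intstore/intstore01c/gcdata/FOO mapping to /intstore/FOO on the mount, as in the example above):

cd /intstore/intstore01c/gcdata
find data -type f | while read -r f; do
    if [ ! -e "/intstore/$f" ]; then
        echo "MISSING: $f"              # exists on the brick, invisible via the mount
    elif [ "$(md5sum < "$f")" != "$(md5sum < "/intstore/$f")" ]; then
        echo "MD5 MISMATCH: $f"         # contents differ between brick and mount
    fi
done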
How could I debug this? I'd imagine it has something to do with the extended attributes on either the file or its parent directory... but as far as I can tell those all look fine.
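If it would help, I can dump the raw attribute values with getfattr and diff them between the two servers' copies - attr -l above only shows the value sizes, not the values themselves:

(run in the same directory on each server brick)
getfattr -d -m . -e hex 049891002526_01_09.wma.sigKey01.k
getfattr -d -m . -e hex .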
thanks,
liam
client glusterfs.vol:
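# one protocol/client volume per exported brick - three on each of the two servers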
volume brick1a
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume brick1a
end-volume

volume brick1b
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume brick1b
end-volume

volume brick1c
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume brick1c
end-volume

volume brick2a
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume brick2a
end-volume

volume brick2b
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume brick2b
end-volume

volume brick2c
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume brick2c
end-volume
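# mirror each server1 brick with its counterpart on server2 (the raid1 half)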
volume bricks1
  type cluster/replicate
  subvolumes brick1a brick2a
end-volume

volume bricks2
  type cluster/replicate
  subvolumes brick1b brick2b
end-volume

volume bricks3
  type cluster/replicate
  subvolumes brick1c brick2c
end-volume
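# hash files across the three replicated pairs (the raid0 half)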
volume distribute
  type cluster/distribute
  subvolumes bricks1 bricks2 bricks3
end-volume
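# write-behind and io-cache stacked on top of distribute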
volume writebehind
  type performance/write-behind
  option block-size 1MB
  option cache-size 64MB
  option flush-behind on
  subvolumes distribute
end-volume

volume cache
  type performance/io-cache
  option cache-size 2048MB
  subvolumes writebehind
end-volume
server glusterfsd.vol:
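# one storage/posix volume per local partition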
volume intstore01a
  type storage/posix
  option directory /intstore/intstore01a/gcdata
end-volume

volume intstore01b
  type storage/posix
  option directory /intstore/intstore01b/gcdata
end-volume

volume intstore01c
  type storage/posix
  option directory /intstore/intstore01c/gcdata
end-volume
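# posix-locks (with mandatory locking) wrapped around each posix volume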
volume locksa
  type features/posix-locks
  option mandatory-locks on
  subvolumes intstore01a
end-volume

volume locksb
  type features/posix-locks
  option mandatory-locks on
  subvolumes intstore01b
end-volume

volume locksc
  type features/posix-locks
  option mandatory-locks on
  subvolumes intstore01c
end-volume
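# io-threads on top of each locked volume; these brick1a/b/c names are what the clients request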
volume brick1a
  type performance/io-threads
  option thread-count 32
  subvolumes locksa
end-volume

volume brick1b
  type performance/io-threads
  option thread-count 32
  subvolumes locksb
end-volume

volume brick1c
  type performance/io-threads
  option thread-count 32
  subvolumes locksc
end-volume
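# export all three bricks over tcp, restricted to the local subnet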
volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick1a.allow 192.168.12.*
  option auth.addr.brick1b.allow 192.168.12.*
  option auth.addr.brick1c.allow 192.168.12.*
  subvolumes brick1a brick1b brick1c
end-volume
On Wed, Apr 22, 2009 at 5:43 PM, Liam Slusser <lslusser@xxxxxxxxx> wrote:
Avati,

Big thanks. Looks like that did the trick. I'll report back in the morning if anything has changed, but it's looking MUCH better. Thanks again!

liam

On Wed, Apr 22, 2009 at 2:32 PM, Anand Avati <avati@xxxxxxxxxxx> wrote:

Liam,
An fd leak and a lock structure leak have been fixed in the git
repository, which explains a leak in the first subvolume's server.
Please pull the latest patches and let us know if they do not fix
your issues. Thanks!
Avati
On Tue, Apr 21, 2009 at 3:41 PM, Liam Slusser <lslusser@xxxxxxxxx> wrote:
> There is still a memory leak with rc8 on my setup. The first server in a
> cluster of two servers starts out using 18M and just slowly increases.
> After 30 minutes it has doubled in size to over 30M and just keeps growing -
> the more memory it uses, the worse the performance. Funny that the second
> server in my cluster, using the same configuration file, has no such memory
> problem.
> My glusterfsd.vol has no performance translators, just 3 storage/posix -> 3
> features/posix-locks -> protocol/server.
> thanks,
> liam
> On Mon, Apr 20, 2009 at 2:01 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>>
>> Gordan Bobic wrote:
>>>
>>> First-access failing bug still seems to be present.
>>> But other than that, it seems to be distinctly better than rc4. :)
>>> Good work! :)
>>
>> And that massive memory leak is gone, too! The process hasn't grown by a
>> KB after a kernel compile! :D
>>
>> s/Good work/Awesome work/
>>
>> :)
>>
>>
>> Gordan