Re: Self-heal Problems with gluster and nfs

On 07/08/2014 04:23 PM, Norman Mähler wrote:

Of course:

The configuration is:

Volume Name: gluster_dateisystem
Type: Replicate
Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: filecluster1:/mnt/raid
Brick2: filecluster2:/mnt/raid
Options Reconfigured:
nfs.enable-ino32: on
performance.cache-size: 512MB
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
nfs.addr-namelookup: off
performance.cache-refresh-timeout: 60
performance.cache-max-file-size: 100MB
performance.write-behind-window-size: 10MB
performance.io-thread-count: 18
performance.stat-prefetch: off
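
For reference, a minimal sketch (assuming the volume name above) of the
commands that produce this output and show the pending heal backlog:

    # print the volume configuration shown above
    gluster volume info gluster_dateisystem

    # list entries still queued for self-heal (run on one of the servers)
    gluster volume heal gluster_dateisystem info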


The file count in xattrop is:

Brick 1: 2706
Brick 2: 2687

Norman

Do "gluster volume set gluster_dateisystem cluster.self-heal-daemon off".
This should stop all the entry self-heals and should also bring the CPU
usage down. When you don't have a lot of activity, you can enable it
again with "gluster volume set gluster_dateisystem
cluster.self-heal-daemon on". If that doesn't bring the CPU usage down,
also execute "gluster volume set gluster_dateisystem
cluster.entry-self-heal off". Let me know how it goes.

Pranith
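
A minimal sketch of that sequence, assuming the volume name above; note
that the heal backlog will not shrink while the daemon is off:

    # stop the self-heal daemon to bring the CPU usage down
    gluster volume set gluster_dateisystem cluster.self-heal-daemon off

    # if the load stays high, also disable entry self-heal
    gluster volume set gluster_dateisystem cluster.entry-self-heal off

    # later, during a quiet period, turn healing back on
    gluster volume set gluster_dateisystem cluster.self-heal-daemon on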

On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
It seems like entry self-heal is happening. What is the volume
configuration? Could you give the output of
"ls <brick-path>/.glusterfs/indices/xattrop | wc -l" for all the
bricks?
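
A sketch of that count for this setup; it assumes ssh access to both
servers and the brick path /mnt/raid from the volume info:

    # count pending self-heal index entries on each brick
    for h in filecluster1 filecluster2; do
        printf '%s: ' "$h"
        ssh "$h" 'ls /mnt/raid/.glusterfs/indices/xattrop | wc -l'
    done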

Pranith

On 07/08/2014 03:36 PM, Norman Mähler wrote:
Hello Pranith,

here are the logs. I am only giving you the last 3000 lines, because
the nfs.log from today alone is already 550 MB.

These are the standard files from a user home on the Gluster
system, everything you normally find in a user home: config
files, Firefox and Thunderbird files, etc.

Thanks in advance,
Norman

On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
On 07/08/2014 02:46 PM, Norman Mähler wrote:
Hello again,

I could resolve the self-heal problems with the missing gfid
files on one of the servers by deleting the gfid files on the
other server.

They had a link count of 1, which means that the file the
gfid pointed to had already been deleted.
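
A sketch of how such stale gfid files can be spotted, assuming the
brick path above: every regular file on a brick normally has at least
two links (its real path plus its .glusterfs hard link), so a link
count of 1 under .glusterfs means the real file is gone. Review the
output before deleting anything:

    # list gfid files whose real counterpart no longer exists
    find /mnt/raid/.glusterfs -type f -links 1 -print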


We still get these errors:

[2014-07-08 09:09:43.564488] W
[client-rpc-fops.c:2469:client3_3_link_cbk]
0-gluster_dateisystem-client-0: remote operation failed: File
exists (00000000-0000-0000-0000-000000000000 ->
<gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)

which appear in the glusterfshd.log and these

[2014-07-08 09:13:31.198462] E
[client-rpc-fops.c:5179:client3_3_inodelk]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8)
[0x7f5d29d4e6b8]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844)
[0x7f5d29d4e2e4]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99)
[0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
from the nfs.log.

Could you attach the mount (nfs.log) and brick logs please? Do
you have files with lots of hard links?

Pranith
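
A sketch for checking that, assuming the brick path above. Since every
regular file on a brick carries one extra link from .glusterfs, more
than two links means additional user-created hard links:

    # find files with more than one hard-linked name, skipping .glusterfs
    find /mnt/raid -path /mnt/raid/.glusterfs -prune -o -type f -links +2 -print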
I think the error messages belong together, but I don't have any
idea how to resolve them.

We still have a very bad performance issue. The system load
on the servers is above 20, and hardly anyone is able to work
on a client...

Hoping for help,
Norman


On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
On 07/07/2014 06:58 PM, Norman Mähler wrote:
Dear community,

we have got some serious problems with our Gluster
installation.

Here is the setting:

We have two bricks (GlusterFS 3.4.4) on Debian 7.5, one
of them with an NFS export. About 120 clients connect
to the exported NFS share. These clients are thin
clients reading and writing their Linux home directories
from the exported NFS.

We want to switch these clients over, one by one, to
access the volume via the Gluster client.
I did not understand what you meant by this. Are you
moving to glusterfs-fuse based mounts?
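
For comparison, a typical fuse-based mount; this is a sketch, the mount
point /home is an assumption, and the host and volume names are taken
from the volume info above:

    # mount the volume with the native client instead of NFS
    mount -t glusterfs filecluster1:/gluster_dateisystem /home

    # or the equivalent /etc/fstab entry:
    # filecluster1:/gluster_dateisystem  /home  glusterfs  defaults,_netdev  0 0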
Here are our problems:

At the moment we are getting two types of error messages,
which come in bursts, in our glusterfshd.log:

[2014-07-07 13:10:21.572487] W
[client-rpc-fops.c:1538:client3_3_inodelk_cbk]
0-gluster_dateisystem-client-1: remote operation failed:
No such file or directory

[2014-07-07 13:10:21.573448] W
[client-rpc-fops.c:471:client3_3_open_cbk]
0-gluster_dateisystem-client-1: remote operation failed:
No such file or directory. Path:
<gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc>
(00000000-0000-0000-0000-000000000000)

[2014-07-07 13:10:21.573468] E
[afr-self-heal-data.c:1270:afr_sh_data_open_cbk]
0-gluster_dateisystem-replicate-0: open of
<gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed on
child gluster_dateisystem-client-1 (No such file or
directory)


This looks like a missing gfid file on one of the bricks.
I looked it up, and yes, the file is missing on the second
brick.
brick.

We got these messages the other way round, too (missing
on client-0 and the first brick).

Is it possible to repair this one by copying the gfid
file to the brick where it is missing? Or is there
another way to repair it?
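
For reference, a gfid file sits at a fixed path derived from its first
two byte pairs, so its presence can be compared across bricks directly.
A sketch using the gfid from the log above and the brick path /mnt/raid:

    # run on both filecluster1 and filecluster2;
    # the gfid b0c4f78a-... maps to .glusterfs/b0/c4/
    ls -l /mnt/raid/.glusterfs/b0/c4/b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc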


The second message is

[2014-07-07 13:06:35.948738] W
[client-rpc-fops.c:2469:client3_3_link_cbk]
0-gluster_dateisystem-client-1: remote operation failed:
File exists (00000000-0000-0000-0000-000000000000 ->
<gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)

and I really do not know what to do with this one...

Did any of the bricks go offline and come back online?

Pranith
I am really looking forward to your help, because this is
an active system and the system load on the NFS brick is
about 25 (!!).

Thanks in advance!
Norman Maehler


--
With kind regards,

Norman Mähler

Division Head, IT-Hochschulservice
uni-assist e. V.
Geneststr. 5
Entrance H, 3rd floor
10829 Berlin

Tel.: 030-66644382
n.maehler@xxxxxxxxxxxxx

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users




