Re: Self-heal Problems with gluster and nfs

On 07/08/2014 04:49 PM, Norman Mähler wrote:
On 08.07.2014 13:02, Pranith Kumar Karampuri wrote:
On 07/08/2014 04:23 PM, Norman Mähler wrote:
Of course:

The configuration is:

Volume Name: gluster_dateisystem
Type: Replicate
Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: filecluster1:/mnt/raid
Brick2: filecluster2:/mnt/raid
Options Reconfigured:
nfs.enable-ino32: on
performance.cache-size: 512MB
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
nfs.addr-namelookup: off
performance.cache-refresh-timeout: 60
performance.cache-max-file-size: 100MB
performance.write-behind-window-size: 10MB
performance.io-thread-count: 18
performance.stat-prefetch: off


The file count in xattrop is
Do "gluster volume set gluster_dateisystem
cluster.self-heal-daemon off" This should stop all the entry
self-heals and should also get the CPU usage low. When you don't
have a lot of activity you can enable it again using "gluster
volume set gluster_dateisystem cluster.self-heal-daemon on" If it
doesn't get the CPU down execute "gluster volume set
gluster_dateisystem cluster.entry-self-heal off". Let me know how
it goes.
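One quick way to confirm the change took hold (a minimal check; "gluster volume info" lists any reconfigured options, as in the output earlier in this thread):

    gluster volume info gluster_dateisystem | grep self-heal-daemon
    # expected once set:
    # cluster.self-heal-daemon: off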
Pranith
Thanks for your help so far, but stopping the self-heal daemon and the self-heal mechanism itself did not improve the situation.

Do you have further suggestions?
Is it simply the load on the system? NFS could handle it easily before...
Is it at least a little better or no improvement at all?

Pranith

Norman

Brick 1: 2706
Brick 2: 2687

Norman

On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
It seems like entry self-heal is happening. What is the volume configuration? Could you give the count from:

    ls <brick-path>/.glusterfs/indices/xattrop | wc -l

for all the bricks?
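On this volume that would be, per brick (assuming the brick paths from the volume info above):

    # on filecluster1
    ls /mnt/raid/.glusterfs/indices/xattrop | wc -l

    # on filecluster2
    ls /mnt/raid/.glusterfs/indices/xattrop | wc -l

Roughly, each entry in that index directory corresponds to a file with a pending self-heal.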

Pranith
On 07/08/2014 03:36 PM, Norman Mähler wrote:
Hello Pranith,

here are the logs. I am only giving you the last 3000 lines, because the nfs.log from today is already 550 MB.

They are the standard files of a user home on the gluster system, everything you normally find in a user home: config files, Firefox and Thunderbird files, etc.

Thanks in advance Norman

On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
On 07/08/2014 02:46 PM, Norman Mähler wrote:
Hello again,

I could resolve the self-heal problems with the missing gfid files on one of the servers by deleting the gfid files on the other server.

They had a link count of 1, which means that the file the gfid pointed to had already been deleted.
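For reference, one way to spot such orphaned gfid files on a brick (a sketch: .glusterfs keeps a hard link per file, named by gfid, so a regular file there with a link count of 1 has lost its real counterpart; double-check every hit before deleting anything):

    find /mnt/raid/.glusterfs -type f -links 1 -not -path '*/indices/*'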


We still have these errors

[2014-07-08 09:09:43.564488] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-gluster_dateisystem-client-0: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)

which appear in the glusterfshd.log, and these

[2014-07-08 09:13:31.198462] E [client-rpc-fops.c:5179:client3_3_inodelk]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8) [0x7f5d29d4e6b8]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844) [0x7f5d29d4e2e4]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99) [0x7f5d29f8b3c9])))
0-: Assertion failed: 0

from the nfs.log.
Could you attach mount (nfs.log) and brick logs please.
Do you have files with lots of hard-links? (One way to check is sketched below.)
Pranith
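One way to check for heavily hard-linked files on a brick (a sketch; the +4 threshold is arbitrary, and note that every file on a brick already carries one extra link from its .glusterfs gfid entry):

    find /mnt/raid -xdev -type f -links +4 -not -path '*/.glusterfs/*'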
I think the error messages belong together but I don't
have any idea how to solve them.

We still have a very bad performance issue. The system load on the servers is above 20, and almost no one is able to work on a client...

Hoping for help,
Norman


On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
On 07/07/2014 06:58 PM, Norman Mähler wrote:
Dear community,

we have got some serious problems with our Gluster
installation.

Here is the setting:

We have 2 bricks (version 3.4.4) on Debian 7.5, one of them with an NFS export. There are about 120 clients connecting to the exported NFS. These clients are thin clients reading and writing their Linux home directories from the exported NFS.

We want to change the access of these clients one by one to access via the gluster client.
I did not understand what you meant by this. Are you moving to glusterfs-fuse based mounts?
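For reference, a glusterfs-fuse mount of this volume would look something like this (the client-side mount point is assumed):

    mount -t glusterfs filecluster1:/gluster_dateisystem /mnt/glusterhome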
Here are our problems:

At the moment we have two types of error messages which come in bursts to our glusterfshd.log:

[2014-07-07 13:10:21.572487] W [client-rpc-fops.c:1538:client3_3_inodelk_cbk] 0-gluster_dateisystem-client-1: remote operation failed: No such file or directory
[2014-07-07 13:10:21.573448] W [client-rpc-fops.c:471:client3_3_open_cbk] 0-gluster_dateisystem-client-1: remote operation failed: No such file or directory. Path: <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> (00000000-0000-0000-0000-000000000000)
[2014-07-07 13:10:21.573468] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gluster_dateisystem-replicate-0: open of <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed on child gluster_dateisystem-client-1 (No such file or directory)


This looks like a missing gfid file on one of the bricks. I looked it up, and yes, the file is missing on the second brick.

We got these messages the other way round, too (missing on client-0, the first brick).

Is it possible to repair this one by copying the gfid file to the brick where it is missing? Or is there another way to repair it?
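For reference, the gfid file for such an entry lives under the brick's .glusterfs tree, in a two-level directory taken from the first four hex digits of the gfid; for the gfid in the log above that would be something like:

    stat /mnt/raid/.glusterfs/b0/c4/b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc

Since the gfid file is normally a hard link to the real file, copying it over is not the same as restoring the link.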


The second message is

[2014-07-07 13:06:35.948738] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-gluster_dateisystem-client-1: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)

and I really do not know what to do with this one...
Did any of the bricks go offline and come back online?
Pranith
I am really looking forward to your help because
this is an active system and the system load on the
nfs brick is about 25 (!!)

Thanks in advance! Norman Maehler



--
Kind regards,

Norman Mähler

Head of IT-Hochschulservice
uni-assist e. V.
Geneststr. 5
Aufgang H, 3. Etage
10829 Berlin

Tel.: 030-66644382
n.maehler@xxxxxxxxxxxxx

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users




