Re: Self-heal Problems with gluster and nfs

On 07/08/2014 05:23 PM, Norman Mähler wrote:



On 08.07.2014 13:24, Pranith Kumar Karampuri wrote:
On 07/08/2014 04:49 PM, Norman Mähler wrote:


On 08.07.2014 13:02, Pranith Kumar Karampuri wrote:
On 07/08/2014 04:23 PM, Norman Mähler wrote:
Of course:

The configuration is:

Volume Name: gluster_dateisystem
Type: Replicate
Volume ID: 2766695c-b8aa-46fd-b84d-4793b7ce847a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: filecluster1:/mnt/raid
Brick2: filecluster2:/mnt/raid
Options Reconfigured:
nfs.enable-ino32: on
performance.cache-size: 512MB
diagnostics.brick-log-level: WARNING
diagnostics.client-log-level: WARNING
nfs.addr-namelookup: off
performance.cache-refresh-timeout: 60
performance.cache-max-file-size: 100MB
performance.write-behind-window-size: 10MB
performance.io-thread-count: 18
performance.stat-prefetch: off


The file count in xattrop is
Do "gluster volume set gluster_dateisystem cluster.self-heal-daemon off". This should stop all the entry self-heals and should also bring the CPU usage down. When you don't have a lot of activity you can enable it again using "gluster volume set gluster_dateisystem cluster.self-heal-daemon on". If it doesn't bring the CPU down, execute "gluster volume set gluster_dateisystem cluster.entry-self-heal off". Let me know how it goes.
Pranith
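For reference, the sequence suggested above consolidated in one place (the volume name is from the configuration quoted earlier; run from any server in the pool):

    # stop the self-heal daemon to take the heal load off the servers
    gluster volume set gluster_dateisystem cluster.self-heal-daemon off
    # if that does not bring the CPU down, also disable entry self-heal
    gluster volume set gluster_dateisystem cluster.entry-self-heal off
    # re-enable once there is a quiet window
    gluster volume set gluster_dateisystem cluster.self-heal-daemon on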
Thanks for your help so far, but stopping the self-heal daemon and the self-heal mechanism itself did not improve the situation.

Do you have further suggestions? Is it simply the load on the
system? NFS could handle it easily before...
Is it at least a little better or no improvement at all?
After waiting half an hour more, the system load is falling steadily. At the moment it is around 10, which is not good but a lot better than before.
There are no messages in the nfs.log and the glusterfshd.log anymore.
In the brick log there are still "inode not found - anonymous fd
creation failed" messages.
They should go away once the heal is complete and the system is back to normal. I believe you have directories with lots of files? When can you start the healing process again (i.e., a window where there won't be a lot of activity and you can afford the high CPU usage) so that things will be back to normal?

Pranith
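Not part of the advice above, but one way to check whether the pending heals are draining before everything is switched back on (a sketch assuming the glusterfs 3.4 CLI):

    # show entries that still need healing on this volume
    gluster volume heal gluster_dateisystem info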



Norman

Pranith
Norman

Brick 1: 2706
Brick 2: 2687

Norman

On 08.07.2014 12:28, Pranith Kumar Karampuri wrote:
It seems like entry self-heal is happening. What is the volume configuration? Could you give the output of "ls <brick-path>/.glusterfs/indices/xattrop | wc -l" for all the bricks?
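With the brick path from the volume info above, that count would be taken roughly like this (a sketch; both bricks use /mnt/raid):

    # run on filecluster1 and on filecluster2
    ls /mnt/raid/.glusterfs/indices/xattrop | wc -l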

Pranith

On 07/08/2014 03:36 PM, Norman Mähler wrote:
Hello Pranith,

here are the logs. I only give you the last 3000 lines, because the nfs.log from today is already 550 MB.

These are the standard files from a user home on the gluster system, all you normally find in a user home: config files, Firefox and Thunderbird files, etc.

Thanks in advance Norman

On 08.07.2014 11:46, Pranith Kumar Karampuri wrote:
On 07/08/2014 02:46 PM, Norman Mähler wrote:
Hello again,

I could resolve the self-heal problems with the missing gfid files on one of the servers by deleting the gfid files on the other server.

They had a link count of 1, which means that the file the gfid pointed to had already been deleted.
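One way to locate such stale gfid entries in the first place (a sketch, not from the thread; the brick path comes from the volume info above, and anything it turns up should be verified by hand before deleting):

    # regular files under .glusterfs whose only remaining name is the gfid hard link itself
    find /mnt/raid/.glusterfs/[0-9a-f][0-9a-f] -type f -links 1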


We still get these errors

[2014-07-08 09:09:43.564488] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-gluster_dateisystem-client-0: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> <gfid:b338b09e-2577-45b3-82bd-032f954dd083>/lock)

which appear in the glusterfshd.log and these

[2014-07-08 09:13:31.198462] E [client-rpc-fops.c:5179:client3_3_inodelk]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(+0x466b8) [0x7f5d29d4e6b8]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/replicate.so(afr_lock_blocking+0x844) [0x7f5d29d4e2e4]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/protocol/client.so(client_inodelk+0x99) [0x7f5d29f8b3c9]))) 0-: Assertion failed: 0
from the nfs.log.
Could you attach the mount (nfs.log) and brick logs, please? Do you have files with lots of hard links?
Pranith
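If it helps with that question: on a brick a regular file normally shows a link count of 2 (its own name plus the .glusterfs gfid link), so files with user-created hard links can be spotted roughly like this (a sketch using the brick path from the volume info):

    # files with more than two links, skipping the .glusterfs tree itself
    find /mnt/raid -path /mnt/raid/.glusterfs -prune -o -type f -links +2 -print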
I think the error messages belong together, but I don't have any idea how to solve them.

We still have a very bad performance issue. The system load on the servers is above 20 and almost no one is able to work on a client...

Hoping for help, Norman


On 07.07.2014 15:39, Pranith Kumar Karampuri wrote:
On 07/07/2014 06:58 PM, Norman Mähler wrote:
Dear community,

we have got some serious problems with our
Gluster installation.

Here is the setting:

We have 2 bricks (version 3.4.4) on Debian 7.5, one of them with an NFS export. There are about 120 clients connecting to the exported NFS. These clients are thin clients reading and writing their Linux home directories from the exported NFS.

We want to change the access of these clients one by one to access via the gluster client.
I did not understand what you meant by
this. Are you moving to glusterfs-fuse
based mounts?
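If "access via the gluster client" means glusterfs-fuse mounts, a typical client mount for this volume would look roughly like the following (a sketch; the server and volume names are from the configuration quoted earlier in the thread, and /mnt/home is only a placeholder mount point):

    # native FUSE mount of the replicated volume on a client
    mount -t glusterfs filecluster1:/gluster_dateisystem /mnt/home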
Here are our problems:

At the moment we have got two types of error messages which come in bursts to our glusterfshd.log:

[2014-07-07 13:10:21.572487] W [client-rpc-fops.c:1538:client3_3_inodelk_cbk] 0-gluster_dateisystem-client-1: remote operation failed: No such file or directory
[2014-07-07 13:10:21.573448] W [client-rpc-fops.c:471:client3_3_open_cbk] 0-gluster_dateisystem-client-1: remote operation failed: No such file or directory. Path: <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> (00000000-0000-0000-0000-000000000000)
[2014-07-07 13:10:21.573468] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-gluster_dateisystem-replicate-0: open of <gfid:b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc> failed on child gluster_dateisystem-client-1 (No such file or directory)


This looks like a missing gfid file on one of the bricks. I looked it up and yes, the file is missing on the second brick.

We got these messages the other way round, too (missing on client-0 and the first brick).

Is it possible to repair this one by copying the gfid file to the brick where it was missing? Or is there another way to repair it?
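One way to see what is actually on each brick before copying anything by hand (a sketch, not a repair recommendation; gfid files live at <brick>/.glusterfs/<first two hex chars>/<next two hex chars>/<gfid>, and the gfid below is the one from the log above):

    # check whether the gfid link exists and what its link count is (run on both bricks)
    stat /mnt/raid/.glusterfs/b0/c4/b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc
    # on the brick where it exists, find the regular path it is hard-linked to
    find /mnt/raid -samefile /mnt/raid/.glusterfs/b0/c4/b0c4f78a-249f-4db7-9d5b-0902c7d8f6cc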


The second message is

[2014-07-07 13:06:35.948738] W [client-rpc-fops.c:2469:client3_3_link_cbk] 0-gluster_dateisystem-client-1: remote operation failed: File exists (00000000-0000-0000-0000-000000000000 -> <gfid:aae47250-8f69-480c-ac75-2da2f4d21d7a>/lock)



and I really do not know what to do with this one...
Did any of the bricks go offline and come back online?
Pranith
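One way to check that (a sketch; the exact glusterd log file name can vary by distribution):

    # are both brick processes online right now?
    gluster volume status gluster_dateisystem
    # look for disconnect/reconnect messages on the servers
    grep -i disconnect /var/log/glusterfs/etc-glusterfs-glusterd.vol.log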
I am really looking forward to your help because this is an active system and the system load on the NFS brick is about 25 (!!)

Thanks in advance! Norman Maehler



--
Kind regards,

Norman Mähler

Bereichsleiter IT-Hochschulservice
uni-assist e. V.
Geneststr. 5
Aufgang H, 3. Etage
10829 Berlin

Tel.: 030-66644382
n.maehler@xxxxxxxxxxxxx

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users




