Avati, I sent the log 5 days ago by private mail. Have you received it? Can I provide any additional info?

Meanwhile I've tried some tests in debug mode. And now the ALU client on the head node (with namespace) doesn't work at all. :)

First, I should describe the glusterFS config on the head node in more detail. The server and client run in different processes. The server config is rather ordinary:

---------- server.vol --------
volume brick-ns
  type storage/posix                    # POSIX FS translator
  option directory /mnt/glusterfs-ns    # Export this directory
end-volume

volume disk
  type storage/posix
  option directory /mnt/hd.sda3/glusterfs-export
end-volume

volume brick
  type features/posix-locks
  option mandatory on
  subvolumes disk
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
# option listen-port 6996
# option client-volume-filename /etc/glusterfs/client.vol
  subvolumes brick brick-ns
  option auth.ip.brick.allow 10.1.0.*
  option auth.ip.brick-ns.allow 10.1.0.*
end-volume
----------------------

But the client config is probably incorrect (as I now think):

--------------- client.vol --------------
# NB: Namespace is storage/posix, not protocol/client
volume c-ns
  type storage/posix
  option directory /mnt/glusterfs-ns
end-volume

volume c31
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.1.0.31
# option remote-port 6996
  option remote-subvolume brick
end-volume

[....skipped....]

volume bricks
  type cluster/unify
  option namespace c-ns
  subvolumes c31 c32 c33 c34 c35 c36 c37 c38 c41 c42 c43 c44 c45 c46 c47 c48 c51 c52 c53 c54 c55 c56 c57 c58
  option scheduler alu
  option alu.limits.min-free-disk 5%
  option alu.limits.max-open-files 10000
  option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
  option alu.disk-usage.entry-threshold 2GB
  option alu.disk-usage.exit-threshold 128MB
  option alu.open-files-usage.entry-threshold 1024
  option alu.open-files-usage.exit-threshold 32
  option alu.stat-refresh.interval 10sec
end-volume

volume wb
  type performance/write-behind
  option aggregate-size 1MB
  subvolumes bricks
end-volume
---------
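What I now suspect is that the namespace should be defined on the client side like the other bricks, i.e. as a protocol/client volume pointing at the exported brick-ns; this is the variant I try further below. A rough sketch (the remote-host here is only a placeholder for the head node's actual address):

-----------
volume c-ns
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.1.0.1           # placeholder -- the head node's own IP
  option remote-subvolume brick-ns
end-volume
-----------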
Running glusterFS with the original config above (storage/posix namespace), I disconnected one cluster node (namely c35) and executed "ls -l /home" on the head node (it is the glusterFS mount point). At once the ALU unify locked up, and client.log got the following lines:

--------------- cut from the log file I sent by private mail ----------
2008-06-05 12:38:19 E [tcp-client.c:190:tcp_connect] c35: non-blocking connect() returned: 113 (No route to host)
2008-06-05 12:39:48 E [tcp-client.c:190:tcp_connect] c35: non-blocking connect() returned: 113 (No route to host)
2008-06-05 12:43:41 E [tcp-client.c:190:tcp_connect] c35: non-blocking connect() returned: 113 (No route to host)
2008-06-05 12:44:11 W [client-protocol.c:332:client_protocol_xfer] c35: not connected at the moment to submit frame type(1) op(34)
2008-06-05 12:44:11 E [client-protocol.c:4423:client_lookup_cbk] c35: no proper reply from server, returning ENOTCONN
2008-06-05 12:44:11 E [unify.c:182:unify_lookup_cbk] bricks: c35 returned 107
2008-06-05 12:44:11 E [unify.c:265:unify_lookup_cbk] bricks: Revalidate failed for /danilov
2008-06-05 12:44:11 E [fuse-bridge.c:459:fuse_entry_cbk] glusterfs-fuse: 1158227: (34) /danilov => -1 (107)
2008-06-05 12:44:11 W [client-protocol.c:332:client_protocol_xfer] c35: not connected at the moment to submit frame type(1) op(34)
2008-06-05 12:44:11 E [client-protocol.c:4423:client_lookup_cbk] c35: no proper reply from server, returning ENOTCONN
2008-06-05 12:44:11 E [client-protocol.c:4572:client_checksum] c35: /danilov: returning EINVAL
2008-06-05 12:44:11 W [client-protocol.c:332:client_protocol_xfer] c35: not connected at the moment to submit frame type(1) op(34)
2008-06-05 12:44:11 E [client-protocol.c:4423:client_lookup_cbk] c35: no proper reply from server, returning ENOTCONN
2008-06-05 12:44:11 E [unify.c:182:unify_lookup_cbk] bricks: c35 returned 107
2008-06-05 12:45:01 W [client-protocol.c:205:call_bail] c58: activating bail-out. pending frames = 2. last sent = 2008-06-05 12:44:11. last received = 2008-06-05 12:44:11 transport-timeout = 42
2008-06-05 12:45:01 C [client-protocol.c:212:call_bail] c58: bailing transport
2008-06-05 12:45:01 W [client-protocol.c:205:call_bail] c42: activating bail-out. pending frames = 2. last sent = 2008-06-05 12:44:11. last received = 2008-06-05 12:44:11 transport-timeout = 42
2008-06-05 12:45:01 C [client-protocol.c:212:call_bail] c42: bailing transport
2008-06-05 12:45:01 W [client-protocol.c:205:call_bail] c53: activating bail-out. pending frames = 2. last sent = 2008-06-05 12:44:11. last received = 2008-06-05 12:44:11 transport-timeout = 42
2008-06-05 12:45:01 C [client-protocol.c:212:call_bail] c53: bailing transport
--------------

I was forced to reset the head node (the glusterfs process couldn't be killed). This had helped before, but not this time: the head-node glusterFS ALU now always locks up on "ls -l /home".

I then revised the client.vol config and changed the namespace type to an ordinary "protocol/client". It became possible to kill the glusterFS client while it was "bailing transport", and therefore to experiment...
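As an aside, the "transport-timeout = 42" in the bail-out messages above is, as far as I understand, the per-volume protocol/client timeout. I assume it can be raised while debugging with something like the following in each client volume (I haven't verified the option name against 1.3.9tla772, so correct me if it's wrong):

-----------
volume c31
  type protocol/client
  option transport-type tcp/client
  option remote-host 10.1.0.31
  option remote-subvolume brick
  option transport-timeout 120          # assumed option; the default appears to be 42 seconds
end-volume
-----------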
The head-node client running in debug mode logs the following while executing "ls -l /home":

-------
2008-06-11 14:49:57 D [fuse-bridge.c:375:fuse_entry_cbk] glusterfs-fuse: 11: (34) /chouvakhov => 426835
2008-06-11 14:49:57 D [inode.c:577:__create_inode] fuse/inode: create inode(426835)
2008-06-11 14:49:57 D [inode.c:367:__active_inode] fuse/inode: activating inode(426835), lru=4/1024
2008-06-11 14:49:57 D [fuse-bridge.c:512:fuse_lookup] glusterfs-fuse: 12: LOOKUP /danilov
2008-06-11 14:50:40 W [client-protocol.c:205:call_bail] c-ns: activating bail-out. pending frames = 20. last sent = 2008-06-11 14:49:57. last received = 2008-06-11 14:49:57 transport-timeout = 42
2008-06-11 14:50:40 C [client-protocol.c:212:call_bail] c-ns: bailing transport
2008-06-11 14:50:40 D [tcp.c:137:cont_hand] tcp: forcing poll/read/write to break on blocked socket (if any)
2008-06-11 14:50:40 W [client-protocol.c:4777:client_protocol_cleanup] c-ns: cleaning up state in transport object 0x64a660
2008-06-11 14:50:40 E [client-protocol.c:4827:client_protocol_cleanup] c-ns: forced unwinding frame type(1) op(36) reply=@0x64d1a0
2008-06-11 14:50:40 E [client-protocol.c:4208:client_setdents_cbk] c-ns: no proper reply from server, returning ENOTCONN
2008-06-11 14:50:40 E [client-protocol.c:4827:client_protocol_cleanup] c-ns: forced unwinding frame type(1) op(36) reply=@0x64d1a0
2008-06-11 14:50:40 E [client-protocol.c:4208:client_setdents_cbk] c-ns: no proper reply from server, returning ENOTCONN
[... skipped 18*2 duplicate lines ...]
2008-06-11 14:50:40 E [client-protocol.c:4827:client_protocol_cleanup] c-ns: forced unwinding frame type(1) op(23) reply=@0x64d1a0
2008-06-11 14:50:40 E [client-protocol.c:3303:client_getdents_cbk] c-ns: no proper reply from server, returning ENOTCONN
2008-06-11 14:50:40 W [client-protocol.c:1711:client_closedir] c-ns: no proper fd found, returning
2008-06-11 14:50:40 D [tcp.c:87:tcp_disconnect] c-ns: connection disconnected
2008-06-11 14:50:40 D [client-protocol.c:4750:client_protocol_reconnect] c-ns: attempting reconnect
----------

So, something strange has happened with the /home/danilov dir... I've checked the underlying ext3 FS and found no errors. The contents of this dir are fully accessible from all other cluster nodes, provided the head-node client is not running. Maybe the problem is with the namespace, and I should just recreate it. But it's interesting to investigate what is going on and how to fix it in a more proper way. :)

Looking forward to any answer,
Andrey

2008/6/8 Anand Avati <avati@xxxxxxxxxxxxx>:
> Andrey,
> can you send your complete log file, especially the section which logs
> about the frames forcefully unwound after the bail..
>
> avati
>
> On 08/06/2008, NovA <av.nova@xxxxxxxxx> wrote:
>>
>> Hello everybody!
>>
>> I've been using GlusterFS for rather a long time. It's a great project! Thanks!
>> My old GlusterFS 1.2tla184 is rock stable, but with the new 1.3.x series
>> I still have problems :(. Here is a bunch of them for version 1.3.9tla772.
>>
>> I use unify over 24 bricks, each on a different cluster node. Each node
>> runs a glusterfs server (exporting the local HDD) and a client (mounting
>> the glusterFS unify at /home) as separate processes. Server xlators are:
>> storage/posix (and name-space on the head node) -> features/posix-locks ->
>> tcp/server. The client consists of: tcp/clients -> cluster/unify NUFA
>> (except the head node, which uses ALU) -> performance/write-behind. Each node
>> runs openSUSE 10.3 with kernel 2.6.22.17-0.1 x86_64, fuse-2.7.2glfs9.
>>
>> 1) The most annoying problem is a complete glusterFS lockup. It became
>> apparent in real-world usage by multiple users. At a random moment any
>> attempt to access glusterFS on the head node (name-space, ALU unify) fails
>> and the client.log is flooded with messages like
>> --------
>> 2008-06-03 17:11:06 W [client-protocol.c:205:call_bail] c36:
>> activating bail-out. pending frames = 3. last sent = 2008-06-03
>> 17:10:23.
>> last received = 2008-06-03 17:10:23 transport-timeout = 42
>> 2008-06-03 17:11:06 C [client-protocol.c:212:call_bail] c36: bailing
>> transport
>> 2008-06-03 17:11:06 W [client-protocol.c:205:call_bail] c45:
>> activating bail-out. pending frames = 4. last sent = 2008-06-03
>> 17:10:23. last received = 2008-06-03 17:10:23 transport-timeout = 42
>> 2008-06-03 17:11:06 C [client-protocol.c:212:call_bail] c45: bailing
>> transport
>> --------
>> repeated infinitely each minute (with node names changing in a loop). I
>> have two log files of 62MB and 138MB filled with such errors (they
>> were generated when I left the system unattended for a day). Moreover,
>> when glusterfs enters such a regime it can't be killed, even
>> with "killall -9 glusterfs". But on the other cluster nodes (with NUFA
>> unify) the logs are free of these messages, and it is possible to access
>> the unify FS without a lockup.
>>
>> I can't identify the initial cause of the lockup. Once it happened
>> just after I switched off one of the bricks. But most of the time
>> there are no unusual actions on the FS, just file/dir creation and
>> copying/moving. The logs are too huge and full of other errors (see
>> below) to find the cause. BTW, what does this message mean? :)
>>
>> 2) The second problem has already been mentioned on the mailing list -
>> sometimes files are created twice on different bricks. The file then becomes
>> inaccessible until I delete one copy. Can this be handled automatically?
>>
>> 3) My logs are also full of the following error:
>> -----
>> 2008-06-02 16:03:33 E [unify.c:325:unify_lookup] bricks: returning
>> ESTALE for / [translator generation (25) inode generation (23)]
>> 2008-06-02 16:03:33 E [fuse-bridge.c:459:fuse_entry_cbk]
>> glusterfs-fuse: 301: (34) / => -1 (116)
>> 2008-06-02 16:03:33 E [unify.c:325:unify_lookup] bricks: returning
>> ESTALE for / [translator generation (25) inode generation (23)]
>> 2008-06-02 16:03:33 E [fuse-bridge.c:459:fuse_entry_cbk]
>> glusterfs-fuse: 302: (34) / => -1 (116)
>> -----
>> This error happens when the glusterFS mount point itself is touched somehow
>> (for example "ls /home"), but not the subdirs. Despite the error,
>> such an operation succeeds, but with a lag.
>> It seems that this is somehow connected with the non-simultaneous
>> start of the cluster nodes (namely their glusterfs servers). When all
>> nodes are up, remounting glusterfs gets rid of the mentioned error.
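P.S. Since the quoted message only describes the per-node client stack in words, here is roughly what the unify volume looks like on an ordinary (non-head) cluster node, e.g. c35. I'm writing the NUFA option names from memory, so treat this as a sketch of my setup rather than the exact config:

-----------
volume bricks
  type cluster/unify
  option namespace c-ns                 # protocol/client volume pointing at brick-ns on the head node
  subvolumes c31 c32 c33 c34 c35 c36 c37 c38 c41 c42 c43 c44 c45 c46 c47 c48 c51 c52 c53 c54 c55 c56 c57 c58
  option scheduler nufa
  option nufa.local-volume-name c35     # prefer the brick local to this node
  option nufa.limits.min-free-disk 5%
end-volume
-----------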