What distribution, kernel, and GlusterFS versions are you using? I've had
problems with some kernels between 2.6.31 and 2.6.32-r10 (-r10 is a Gentoo
release number) with GlusterFS 3.0.4 inside a KVM guest with KVM versions
< 0.12.4. Kernel 2.6.30 or the latest 2.6.32-r11 together with KVM 0.12.4
seem to work fine now.

We have also had some NFSv4 problems with the kernels mentioned above;
2.6.30 and 2.6.32-r11 don't have this problem. The cause was a race
condition in KVM's virtio-net driver. Maybe you have a problem with your
network (driver) too.

A few weeks ago we moved a directory used by a Drupal cache module from
GlusterFS to NFSv4, since we'd had a lot of problems with it. There was
also heavy writing and reading in this directory. By now I'm not sure
anymore whether that was a kernel or a GlusterFS problem; I've never
tested it again with GlusterFS. Some other directories that are mostly
read from are working fine.
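For illustration, the move itself was just an ordinary NFSv4 mount over
that directory, something like the following (the server name and both
paths here are made-up placeholders, not our actual setup):

  # hypothetical NFSv4 mount over the old cache directory;
  # nfs01.domain, /drupal-cache and /var/www/cache are placeholders
  mount -t nfs4 nfs01.domain:/drupal-cache /var/www/cache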
- Robert

On 06/22/10 17:40, Hugo Cisneiros (Eitch) wrote:
> Hi,
>
> We have a web server installation that is using GlusterFS. There are 3
> GlusterFS servers and 4 GlusterFS clients accessing one share. We're
> having a problem where, randomly, the httpd processes begin to hang
> while waiting for I/O. This starts with the following kernel call
> trace:
>
> kernel: INFO: task httpd.worker:18974 blocked for more than 120 seconds.
> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kernel: httpd.worker D ffffffff80150462 0 18974 4818 18975 18972 (NOTLB)
> kernel: ffff8108c0ee7c78 0000000000000086 ffff8101f3b2b000 ffff810116d0f200
> kernel: ffff81037392c6d8 000000000000000a ffff8106b73d57a0 ffff810116dd4100
> kernel: 00004f0baea8a21e 00000000000026f5 ffff8106b73d5988 0000000300000000
> kernel: Call Trace:
> kernel: [<ffffffff88443cda>] :fuse:fuse_dentry_revalidate+0x13e/0x1c1
> kernel: [<ffffffff80063c6f>] __mutex_lock_slowpath+0x60/0x9b
> kernel: [<ffffffff80063cb9>] .text.lock.mutex+0xf/0x14
> kernel: [<ffffffff8000cec2>] do_lookup+0x90/0x1e6
> kernel: [<ffffffff80009bb2>] __link_path_walk+0x3a6/0xf42
> kernel: [<ffffffff8000e9e2>] link_path_walk+0x42/0xb2
> kernel: [<ffffffff8000ccb2>] do_path_lookup+0x275/0x2f1
> kernel: [<ffffffff8001280e>] getname+0x15b/0x1c2
> kernel: [<ffffffff80023876>] __user_walk_fd+0x37/0x4c
> kernel: [<ffffffff80028846>] vfs_stat_fd+0x1b/0x4a
> kernel: [<ffffffff8000c5fe>] _atomic_dec_and_lock+0x39/0x57
> kernel: [<ffffffff800235a8>] sys_newstat+0x19/0x31
> kernel: [<ffffffff80012b54>] __fput+0x191/0x1bd
> kernel: [<ffffffff8002c9e4>] mntput_no_expire+0x19/0x89
> kernel: [<ffffffff80023b84>] filp_close+0x5c/0x64
> kernel: [<ffffffff8001dfa6>] sys_close+0x88/0xbd
> kernel: [<ffffffff8005d116>] system_call+0x7e/0x83
>
> Looking at the GlusterFS client logs, the following entries begin to
> show up before this call trace appears:
>
> [fuse-bridge.c:585:fuse_lookup] glusterfs-fuse: 89023783: LOOKUP 46912563008752/2010 (fuse_loc_fill() failed)
> [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024196: GETATTR 46912563008752 (fuse_loc_fill() failed)
> [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024203: GETATTR 46912563008752 (fuse_loc_fill() failed)
> [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024207: GETATTR 46912563008752 (fuse_loc_fill() failed)
> [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024210: GETATTR 46912563008752 (fuse_loc_fill() failed)
> [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024213: GETATTR 46912563008752 (fuse_loc_fill() failed)
> [fuse-bridge.c:793:fuse_getattr] glusterfs-fuse: 89024218: GETATTR 46912563008752 (fuse_loc_fill() failed)
>
> The same message repeats 1116 times. Talking with a friend, we noticed
> that the 4 web servers are reading *and* writing very concurrently,
> and because of this, many lookups, reads, and unlinks return "no such
> file or directory" in the log, like these:
>
> [fuse-bridge.c:859:fuse_fd_cbk] glusterfs-fuse: 88969079: OPEN() /cache/70ea5f0d4df01648be4f70d9813f24b7.meta => -1 (No such file or directory)
> [fuse-bridge.c:1215:fuse_unlink_cbk] glusterfs-fuse: 88969080: UNLINK() /cache/70ea5f0d4df01648be4f70d9813f24b7.meta => -1 (No such file or directory)
> [fuse-bridge.c:491:fuse_entry_cbk] glusterfs-fuse: LOOKUP(/cache/index.html) inode (ptr=0x2aaab313f9f0, ino=1528908, gen=5480759143103581058) found conflict (ptr=0x2aaadebc8690, ino=1528908, gen=5480759143103581058)
>
> After the kernel call trace appears, the FUSE filesystem becomes very
> unresponsive and unstable. Any process that tries to read the mounted
> directory randomly gets stuck in I/O wait and hangs forever. We know
> that GlusterFS is maybe not made for this kind of highly concurrent
> writing and reading, so we're working on an alternative solution for
> this workload (it's a disk cache; we're moving it to something like
> memcached). But maybe this error helps to find something unexpected
> going on... :)
>
> Here's the client configuration:
>
> volume vol01
>   type protocol/client
>   option transport-type tcp
>   option remote-host server01.domain
>   option remote-subvolume vol
> end-volume
>
> volume vol02
>   type protocol/client
>   option transport-type tcp
>   option remote-host server02.domain
>   option remote-subvolume vol
> end-volume
>
> volume vol03
>   type protocol/client
>   option transport-type tcp
>   option remote-host server03.domain
>   option remote-subvolume vol
> end-volume
>
> volume vol-mirror
>   type cluster/replicate
>   subvolumes vol01 vol02 vol03
> end-volume
>
> volume vol-iocache
>   type performance/io-cache
>   option cache-size 128MB
>   option cache-timeout 2
>   subvolumes vol-mirror
> end-volume
>
> volume vol-quickread
>   type performance/quick-read
>   option cache-timeout 1
>   option max-file-size 64kB
>   subvolumes vol-iocache
> end-volume
>
> volume vol-writebehind
>   type performance/write-behind
>   option cache-size 4MB
>   subvolumes vol-quickread
> end-volume
>
> volume vol-statprefetch
>   type performance/stat-prefetch
>   subvolumes vol-writebehind
> end-volume
>
> Thanks,
>
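Since only the client volfile is quoted, here is, for comparison, a
minimal sketch of what a matching server-side volfile on server01.domain
through server03.domain could look like for GlusterFS 3.0.x. Only the
exported volume name "vol" is taken from the client configuration above;
the export directory, thread count, and auth rule are assumptions:

  # storage/posix backend; /data/export is an assumed path
  volume posix
    type storage/posix
    option directory /data/export
  end-volume

  # POSIX locking, needed under cluster/replicate
  volume locks
    type features/locks
    subvolumes posix
  end-volume

  # io-threads so concurrent clients don't serialize; "vol" is the
  # name the clients request via "option remote-subvolume vol"
  volume vol
    type performance/io-threads
    option thread-count 8
    subvolumes locks
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.vol.allow *
    subvolumes vol
  end-volume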
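On the concurrent read/write/unlink pattern: the failed OPEN()/UNLINK()
calls above are what you would expect when one client removes a cache
entry while another client is just opening it. A hypothetical two-client
loop like this (assuming the share is mounted at /mnt/gluster) should
produce the same kind of log noise:

  # client A: keep rewriting and removing one cache entry
  while true; do
      echo data > /mnt/gluster/cache/test.meta
      rm -f /mnt/gluster/cache/test.meta
  done

  # client B: keep opening the same entry; the failed opens show up
  # in the client log as OPEN() ... (No such file or directory)
  while true; do
      cat /mnt/gluster/cache/test.meta > /dev/null 2>&1
  done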
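And once the mount gets into the hung state, it can help to list every
process stuck in uninterruptible sleep (D state) together with the
kernel function it is blocked in; httpd workers sitting in a fuse_*
function point at the GlusterFS mount. A small sketch with standard
procps options (the wchan column width is an arbitrary choice):

  # list D-state (uninterruptible sleep) processes and the kernel
  # function each one is currently blocked in
  ps axo pid,stat,wchan:30,comm | awk 'NR == 1 || $2 ~ /^D/'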