Re: kslowd issue

On Wed, Dec 23, 2009 at 5:01 AM, Greg M <gregm@xxxxxxxxxxxx> wrote:
> Hi David,
>
> We are now running 2.6.32 - no kslowd issues at all, however during peak
> times  (only ~12Mbps of NFS traffic per box) we get this in dmesg.
>
>
> CacheFiles: I/O Error: Unlink failed
> FS-Cache: Cache cachefiles stopped due to I/O error
>
> Then restart:
>
> CacheFiles: File cache on sdb1 unregistering
> FS-Cache: Withdrawing cache "mycache"
> FS-Cache: Cache "mycache" added (type cachefiles)
> CacheFiles: File cache on sdb1 registered
>
> Peak period again:
>
> CacheFiles: I/O Error: Unlink failed
> FS-Cache: Cache cachefiles stopped due to I/O error
>
> Restart:
>
> CacheFiles: File cache on sdb1 unregistering
> FS-Cache: Withdrawing cache "mycache"
> FS-Cache: Cache "mycache" added (type cachefiles)
> CacheFiles: File cache on sdb1 registered
>
> Peak period again:
>
> CacheFiles: I/O Error: Unlink failed
> FS-Cache: Cache cachefiles stopped due to I/O error
>
>
> And so on.
>
> This is happening on all 10 production VMware guests, running Gentoo on an
> IBM Bladecenter.
>
> Linux dnetwww2 2.6.32-gentoo #1 SMP Sun Dec 20 06:54:41 CST 2009 x86_64
> Intel(R) Xeon(R) CPU X3360 @ 2.83GHz GenuineIntel GNU/Linux
>
> Greg


I've seen the same thing on Debian Etch and Debian Lenny on 2.6.32 and
2.6.32.1, all on pretty heavily utilized servers (all Dell 1950s, not
virtualized) -- all serving web hosting traffic over NFS, i.e.
fscache-heavy stuff with *lots* of individual files. Both are using
cachefilesd-0.9 -- the Etch one I statically compiled; the Lenny one
is from lenny-backports.

On Etch (different server), it died with this error but without any Oops:
Dec 14 14:59:16 server kernel: [   52.568006] FS-Cache: Cache "CacheFiles" added (type cachefiles)
Dec 14 14:59:16 server kernel: [   52.568010] CacheFiles: File cache on sda4 registered
Dec 14 15:37:30 server kernel: [ 2347.259571] CacheFiles: I/O Error: Unlink failed
Dec 14 15:37:30 server kernel: [ 2347.259578] FS-Cache: Cache cachefiles stopped due to I/O error
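
In case it helps with comparing runs, here is a rough sketch of how the time between cache registration and the first "Unlink failed" could be pulled out of a log. It assumes each kernel message sits on one line, as in the original syslog rather than the wrapped mail text, and that the kernel's printk timestamps are enabled. The function name seconds_until_failure is made up for illustration.

```python
import re

# Kernel printk timestamp such as "[ 2347.259571]", followed by the message.
# The decimal point is required so that "cachefilesd[8944]" is not matched.
TS = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")

def seconds_until_failure(lines):
    """Return seconds from cache registration to the first CacheFiles
    I/O error, or None if that sequence is not found in the lines."""
    registered = None
    for line in lines:
        m = TS.search(line)
        if not m:
            continue
        stamp, msg = float(m.group(1)), m.group(2)
        if "added (type cachefiles)" in msg:
            # Remember the most recent registration.
            registered = stamp
        elif msg.startswith("CacheFiles: I/O Error") and registered is not None:
            return stamp - registered
    return None
```

For the Etch excerpt above this comes out at roughly 2294.7 seconds, i.e. about 38 minutes of uptime before the cache was knocked out.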

On Lenny:
Dec 15 17:43:09 server kernel: [ 1589.670513] CacheFiles: I/O Error: Unlink failed
Dec 15 17:43:09 server kernel: [ 1589.670518] FS-Cache: Cache cachefiles stopped due to I/O error
Dec 15 17:43:23 server cachefilesd[8944]: Refilling cull table
Dec 15 17:43:23 server cachefilesd[8944]: Failed to check object's in-use state: errno 5 (Input/output error)
Dec 15 17:43:23 server kernel: [ 1603.311806] CacheFiles: File cache on sdb3 unregistering
Dec 15 17:43:23 server kernel: [ 1603.311810] FS-Cache: Withdrawing cache "CacheFiles"


Lenny oopses:

There are lots of warnings like the one below, but caching seems to
continue for a while afterwards, sometimes up to a couple of hours. In
my most recent attempts I get the Oops, but sometimes it is hours
before the "CacheFiles: I/O Error: Unlink failed" knocks out the cache.

Dec 15 17:27:07 server kernel: [  627.073122] ------------[ cut here ]------------
Dec 15 17:27:07 server kernel: [  627.073127] WARNING: at fs/sysfs/dir.c:491 ()
Dec 15 17:27:07 server kernel: [  627.073130] Hardware name: PowerEdge 1950
Dec 15 17:27:07 server kernel: [  627.073132] sysfs: cannot create duplicate filename '/class/bdi/0:209'
Dec 15 17:27:07 server kernel: [  627.073135] Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod xfs tg3 libphy nls_iso8859_1 i2c_i801 i2c_core evdev i5000_edac i5k_amb hwmon button dcdbas ide_cd_mod cdrom bnx2 fan [last unloaded: scsi_wait_scan]
Dec 15 17:27:07 server kernel: [  627.073164] Pid: 8335, comm: httpd Not tainted 2.6.32.1-nx #1
Dec 15 17:27:07 server kernel: [  627.073166] Call Trace:
Dec 15 17:27:07 server kernel: [  627.073171]  [<0003143a>] ?
Dec 15 17:27:07 server kernel: [  627.073174]  [<00031446>] ?
Dec 15 17:27:07 server kernel: [  627.073177]  [<0003148b>] ?
Dec 15 17:27:07 server kernel: [  627.073180]  [<00104bfd>] ?
Dec 15 17:27:07 server kernel: [  627.073183]  [<0010505c>] ?
Dec 15 17:27:07 server kernel: [  627.073185]  [<001050ab>] ?
Dec 15 17:27:07 server kernel: [  627.073188]  [<002052d3>] ?
Dec 15 17:27:07 server kernel: [  627.073191]  [<00205385>] ?
Dec 15 17:27:07 server kernel: [  627.073193]  [<002058ac>] ?
Dec 15 17:27:07 server kernel: [  627.073196]  [<00276764>] ?
Dec 15 17:27:07 server kernel: [  627.073199]  [<00205127>] ?
Dec 15 17:27:07 server kernel: [  627.073201]  [<0027b7c2>] ?
Dec 15 17:27:07 server kernel: [  627.073204]  [<00276c28>] ?
Dec 15 17:27:07 server kernel: [  627.073207]  [<0009f0ad>] ?
Dec 15 17:27:07 server kernel: [  627.073210]  [<0009f197>] ?
Dec 15 17:27:07 server kernel: [  627.073212]  [<0016ec3e>] ?
Dec 15 17:27:07 server kernel: [  627.073215]  [<000c2a97>] ?
Dec 15 17:27:07 server kernel: [  627.073218]  [<00174e66>] ?
Dec 15 17:27:07 server kernel: [  627.073221]  [<00007e7f>] ?
Dec 15 17:27:07 server kernel: [  627.073223]  [<001d41ed>] ?
Dec 15 17:27:07 server kernel: [  627.073227]  [<00007fc2>] ?
Dec 15 17:27:07 server kernel: [  627.073229]  [<0001651a>] ?
Dec 15 17:27:07 server kernel: [  627.073232]  [<002c0000>] ?
Dec 15 17:27:07 server kernel: [  627.073236]  [<000b8e5b>] ?
Dec 15 17:27:07 server kernel: [  627.073238]  [<0007f6af>] ?
Dec 15 17:27:07 server kernel: [  627.073241]  [<000cabf7>] ?
Dec 15 17:27:07 server kernel: [  627.073244]  [<000cafd7>] ?
Dec 15 17:27:07 server kernel: [  627.073247]  [<000cac99>] ?
Dec 15 17:27:07 server kernel: [  627.073249]  [<000cafd7>] ?
Dec 15 17:27:07 server kernel: [  627.073252]  [<000cb2bd>] ?
Dec 15 17:27:07 server kernel: [  627.073255]  [<000cb396>] ?
Dec 15 17:27:07 server kernel: [  627.073257]  [<000cbd49>] ?
Dec 15 17:27:07 server kernel: [  627.073260]  [<0005124e>] ?
Dec 15 17:27:07 server kernel: [  627.073263]  [<00050f04>] ?
Dec 15 17:27:07 server kernel: [  627.073265]  [<0005124e>] ?
Dec 15 17:27:07 server kernel: [  627.073268]  [<000c3f8b>] ?
Dec 15 17:27:07 server kernel: [  627.073270]  [<0007f698>] ?
Dec 15 17:27:07 server kernel: [  627.073273]  [<000c3ff8>] ?
Dec 15 17:27:07 server kernel: [  627.073275]  [<0000466f>] ?
Dec 15 17:27:07 server kernel: [  627.073280] ---[ end trace 927e9ac79397ac32 ]---
Dec 15 17:27:07 server kernel: [  627.073284] kobject_add_internal failed for 0:209 with -EEXIST, don't try to register things with the same name in the same directory.

All of these warnings are variations on "sysfs: cannot create duplicate
filename '/class/bdi/0:209'", e.g.:

sysfs: cannot create duplicate filename '/class/bdi/0:209'
sysfs: cannot create duplicate filename '/class/bdi/0:209'
sysfs: cannot create duplicate filename '/class/bdi/0:198'
sysfs: cannot create duplicate filename '/class/bdi/0:198'
sysfs: cannot create duplicate filename '/class/bdi/0:196'
sysfs: cannot create duplicate filename '/class/bdi/0:196'
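
To see which bdi names recur and how often, something like the following could tally them from a kernel log. This is only a sketch: it assumes the log is fed in as unwrapped text lines, and the name count_duplicate_bdi is invented here, not part of any tool.

```python
import re
import sys
from collections import Counter

# Matches the sysfs warning quoted above and captures the bdi name, e.g. "0:209".
DUP_BDI = re.compile(
    r"sysfs: cannot create duplicate filename '/class/bdi/([^']+)'"
)

def count_duplicate_bdi(lines):
    """Return a Counter mapping bdi name to the number of warnings seen."""
    counts = Counter()
    for line in lines:
        m = DUP_BDI.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Reads log text on stdin and prints one "name<TAB>count" line per bdi,
    # most frequent first.
    for name, n in count_duplicate_bdi(sys.stdin).most_common():
        print(f"{name}\t{n}")
```

It can be run by piping the kernel log into it on stdin; where that log lives depends on the syslog configuration of the box.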

I've also seen this in the logs, though caching seems to continue:
NFS: Cache request denied due to non-unique superblock keys

What sort of debugging info would it be helpful for us to gather? NFS
caching in the kernel is like a dream come true for me, so I'm happy
to help in info gathering and trying out various settings, etc.

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs

