On Wed, Dec 23, 2009 at 1:41 PM, Mark Moseley <moseleymark@xxxxxxxxx> wrote:
> On Wed, Dec 23, 2009 at 5:01 AM, Greg M <gregm@xxxxxxxxxxxx> wrote:
>> Hi David,
>>
>> We are now running 2.6.32 - no kslowd issues at all, however during peak
>> times (only ~12Mbps of NFS traffic per box) we get this in dmesg.
>>
>>
>> CacheFiles: I/O Error: Unlink failed
>> FS-Cache: Cache cachefiles stopped due to I/O error
>>
>> Then restart:
>>
>> CacheFiles: File cache on sdb1 unregistering
>> FS-Cache: Withdrawing cache "mycache"
>> FS-Cache: Cache "mycache" added (type cachefiles)
>> CacheFiles: File cache on sdb1 registered
>>
>> Peak period again:
>>
>> CacheFiles: I/O Error: Unlink failed
>> FS-Cache: Cache cachefiles stopped due to I/O error
>>
>> Restart:
>>
>> CacheFiles: File cache on sdb1 unregistering
>> FS-Cache: Withdrawing cache "mycache"
>> FS-Cache: Cache "mycache" added (type cachefiles)
>> CacheFiles: File cache on sdb1 registered
>>
>> Peak period again:
>>
>> CacheFiles: I/O Error: Unlink failed
>> FS-Cache: Cache cachefiles stopped due to I/O error
>>
>>
>> And so on.
>>
>> This is happening on all 10 production VMware guests, running Gentoo on an
>> IBM Bladecenter.
>>
>> Linux dnetwww2 2.6.32-gentoo #1 SMP Sun Dec 20 06:54:41 CST 2009 x86_64
>> Intel(R) Xeon(R) CPU X3360 @ 2.83GHz GenuineIntel GNU/Linux
>>
>> Greg
>
>
> I've seen the same thing on Debian Etch and Debian Lenny on 2.6.32 and
> 2.6.32.1, all on pretty heavily utilized servers (all Dell 1950s, not
> virtualized) -- all serving web hosting traffic over NFS, i.e.
> fscache-heavy stuff with *lots* of individual files. Both are using
> cachefilesd-0.9 -- the Etch one I statically compiled; the Lenny one
> is from lenny-backports.
>
> On Etch (different server), it died with this error but without any Oops
> Dec 14 14:59:16 server kernel: [ 52.568006] FS-Cache: Cache "CacheFiles" added (type cachefiles)
> Dec 14 14:59:16 server kernel: [ 52.568010] CacheFiles: File cache on sda4 registered
> Dec 14 15:37:30 server kernel: [ 2347.259571] CacheFiles: I/O Error: Unlink failed
> Dec 14 15:37:30 server kernel: [ 2347.259578] FS-Cache: Cache cachefiles stopped due to I/O error
>
> On Lenny:
> Dec 15 17:43:09 server kernel: [ 1589.670513] CacheFiles: I/O Error: Unlink failed
> Dec 15 17:43:09 server kernel: [ 1589.670518] FS-Cache: Cache cachefiles stopped due to I/O error
> Dec 15 17:43:23 server cachefilesd[8944]: Refilling cull table
> Dec 15 17:43:23 server cachefilesd[8944]: Failed to check object's in-use state: errno 5 (Input/output error)
> Dec 15 17:43:23 server kernel: [ 1603.311806] CacheFiles: File cache on sdb3 unregistering
> Dec 15 17:43:23 server kernel: [ 1603.311810] FS-Cache: Withdrawing cache "CacheFiles"
>
>
> Lenny oopses:
>
> There's lots of the below but caching seems to continue for a while
> afterwards, sometimes up to a couple of hours. My most recent attempts
> will get the Oops but sometimes it won't be hours till the
> "CacheFiles: I/O Error: Unlink failed" knocks out the cache.
>
> Dec 15 17:27:07 server kernel: [ 627.073122] ------------[ cut here ]------------
> Dec 15 17:27:07 server kernel: [ 627.073127] WARNING: at fs/sysfs/dir.c:491 ()
> Dec 15 17:27:07 server kernel: [ 627.073130] Hardware name: PowerEdge 1950
> Dec 15 17:27:07 server kernel: [ 627.073132] sysfs: cannot create duplicate filename '/class/bdi/0:209'
> Dec 15 17:27:07 server kernel: [ 627.073135] Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod xfs tg3 libphy nls_iso8859_1 i2c_i801 i2c_core evdev i5000_edac i5k_amb hwmon button dcdbas ide_cd_mod cdrom bnx2 fan [last unloaded: scsi_wait_scan]
> Dec 15 17:27:07 server kernel: [ 627.073164] Pid: 8335, comm: httpd Not tainted 2.6.32.1-nx #1
> Dec 15 17:27:07 server kernel: [ 627.073166] Call Trace:
> Dec 15 17:27:07 server kernel: [ 627.073171] [<0003143a>] ?
> Dec 15 17:27:07 server kernel: [ 627.073174] [<00031446>] ?
> Dec 15 17:27:07 server kernel: [ 627.073177] [<0003148b>] ?
> Dec 15 17:27:07 server kernel: [ 627.073180] [<00104bfd>] ?
> Dec 15 17:27:07 server kernel: [ 627.073183] [<0010505c>] ?
> Dec 15 17:27:07 server kernel: [ 627.073185] [<001050ab>] ?
> Dec 15 17:27:07 server kernel: [ 627.073188] [<002052d3>] ?
> Dec 15 17:27:07 server kernel: [ 627.073191] [<00205385>] ?
> Dec 15 17:27:07 server kernel: [ 627.073193] [<002058ac>] ?
> Dec 15 17:27:07 server kernel: [ 627.073196] [<00276764>] ?
> Dec 15 17:27:07 server kernel: [ 627.073199] [<00205127>] ?
> Dec 15 17:27:07 server kernel: [ 627.073201] [<0027b7c2>] ?
> Dec 15 17:27:07 server kernel: [ 627.073204] [<00276c28>] ?
> Dec 15 17:27:07 server kernel: [ 627.073207] [<0009f0ad>] ?
> Dec 15 17:27:07 server kernel: [ 627.073210] [<0009f197>] ?
> Dec 15 17:27:07 server kernel: [ 627.073212] [<0016ec3e>] ?
> Dec 15 17:27:07 server kernel: [ 627.073215] [<000c2a97>] ?
> Dec 15 17:27:07 server kernel: [ 627.073218] [<00174e66>] ?
> Dec 15 17:27:07 server kernel: [ 627.073221] [<00007e7f>] ?
> Dec 15 17:27:07 server kernel: [ 627.073223] [<001d41ed>] ?
> Dec 15 17:27:07 server kernel: [ 627.073227] [<00007fc2>] ?
> Dec 15 17:27:07 server kernel: [ 627.073229] [<0001651a>] ?
> Dec 15 17:27:07 server kernel: [ 627.073232] [<002c0000>] ?
> Dec 15 17:27:07 server kernel: [ 627.073236] [<000b8e5b>] ?
> Dec 15 17:27:07 server kernel: [ 627.073238] [<0007f6af>] ?
> Dec 15 17:27:07 server kernel: [ 627.073241] [<000cabf7>] ?
> Dec 15 17:27:07 server kernel: [ 627.073244] [<000cafd7>] ?
> Dec 15 17:27:07 server kernel: [ 627.073247] [<000cac99>] ?
> Dec 15 17:27:07 server kernel: [ 627.073249] [<000cafd7>] ?
> Dec 15 17:27:07 server kernel: [ 627.073252] [<000cb2bd>] ?
> Dec 15 17:27:07 server kernel: [ 627.073255] [<000cb396>] ?
> Dec 15 17:27:07 server kernel: [ 627.073257] [<000cbd49>] ?
> Dec 15 17:27:07 server kernel: [ 627.073260] [<0005124e>] ?
> Dec 15 17:27:07 server kernel: [ 627.073263] [<00050f04>] ?
> Dec 15 17:27:07 server kernel: [ 627.073265] [<0005124e>] ?
> Dec 15 17:27:07 server kernel: [ 627.073268] [<000c3f8b>] ?
> Dec 15 17:27:07 server kernel: [ 627.073270] [<0007f698>] ?
> Dec 15 17:27:07 server kernel: [ 627.073273] [<000c3ff8>] ?
> Dec 15 17:27:07 server kernel: [ 627.073275] [<0000466f>] ?
> Dec 15 17:27:07 server kernel: [ 627.073280] ---[ end trace 927e9ac79397ac32 ]---
> Dec 15 17:27:07 server kernel: [ 627.073284] kobject_add_internal failed for 0:209 with -EEXIST, don't try to register things with the same name in the same directory.
>
> all with variations on "sysfs: cannot create duplicate filename
> '/class/bdi/0:209'", e.g.:
>
> sysfs: cannot create duplicate filename '/class/bdi/0:209'
> sysfs: cannot create duplicate filename '/class/bdi/0:209'
> sysfs: cannot create duplicate filename '/class/bdi/0:198'
> sysfs: cannot create duplicate filename '/class/bdi/0:198'
> sysfs: cannot create duplicate filename '/class/bdi/0:196'
> sysfs: cannot create duplicate filename '/class/bdi/0:196'
>
> I've also seen this in the logs, but caching seems to continue: NFS:
> Cache request denied due to non-unique superblock keys
>
> What sort of debugging info would it be helpful for us to gather? NFS
> caching in the kernel is like a dream come true for me, so I'm happy
> to help in info gathering and trying out various settings, etc.
>

Just to update: this still occurs on 2.6.32.8 (Debian Lenny, 32-bit)
with the cachefilesd from Testing. As the log below shows, it ran under
pretty heavy usage for about 2.5 hours, and I can verify that there's
quite a bit in the cache (about 640 MB). The only things logged were:

Feb 18 16:05:55 server kernel: [ 8951.401790] CacheFiles: I/O Error: Unlink failed
Feb 18 16:05:55 server kernel: [ 8951.401795] FS-Cache: Cache cachefiles stopped due to I/O error

Is there anything I can do to debug this further? Would the output from
turning on /sys/module/fscache/parameters/debug help, and if so, which
flag(s) should be set? Turning them all on generates a tidal wave of
data :)

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
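
For anyone comparing how long the cache survives before the unlink
failure, here is a minimal Python sketch that pulls the kernel uptime
stamps out of syslog lines like the ones quoted in this thread. The
function name cache_lifetimes and the regular expression are my own
assumptions, not part of cachefilesd or the kernel, and it assumes all
lines come from a single boot, since the bracketed stamp resets on
reboot.

```python
import re

# Matches syslog lines that carry a kernel uptime stamp, for example:
# "Dec 14 15:37:30 server kernel: [ 2347.259571] CacheFiles: I/O Error: Unlink failed"
KERNEL_LINE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s*(.*)$")


def cache_lifetimes(lines):
    """Yield (failed_at, lifetime) for each 'Unlink failed' event.

    failed_at is the kernel uptime in seconds at the failure.
    lifetime is the number of seconds since the most recent
    'File cache on ... registered' message, or None when no
    registration appears in the supplied lines.
    """
    registered_at = None
    for line in lines:
        match = KERNEL_LINE.search(line)
        if not match:
            continue  # not a kernel line (e.g. cachefilesd output)
        stamp = float(match.group(1))
        message = match.group(2).strip()
        if (message.startswith("CacheFiles: File cache on")
                and message.endswith(" registered")):
            registered_at = stamp
        elif message == "CacheFiles: I/O Error: Unlink failed":
            lifetime = None if registered_at is None else stamp - registered_at
            yield stamp, lifetime
            registered_at = None  # the cache is gone until re-registered
```

Feeding it an open log file, e.g. cache_lifetimes(open(path)), gives one
pair per failure. For the Feb 18 lines above there is no registration
message, so only the failure stamp is available: 8951.4 seconds of
uptime, which is roughly 2.5 hours.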