Re: cachefiles bug

I'm still having the same issue here on Gentoo with 2.6.33, and I have been
hitting exactly this problem ever since the 2.6.30 bugs were squashed.

It runs fine for up to 3-4 hours, then under load it just dies.

Greg

-----Original Message-----
From: linux-cachefs-bounces@xxxxxxxxxx
[mailto:linux-cachefs-bounces@xxxxxxxxxx] On Behalf Of Romain DEGEZ
Sent: Tuesday, 30 March 2010 12:34 AM
To: linux-cachefs@xxxxxxxxxx
Subject: cachefiles bug

Dear David,

First of all, thanks for your work.
It looks very promising; we have been missing this kind of functionality in
the kernel for a long time!

We have a production setup of 4 servers with 16 GB of RAM and dual quad-core
Xeon L5410 processors, running a 2.6.33-2-amd64 Debian kernel.

These servers are used to serve files over HTTP (using Apache or lighttpd).

These files are all located on a remote NFS server and cached locally by
FS-Cache and cachefilesd on a local two-disk RAID1 array holding a 250 GB ext4
filesystem mounted on /var/cache/fscache.

The NFS filesystem is mounted as follows:
x.x.x.x:/data on /data type nfs (ro,noatime,tcp,soft,fsc,addr=x.x.x.x)
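As a quick sanity check that FS-Cache is actually requested for the share, one can look for the fsc option among the NFS mount options. Below is a minimal Python sketch, assuming /proc/mounts-style input (device, mount point, type, options, ...) and assuming the kernel lists fsc there when it is in effect; the function name nfs_fsc_mounts and the sample lines are hypothetical and only illustrative.

```python
# Hypothetical helper: report, for each NFS mount, whether the "fsc"
# option (which asks the NFS client to use FS-Cache) is present.
# Input is assumed to be in /proc/mounts format:
#   device mountpoint fstype options dump pass


def nfs_fsc_mounts(mounts_text: str) -> dict:
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in ("nfs", "nfs4"):
            result[mountpoint] = "fsc" in options.split(",")
    return result


# Illustrative input modelled on the mount line above (address left masked).
SAMPLE_MOUNTS = """\
x.x.x.x:/data /data nfs ro,noatime,tcp,soft,fsc,addr=x.x.x.x 0 0
/dev/md3 /var/cache/fscache ext4 rw 0 0
"""

if __name__ == "__main__":
    print(nfs_fsc_mounts(SAMPLE_MOUNTS))  # {'/data': True}
```

On a live server the same function could be fed the contents of /proc/mounts instead of the sample string.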

cachefilesd.conf contains:

dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
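The b* and f* lines are the free-space and free-file thresholds. As described in the kernel's cachefiles documentation, culling starts when free space (or free files) drops below bcull/fcull, stops once it rises above brun/frun, and no new allocation is permitted below bstop/fstop; the limits must satisfy 0 <= bstop < bcull < brun < 100, and likewise for the f* limits. Here is a small Python sketch that parses a config like ours, checks that ordering, and models the start/stop hysteresis. All names are hypothetical, and culling_state is a simplification for illustration, not the kernel's code.

```python
import re


def parse_conf(text: str) -> dict:
    """Parse 'directive value' lines; 'N%' values become ints."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        pct = re.fullmatch(r"(\d+)%", value)
        conf[key] = int(pct.group(1)) if pct else value
    return conf


def check_limits(conf: dict) -> list:
    """List problems with the b*/f* thresholds; empty means they look sane."""
    problems = []
    for p in ("b", "f"):
        stop, cull, run = (conf.get(p + name) for name in ("stop", "cull", "run"))
        if not all(isinstance(v, int) for v in (stop, cull, run)):
            problems.append(f"{p}stop/{p}cull/{p}run missing or not a percentage")
        elif not 0 <= stop < cull < run < 100:
            problems.append(f"expected 0 <= {p}stop < {p}cull < {p}run < 100")
    return problems


def culling_state(free_pct: float, culling: bool, stop: int, cull: int, run: int):
    """Simplified hysteresis: start culling below `cull`, stop above `run`,
    and refuse new allocations while free space is below `stop`."""
    if free_pct < cull:
        culling = True
    elif free_pct > run:
        culling = False
    return culling, free_pct >= stop


SAMPLE_CONF = """\
dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
"""

if __name__ == "__main__":
    conf = parse_conf(SAMPLE_CONF)
    print(check_limits(conf))  # [] : the limits above are correctly ordered
```

With our values, free space sitting between 7% and 10% keeps whatever culling state it had before, which is the intended hysteresis.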

#cat /proc/fs/fscache/stats

FS-Cache statistics
Cookies: idx=3 dat=2880 spc=0
Objects: alc=2484 nal=0 avl=2484 ded=2462
ChkAux : non=0 ok=2131 upd=0 obs=70
Pages  : mrk=15802814 unc=14993041
Acquire: n=2883 nul=0 noc=252 ok=2631 nbf=252 oom=0
Lookups: n=2484 neg=343 pos=2141 crt=0 tmo=343
Updates: n=0 nul=0 run=0
Relinqs: n=1721 nul=0 wcr=0 rtr=20
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=14741 ok=5400 wt=452 nod=693 nbf=8648 int=0 oom=0
Retrvls: ops=6093 owt=112 abt=0
Stores : n=1972991 ok=1972776 agn=0 nbf=215 oom=0
Stores : ops=999 run=1965351 pgs=1964352 rxd=1972776 olm=0
VmScan : nos=14959114 gon=0 bsy=10 can=8424
Ops    : pend=112 run=7092 enq=16438335 can=0 rej=0
Ops    : dfr=0 rel=7092 gc=0
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
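When comparing snapshots of these counters over time, it helps to read them programmatically. A minimal Python sketch, assuming the "Label: key=value ..." layout shown above; labels that span two lines (Retrvls, Stores, Ops, ...) are merged because their counter names differ. As I read the FS-Cache documentation, Retrvls ok counts successful retrieval requests and nbf those rejected with -ENOBUFS; the helper names parse_fscache_stats and share are hypothetical.

```python
import re


def parse_fscache_stats(text: str) -> dict:
    """Map each label (e.g. "Retrvls") to a dict of its integer counters."""
    stats = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # e.g. the "FS-Cache statistics" banner line
        counters = stats.setdefault(label.strip(), {})
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            counters[key] = int(value)
    return stats


def share(stats: dict, label: str, counter: str, total: str = "n") -> float:
    """Fraction counter/total within one label; 0.0 when the total is zero."""
    counters = stats.get(label, {})
    denominator = counters.get(total, 0)
    return counters.get(counter, 0) / denominator if denominator else 0.0


# Excerpt of the statistics above.
SAMPLE_STATS = """\
FS-Cache statistics
Pages  : mrk=15802814 unc=14993041
Updates: n=0 nul=0 run=0
Retrvls: n=14741 ok=5400 wt=452 nod=693 nbf=8648 int=0 oom=0
Retrvls: ops=6093 owt=112 abt=0
"""

if __name__ == "__main__":
    s = parse_fscache_stats(SAMPLE_STATS)
    print(f"retrievals ok:  {share(s, 'Retrvls', 'ok'):.1%}")
    print(f"retrievals nbf: {share(s, 'Retrvls', 'nbf'):.1%}")
```

On the numbers above this gives about 36.6% ok and 58.7% nbf for retrievals.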


And we are seeing a lot of these errors in the dmesg output on all our servers:

[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered
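To quantify how long each failure leaves a server without a cache, the timestamps can be paired up. A minimal Python sketch, assuming the one-message-per-line dmesg format above; the helper outage_windows is hypothetical and simply measures the gap from each "stopped due to I/O error" message to the next "added (type cachefiles)" message.

```python
import re

# One dmesg record: "[  seconds.micro] message"
RECORD = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def outage_windows(log: str) -> list:
    """Return (stopped_at, readded_at, seconds_without_cache) tuples."""
    windows, stopped_at = [], None
    for raw in log.splitlines():
        m = RECORD.match(raw.strip())
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "stopped due to I/O error" in msg:
            stopped_at = ts
        elif "added (type cachefiles)" in msg and stopped_at is not None:
            windows.append((stopped_at, ts, ts - stopped_at))
            stopped_at = None
    return windows


SAMPLE_DMESG = """\
[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered
"""

if __name__ == "__main__":
    for start, end, gap in outage_windows(SAMPLE_DMESG):
        print(f"{start:.1f} -> {end:.1f}: {gap:.1f} s without cache")
```

For the two cycles above this gives roughly 259 s and 220 s of uncached operation each time.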

This is very painful, as it renders the cache useless...

Looking at the source code, the cause of the "I/O Error: Unlink failed"
message, which seems to be raised somewhere after the "bury_something"
function is called, looked pretty obscure to me...

I don't see why any unlink would fail...

I have been monitoring this list for some time and have tried all the various
patches from it, without success...

Could you please give me a hand troubleshooting this issue?

Regards,

--
RD

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
