On Wed, May 12, 2010 at 2:28 PM, Mark Moseley <moseleymark@xxxxxxxxx> wrote:
> I've been running cachefilesd 0.10.1 since yesterday on this box and
> got this (attached) BUG traceback. System was unresponsive after that.
> Kernel is 2.6.33.3 with the suite of patches that David H put out the
> other day in the thread "Possible patch for CacheFiles: I/O Error:
> Unlink failed" (I actually applied the broken-out patches repackaged
> by Romain DEGEZ). Without the patches, cachefilesd dies after about 45
> minutes with the "Unlink failed" error. With the patches, it's run all
> the way since yesterday afternoon before dying a few minutes ago with
> this error (that I've not seen before). The system is a Dell Poweredge
> 1950, running Debian Lenny 32-bit, with a fairly NFS-intensive
> workload. I don't have the exact disk usage from right before it died
> but a 'df' approx 30 mins earlier showed that it had a little shy of 9
> gig used in the cache (with 58g free). I didn't do a df -i any time
> recently on it, so I don't know how many entries were in there but the
> vast majority is html and image files, so probably averaging in the
> 1-100k range, so quite a few entries.
>
> A few hours ago I happened to look at the /sys stats (but not since,
> so this is probably a few hours prior to BUG):
>
> # cat /proc/fs/fscache/stats
> FS-Cache statistics
> Cookies: idx=5436 dat=876190 spc=0
> Objects: alc=796439 nal=0 avl=796402 ded=687802
> ChkAux : non=0 ok=457637 upd=0 obs=572
> Pages  : mrk=3678901 unc=3250437
> Acquire: n=881626 nul=0 noc=0 ok=881626 nbf=0 oom=0
> Lookups: n=796793 neg=339151 pos=457288 crt=339151 tmo=354
> Updates: n=0 nul=0 run=0
> Relinqs: n=772550 nul=0 wcr=37 rtr=14494
> AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
> Allocs : n=0 ok=0 wt=0 nbf=0 int=0
> Allocs : ops=0 owt=0 abt=0
> Retrvls: n=1008965 ok=541779 wt=229502 nod=376000 nbf=91186 int=0 oom=0
> Retrvls: ops=917779 owt=232622 abt=0
> Stores : n=1547630 ok=1547630 agn=0 nbf=0 oom=0
> Stores : ops=387729 run=1935352 pgs=1547623 rxd=1547630 olm=0
> VmScan : nos=3218178 gon=0 bsy=4 can=7
> Ops    : pend=232917 run=1305508 enq=4102804 can=0 rej=0
> Ops    : dfr=903 rel=1305508 gc=903
> CacheOp: alo=0 luo=0 luc=0 gro=0
> CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0
> CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
>
>
> # cat /proc/fs/cachefiles/histogram
> JIFS  SECS  LOOKUPS   MKDIRS    CREATES
> ===== ===== ========= ========= =========
>     0 0.000   1694602      7529    336885
>     1 0.010      4126        50      1758
>     2 0.020       201         4       139
>     3 0.030        46         1        42
>     4 0.040        21         1        26
>     5 0.050        22         0        21
>     6 0.060        13         0        24
>     7 0.070         4         0        15
>     8 0.080        14         0        21
>     9 0.090        11         0        16
>    10 0.100         8         0        14
>    11 0.110         7         0        13
>    12 0.120         6         0        13
>    13 0.130        13         0         8
>    14 0.140        10         1        16
>    15 0.150         6         1         9
>    16 0.160         5         0         7
>    17 0.170         3         2        12
>    18 0.180         3         0         8
>    19 0.190         3         0         5
>    20 0.200         4         0         9
>    21 0.210         5         0         3
>    22 0.220         3         0        11
>    23 0.230         0         0         6
>    24 0.240         1         0         7
>    25 0.250         1         0         9
>    26 0.260         4         0         7
>    27 0.270         1         0         4
>    28 0.280         1         0         6
>    29 0.290         4         0         1
>    30 0.300         1         0         7
>    31 0.310         1         0         5
>    32 0.320         2         0         5
>    33 0.330         2         0         3
>    34 0.340         1         1         2
>    35 0.350         0         0         6
>    36 0.360         0         0         1
>    37 0.370         0         1         4
>    38 0.380         0         0         6
>    39 0.390         0         0         1
>    40 0.400         0         0         5
>    41 0.410         2         0         4
>    42 0.420         0         0         4
>    43 0.430         0         0         5
>    44 0.440         1         0         4
>    45 0.450         1         0         1
>    46 0.460         1         0         1
>    47 0.470         0         0         2
>    48 0.480         0         0         2
>    49 0.490         0         0         2
>    51 0.510         0         0         3
>    52 0.520         1         0         3
>    53 0.530         0         0         3
>    54 0.540         0         0         2
>    55 0.550         0         0         1
>    56 0.560         0         0         2
>    57 0.570         0         0         1
>    58 0.580         0         0         2
>    59 0.590         1         0         2
>    60 0.600         0         0         2
>    61 0.610         0         0         1
>    62 0.620         1         0         1
>    66 0.660         0         0         1
>    69 0.690         0         0         1
>    71 0.710         0         1         0
>    72 0.720         0         0         1
>    73 0.730         0         0         1
>    74 0.740         0         0         1
>    76 0.760         0         0         2
>    78 0.780         0         0         1
>    81 0.810         0         0         2
>    82 0.820         0         0         2
>    83 0.830         0         0         1
>    89 0.890         0         0         1
>    99 0.990         0         0         6
>
> I'll be happy to try anything out, either patch-wise or research-wise. thx
>

Anybody else seen this error before? The comments in the code say:

        /* an old object from a previous incarnation is hogging the slot - we
         * need to wait for it to be destroyed */

If it's an object hanging around since a previous incarnation, does
that mean that it's better to wipe the cache/ directory at each
startup?

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs
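
For anyone comparing stats dumps like the one quoted above, here is a minimal Python sketch that turns the "Label: key=value ..." lines of /proc/fs/fscache/stats into a dictionary and derives two ratios from it. The function name parse_fscache_stats and the SAMPLE constant are hypothetical names made up for this illustration. Reading pos/n as a lookup "hit ratio" and ok/n as a retrieval success ratio is my assumption about what those counters mean, not something stated in this thread. The sample lines are copied from the quoted output.

```python
import re
from collections import defaultdict

# A few lines copied from the quoted /proc/fs/fscache/stats output.
SAMPLE = """\
FS-Cache statistics
Cookies: idx=5436 dat=876190 spc=0
ChkAux : non=0 ok=457637 upd=0 obs=572
Lookups: n=796793 neg=339151 pos=457288 crt=339151 tmo=354
Retrvls: n=1008965 ok=541779 wt=229502 nod=376000 nbf=91186 int=0 oom=0
Retrvls: ops=917779 owt=232622 abt=0
"""


def parse_fscache_stats(text):
    """Parse 'Label: k=v k=v' lines into {label: {key: int}}.

    Lines that repeat a label (e.g. the two 'Retrvls' lines) are
    merged into one dictionary. Lines without a colon, such as the
    title line, are skipped.
    """
    stats = defaultdict(dict)
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            stats[label.strip()][key] = int(value)
    return dict(stats)


stats = parse_fscache_stats(SAMPLE)

lookups = stats["Lookups"]
retrievals = stats["Retrvls"]

# Assumed interpretation: 'pos' counts lookups that found an object.
hit_ratio = lookups["pos"] / lookups["n"]
# Assumed interpretation: 'ok' counts retrievals that returned data.
retrieval_ok_ratio = retrievals["ok"] / retrievals["n"]

print(f"lookup hit ratio:   {hit_ratio:.1%}")
print(f"retrieval ok ratio: {retrieval_ok_ratio:.1%}")
```

To run it against a live system, the SAMPLE string could be replaced with the contents of /proc/fs/fscache/stats read from the file. In the quoted numbers, neg + pos + tmo adds up to n for Lookups, and ok + nod + nbf adds up to n for Retrvls, which is at least consistent with the interpretation assumed here.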