We have had a very heavily loaded web server which delivers MP3 files through Apache, with the files residing on an NFS file system. The server receives around 4-10 requests a second, and each file is around 600KB in size. The server delivers a throughput of between 20Mb/s and 60Mb/s depending on the time of day. The NFS file system is mounted read-only.

The load on this server was rising to over 100 at busy times of day, I thought due to high wait times from the NFS server. With the load that high the server was becoming unresponsive, so we have implemented fscache with local SSD drives on a new server to alleviate this.

The fscache implementation has been somewhat successful. Load figures on the new server are now down in the range of 0-30, which is much better, and from the network throughput and the fscache stats I can see that around 75% of requests are now satisfied by fscache. See the fscache stats below, which I hope I am interpreting correctly.

Despite this, files which are definitely cached by fscache can still take over 10 seconds to deliver to our monitoring server, which is connected via a 100Mbit switch, but sometimes take less than a second, and I am not sure why. The monitoring server requests the same file every minute, so I know it is cached by fscache.

I have spent a lot of time reading articles on NFS client performance and have made a number of changes to kernel networking and sunrpc parameters (see below also) in an attempt to resolve the issue. These have definitely helped.

Can anyone shed any light on why a file cached by fscache might take so long to be delivered? Is it a limitation of NFS, perhaps, or a problem with fscache? I am not sure where best to start debugging the problem.
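As a cross-check on the 75% figure, the Lookups and Retrvls counters from the FS-Cache statistics further down can be reduced to ratios. The following is only a minimal sketch in Python. The helper names parse_fscache_stats and ratio are made up for illustration, the sample lines are copied from the statistics in this message, and the counter meanings in the comments are my reading of the kernel's FS-Cache documentation rather than something I have verified.

```python
def parse_fscache_stats(text):
    """Parse 'Group : key=value ...' lines into {group: {key: int}}.

    Groups that span two lines (e.g. Retrvls) are merged into one dict.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or "=" not in rest:
            continue  # skip headings and blank lines
        group = stats.setdefault(name.strip(), {})
        for field in rest.split():
            key, _, value = field.partition("=")
            if value.isdigit():
                group[key] = int(value)
    return stats


def ratio(stats, group, key, total="n"):
    """Return stats[group][key] / stats[group][total], or 0.0 if the total is zero."""
    denominator = stats[group][total]
    return stats[group][key] / denominator if denominator else 0.0


# Lines copied from the FS-Cache statistics in this message.
SAMPLE = """\
FS-Cache statistics
Lookups: n=145226 neg=42614 pos=102612 crt=42614 tmo=0
Retrvls: n=200055 ok=157332 wt=28965 nod=42723 nbf=0 int=0 oom=0
Retrvls: ops=200055 owt=13453 abt=0
"""

stats = parse_fscache_stats(SAMPLE)

# Assumed meanings: pos = object found in cache, ok = read served from cache,
# nod = no data in cache (fall back to NFS), wt = retrieval waited on a lookup.
print("lookups found in cache:      {:.1%}".format(ratio(stats, "Lookups", "pos")))
print("retrievals served by cache:  {:.1%}".format(ratio(stats, "Retrvls", "ok")))
print("retrievals with no data:     {:.1%}".format(ratio(stats, "Retrvls", "nod")))
print("retrievals that had to wait: {:.1%}".format(ratio(stats, "Retrvls", "wt")))
```

With the figures from this message that works out to roughly 71% of lookups and 79% of retrievals being satisfied from the cache, which is consistent with the estimate of around 75%. If the reading of "wt" is right, the share of retrievals that waited on a lookup may also be worth watching alongside the slow deliveries.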

Regards
Ben

FS-Cache statistics
Cookies: idx=6 dat=145221 spc=0
Objects: alc=145226 nal=0 avl=145226 ded=143437
ChkAux : non=0 ok=102608 upd=0 obs=2
Pages : mrk=31558074 unc=31364899
Acquire: n=145227 nul=0 noc=0 ok=145227 nbf=0 oom=0
Lookups: n=145226 neg=42614 pos=102612 crt=42614 tmo=0
Updates: n=0 nul=0 run=0
Relinqs: n=143439 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=200055 ok=157332 wt=28965 nod=42723 nbf=0 int=0 oom=0
Retrvls: ops=200055 owt=13453 abt=0
Stores : n=7141275 ok=7141275 agn=0 nbf=0 oom=0
Stores : ops=903546 run=8044821 pgs=7141275 rxd=7141275 olm=0
VmScan : nos=31171518 gon=0 bsy=0 can=0
Ops : pend=13454 run=1103601 enq=39680550 can=0 rej=0
Ops : dfr=242 rel=1103601 gc=242
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0

Mountstats:

Stats for 10.0.20.192:/live_clips mounted on /mnt/clips:
  NFS mount options: ro,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.20.192,mountvers=3,mountport=1234,mountproto=tcp,fsc,local_lock=none
  NFS server capabilities: caps=0x3fc7,wtmult=512,dtsize=8192,bsize=0,namlen=255
  NFS security flavor: 1 pseudoflavor: 0

NFS byte counts:
  applications read 2517191085 bytes via read(2)
  applications wrote 0 bytes via write(2)
  applications read 0 bytes via O_DIRECT read(2)
  applications wrote 0 bytes via O_DIRECT write(2)
  client read 28169649739 bytes via NFS READ
  client wrote 0 bytes via NFS WRITE

RPC statistics:
  1795475 RPC requests sent, 1795474 RPC replies received (0 XIDs not found)
  average backlog queue length: 0

GETATTR:
  582165 ops (32%) 4 retrans (0%) 0 major timeouts
  avg bytes sent per op: 124 avg bytes received per op: 112
  backlog wait: 0.006519 RTT: 0.351234 total execute time: 0.378106 (milliseconds)
LOOKUP:
  154984 ops (8%) 0 retrans (0%) 0 major timeouts
  avg bytes sent per op: 151 avg bytes received per op: 238
  backlog wait: 0.006775 RTT: 262.135511 total execute time: 262.158558 (milliseconds)
ACCESS:
  176336 ops (9%) 1 retrans (0%) 0 major timeouts
  avg bytes sent per op: 128 avg bytes received per op: 120
  backlog wait: 0.004304 RTT: 1.463734 total execute time: 1.482204 (milliseconds)
READ:
  881948 ops (49%) 0 retrans (0%) 0 major timeouts
  avg bytes sent per op: 136 avg bytes received per op: 32068
  backlog wait: 0.175209 RTT: 40.025127 total execute time: 40.232458 (milliseconds)
FSINFO:
  2 ops (0%) 0 retrans (0%) 0 major timeouts
  avg bytes sent per op: 136 avg bytes received per op: 164
  backlog wait: 0.000000 RTT: 0.000000 total execute time: 0.000000 (milliseconds)
PATHCONF:
  1 ops (0%) 0 retrans (0%) 0 major timeouts
  avg bytes sent per op: 136 avg bytes received per op: 140
  backlog wait: 0.000000 RTT: 0.000000 total execute time: 0.000000 (milliseconds)

[root@jrclips ~]# yum list cachefilesd
Installed Packages
cachefilesd.x86_64 0.10.1-2.el6

[root@jrclips ~]# uname -a
Linux jrclips 2.6.32-131.17.1.el6.x86_64 #1 SMP Wed Oct 5 17:19:54 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Changes to sysctl.conf

# Make sure this is set before any NFS file systems are mounted
sunrpc.tcp_slot_table_entries = 128

# These ensure that TIME_WAIT ports either get reused or closed fast.
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_tw_recycle = 1

# TCP memory
net.core.rmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_max = 16777216
net.core.wmem_default = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_rmem = 4096 262144 16777216
net.ipv4.tcp_wmem = 4096 262144 16777216

[root@jrclips ~]# free
             total       used       free     shared    buffers     cached
Mem:       2053992    1989756      64236          0     124620    1677612
-/+ buffers/cache:     187524    1866468
Swap:      4128760          0    4128760

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs