On Fri, 2 Sep 2011, David Howells wrote: > Mark Hills <mark@xxxxxxxxxxx> wrote: > > > In dmesg "Overlong wait for old active object to go away" (see full > > message below), and an application will hang or lock, eg. blocking on > > read(). > > > > Sometimes it unblocks, sometimes it seems to hang indefinitely. Is there a > > higher level explanation for what is going on here? > > I suspect that the OBJECT_RECYCLING state doesn't complete the > transition to the OBJECT_DYING state since we saw that we had to wait > for outstanding operations to complete at the time OBJECT_DYING was > processed. I think there's probably a race in waking up the object > state machine when the last operation completes. > > It would be useful you could try is grepping in /proc/fs/fscache/objects > for the objects named in the error text. The object debug ID is > right-aligned in the first column without the "OBJ" attached. Below is the relevant entries from the object list, during the hang. The RCYC entry also correlates with the hang, and the LOOK entry. I also took the corresponding stats (during the hang) and the dmesg around the error message, which I took with: # echo $((1|2|4|8|16|32|512|1024|2048)) > /sys/module/fscache/parameters/debug If I find some time then perhaps I will try and understand the meaning of some of this information! But for now I will post it here in the hope that it is useful -- perhaps there is a clear indication of a problem, somewhere. Thanks -- Mark $ uname -a Linux joker 2.6.39.4-mh #34 SMP Mon Sep 5 13:40:21 BST 2011 x86_64 x86_64 x86_64 GNU/Linux $ cat /proc/fs/fscache/objects | grep -v ACTV OBJECT PARENT STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA OBJECT_KEY, AUX_DATA ======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================ 63a 3 RCYC 0 0 0 0 0 0 3 20 0 2 | <no_cookie> 63b 3 LOOK 0 0 0 0 0 0 7b 0 0 2 | NFS.fh DT 7 ffff880226a60420 010006015cd1795b8c9f417c8beaffe8b6b51c538c33cc0a424c3793, eef90a4e00000000dae96d3800000000eef90a4e00000000dae96d380000000069000000000000000000000000000000 $ cat /proc/fs/fscache/stats FS-Cache statistics Cookies: idx=8 dat=1606 spc=0 Objects: alc=1595 nal=0 avl=1593 ded=881 ChkAux : non=0 ok=620 upd=0 obs=0 Pages : mrk=15345 unc=3064 Acquire: n=1614 nul=0 noc=0 ok=1614 nbf=0 oom=0 Lookups: n=1603 neg=981 pos=613 crt=981 tmo=8 Updates: n=0 nul=0 run=0 Relinqs: n=884 nul=0 wcr=1 rtr=870 AttrChg: n=0 ok=0 nbf=0 oom=0 run=0 Allocs : n=0 ok=0 wt=0 nbf=0 int=0 Allocs : ops=0 owt=0 abt=0 Retrvls: n=1003 ok=617 wt=865 nod=350 nbf=35 int=0 oom=0 Retrvls: ops=967 owt=287 abt=0 Stores : n=8112 ok=8112 agn=0 nbf=0 oom=0 Stores : ops=385 run=8497 pgs=8112 rxd=8112 olm=0 VmScan : nos=1709 gon=0 bsy=0 can=0 Ops : pend=287 run=1352 enq=16432 can=0 rej=0 Ops : dfr=13 rel=1352 gc=13 CacheOp: alo=0 luo=1 luc=0 gro=0 CacheOp: upo=0 dro=1 pto=0 atc=0 syn=0 CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0 $ dmesg <cut> [ 2383.150006] [opera ] ==> __fscache_relinquish_cookie(ffff88021ad44f00{NFS.fh,ffff880226a60420},1) [ 2383.150012] [opera ] RELEASE OBJ63a [ 2383.150016] [opera ] ==> fscache_enqueue_object({OBJ63a}) [ 2383.150032] [kworke] ==> fscache_object_work_func({OBJ63a}) [ 2383.150036] [kworke] ==> fscache_object_state_machine({OBJ63a,OBJECT_ACTIVE,24}) [ 2383.150041] [opera ] ==> __fscache_cookie_put(ffff88021ad44f00) [ 2383.150045] [opera ] FREE COOKIE ffff88021ad44f00 [ 2383.150049] [opera ] <== __fscache_cookie_put() [ 2383.150052] [opera ] <== __fscache_relinquish_cookie() [ 2383.150057] [opera ] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff880226a60420) [ 2383.150061] [opera ] ==> fscache_acquire_non_index_cookie() [ 2383.150065] [opera ] ==> fscache_select_cache_for_object() [ 2383.150069] [opera ] <== fscache_select_cache_for_object() = ffff880229124c00 [parent] [ 2383.150073] [opera ] cache mycache [ 2383.150077] [opera ] ==> fscache_alloc_object(ffff880229124c00,ffff88021ad44f00{NFS.fh}) [ 2383.150082] [kworke] <== fscache_object_state_machine() [->OBJECT_DYING] [ 2383.150086] [opera ] ALLOC OBJ63b: NFS.fh {0} [ 2383.150091] [opera ] ==> fscache_alloc_object(ffff880229124c00,ffff88022910d000{NFS.server}) [ 2383.150105] [opera ] <== fscache_alloc_object() = 0 [found] [ 2383.150109] [opera ] ==> fscache_attach_object({NFS.fh},{OBJ63b}) [ 2383.150113] [kworke] ==> fscache_enqueue_object({OBJ63a}) [ 2383.150117] [opera ] <== fscache_attach_object() = 0 [ 2383.150120] [opera ] <== fscache_alloc_object() = 0 [ 2383.150123] [opera ] ==> fscache_enqueue_object({OBJ63b}) [ 2383.150132] [opera ] <== fscache_acquire_non_index_cookie() = 0 [deferred] [ 2383.150136] [kworke] ==> fscache_object_work_func({OBJ63b}) [ 2383.150140] [kworke] ==> fscache_object_state_machine({OBJ63b,OBJECT_INIT,0}) [ 2383.150144] [kworke] ==> fscache_initialise_object() [ 2383.150149] [kworke] parent OBJECT_ACTIVE [ 2383.150152] [opera ] <== __fscache_acquire_cookie() = ffff88021ad44f00 [ 2383.150156] [kworke] go [ 2383.150159] [kworke] <== fscache_initialise_object() [ 2383.150163] [kworke] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP] [ 2383.150166] [kworke] ==> fscache_enqueue_object({OBJ63b}) [ 2383.150205] [kworke] ==> fscache_object_work_func({OBJ63a}) [ 2383.150209] [kworke] ==> fscache_object_work_func({OBJ63b}) [ 2383.150213] [kworke] ==> fscache_object_state_machine({OBJ63b,OBJECT_LOOKING_UP,0}) [ 2383.150217] [kworke] ==> fscache_lookup_object() [ 2383.150221] [kworke] LOOKUP "NFS.server/NFS.fh" in "mycache" [ 2383.150227] [kworke] ==> fscache_object_state_machine({OBJ63a,OBJECT_DYING,24}) [ 2383.150233] [kworke] dying OBJ63a {0,0} [ 2383.150238] [kworke] ==> fscache_enqueue_dependents({OBJ63a}) [ 2383.150243] [kworke] woke 0 ops on OBJ63a [ 2383.150248] [kworke] <== fscache_object_state_machine() [->OBJECT_RECYCLING] [ 2383.150253] [kworke] ==> fscache_enqueue_object({OBJ63a}) [ 2383.150261] [kworke] ==> fscache_object_work_func({OBJ63a}) [ 2383.150266] [kworke] ==> fscache_object_state_machine({OBJ63a,OBJECT_RECYCLING,20}) [ 2383.150271] [kworke] ==> fscache_release_object() [ 2383.150276] [kworke] ==> fscache_drop_object({OBJ63a,0}) [ 2388.575662] [cat ] ==> fscache_fsdef_netfs_get_aux({nfs.0},) [ 2391.559612] [cat ] ==> fscache_fsdef_netfs_get_key({nfs.0},) [ 2443.150062] [ 2443.150066] CacheFiles: Error: Overlong wait for old active object to go away [ 2443.150071] object: OBJ63b [ 2443.150075] objstate=OBJECT_LOOKING_UP fl=0 wbusy=2 ev=0[7b] [ 2443.150078] ops=0 inp=0 exc=0 [ 2443.150080] parent=ffff8802213f0480 [ 2443.150084] cookie=ffff88021ad44f00 [pr=ffff88022910d000 nd=ffff880226a60420 fl=7] [ 2443.150087] key=[28] '010006015cd1795b8c9f417c8beaffe8b6b51c538c33cc0a424c3793' [ 2443.150102] xobject: OBJ63a [ 2443.150106] xobjstate=OBJECT_RECYCLING fl=0 wbusy=2 ev=20[3] [ 2443.150108] xops=0 inp=0 exc=0 [ 2443.150110] xparent=ffff8802213f0480 [ 2443.150112] xcookie=NULL [ 2443.150122] [kworke] <== fscache_lookup_object() [ 2443.150125] [kworke] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP] [ 2443.150129] [kworke] ==> fscache_enqueue_object({OBJ63b}) [ 2443.150134] [kworke] ==> fscache_object_work_func({OBJ63b}) [ 2443.150138] [kworke] ==> fscache_object_state_machine({OBJ63b,OBJECT_LOOKING_UP,0}) [ 2443.150141] [kworke] ==> fscache_lookup_object() [ 2443.150145] [kworke] LOOKUP "NFS.server/NFS.fh" in "mycache" [ 2443.150215] [kworke] ==> fscache_object_lookup_negative({OBJ63b,OBJECT_LOOKING_UP}) [ 2443.150221] [kworke] release parent OBJ3 {616} [ 2443.150226] [kworke] wake up lookup ffff88021ad44f40 [ 2443.150229] [kworke] <== fscache_drop_object() [ 2443.150234] [kworke] <== fscache_object_state_machine() [->OBJECT_DEAD] [ 2443.150240] [kworke] <== fscache_object_lookup_negative() [ 2443.150250] [opera ] ==> fscache_submit_op({OBJ63b OP24f9},{1}) [ 2443.150369] [kworke] ==> fscache_obtained_object({OBJ63b,OBJECT_CREATING}) [ 2443.150372] [kworke] <== fscache_obtained_object() [ 2443.150396] [kworke] <== fscache_lookup_object() [ 2443.150399] [kworke] <== fscache_object_state_machine() [->OBJECT_AVAILABLE] [ 2443.150402] [kworke] ==> fscache_enqueue_object({OBJ63b}) [ 2443.150410] [kworke] ==> fscache_object_work_func({OBJ63b}) [ 2443.150413] [kworke] ==> fscache_object_state_machine({OBJ63b,OBJECT_AVAILABLE,0}) [ 2443.150416] [kworke] ==> fscache_object_available({OBJ63b}) [ 2443.150420] [kworke] ==> fscache_done_parent_op(OBJ63b {OBJ3,1}) [ 2443.150430] [kworke] ==> fscache_put_operation({OBJ63b OP24f9,2}) [ 2443.150433] [kworke] woke 1 ops on OBJ63b [ 2443.150437] [kworke] ==> fscache_enqueue_dependents({OBJ63b}) [ 2443.150441] [opera ] ==> fscache_put_operation({OBJ63b OP24f9,1}) [ 2443.150445] [opera ] PUT OP [ 2443.150449] [opera ] woke 0 ops on OBJ63b [ 2443.150452] [opera ] <== fscache_put_operation() [done] [ 2443.150515] [kworke] <== fscache_object_available() [ 2443.150518] [kworke] <== fscache_object_state_machine() [->OBJECT_ACTIVE] [ 2443.150934] [kworke] ==> fscache_submit_op({OBJ63b OP24fb},{1}) [ 2443.150938] [kworke] ==> fscache_enqueue_operation({OBJ63b OP24fb,1}) [ 2443.150941] [kworke] queue async [ 2443.150948] [kworke] ==> fscache_put_operation({OBJ63b OP24fb,2}) [ 2443.150953] [kworke] ==> fscache_op_work_func({OBJ63b OP24fb,1}) [ 2443.150982] [kworke] ==> fscache_enqueue_operation({OBJ63b OP24fb,1}) [ 2443.150985] [kworke] queue async [ 2443.150994] [kworke] ==> fscache_put_operation({OBJ63b OP24fb,2}) [ 2443.150997] [kworke] <== fscache_op_work_func() [ 2443.151001] [kworke] ==> fscache_op_work_func({OBJ63b OP24fb,1}) [ 2443.151014] [kworke] ==> fscache_put_operation({OBJ63b OP24fb,1}) [ 2443.151018] [kworke] PUT OP [ 2443.151021] [kworke] woke 0 ops on OBJ63b [ 2443.151025] [kworke] <== fscache_put_operation() [done] [ 2443.151028] [kworke] <== fscache_op_work_func() -- Linux-cachefs mailing list Linux-cachefs@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cachefs