On Tue, 2010-11-23 at 15:42 -0700, Gregory Farnum wrote:
> Jim:
> I've managed to reproduce this (or a similar problem) locally,
> although I am getting core files. If you aren't I suspect your ulimit
> has been reset or isn't high enough.

Doh!! You're right, for some reason my root ulimit -c was
zero. Now it's not ;)

>
> This is an error I introduced when making some changes to how we do
> trimming. I forgot to account for how the root inode is different, so
> all MDS instances that weren't the auth on the root inode should have
> crashed during the resolve phase. (I'm not sure why you were only
> having 5/7 crash before, it might be that two MDSes were sharing auth
> or that one of the MDSes wasn't moving through the phases as quickly
> as the others.)
> I've pushed a fix to the testing branch in commit
> d8652de61647ae19ad0f3ec90fad00930cdd5afd; it should cherry-pick to any
> recent-ish unstable just fine. :)

I pulled current testing into current unstable, the
result works great on this test :)

Thanks for the quick turnaround!

-- Jim

> -Greg
>
> On Tue, Nov 23, 2010 at 1:50 PM, Jim Schutt <jaschut@xxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I've been working with a file system with 7 mon instances,
> > and 7 mds instances, on current unstable (fc212548aea1).
> >
> > I start it up, let it stabilize (all the osds get to
> > the same epoch), shut it down, then restart it.
> >
> > Not very long after the restart, 6 of the 7 mds instances
> > disappear, with no core files and no stack trace in the logs.
> > > > The last few lines of output from ceph -w look like this: > > > > 2010-11-23 14:37:20.504569 pg v88: 3432 pgs: 3432 active; 138 KB data, 177 MB used, 3032 GB / 3032 GB avail; 12/234 degraded (5.128%) > > 2010-11-23 14:37:20.739372 pg v89: 3432 pgs: 3432 active; 138 KB data, 167 MB used, 3032 GB / 3032 GB avail; 12/234 degraded (5.128%) > > 2010-11-23 14:37:21.033669 pg v90: 3432 pgs: 3432 active; 138 KB data, 154 MB used, 3032 GB / 3032 GB avail; 12/234 degraded (5.128%) > > 2010-11-23 14:37:21.353834 pg v91: 3432 pgs: 3432 active; 138 KB data, 139 MB used, 3032 GB / 3032 GB avail; 12/234 degraded (5.128%) > > 2010-11-23 14:37:21.478496 mds e33: 7/7/7 up {0=up:replay,1=up:replay,2=up:resolve,3=up:replay,4=up:resolve,5=up:resolve,6=up:replay} > > 2010-11-23 14:37:21.709432 log 2010-11-23 14:37:21.403836 mon0 172.17.40.34:6789/0 25 : [INF] mds2 172.17.40.34:6800/7571 up:resolve > > 2010-11-23 14:37:21.827792 pg v92: 3432 pgs: 3432 active; 138 KB data, 134 MB used, 3032 GB / 3032 GB avail; 10/234 degraded (4.274%) > > 2010-11-23 14:37:22.197558 pg v93: 3432 pgs: 3432 active; 139 KB data, 128 MB used, 3032 GB / 3032 GB avail; 10/234 degraded (4.274%) > > 2010-11-23 14:37:22.484411 pg v94: 3432 pgs: 3432 active; 139 KB data, 116 MB used, 3032 GB / 3032 GB avail; 10/234 degraded (4.274%) > > 2010-11-23 14:37:22.766665 pg v95: 3432 pgs: 3432 active; 139 KB data, 102316 KB used, 3032 GB / 3032 GB avail; 10/234 degraded (4.274%) > > 2010-11-23 14:37:25.261067 mds e34: 7/7/7 up {0=up:resolve,1=up:replay,2=up:resolve,3=up:replay,4=up:resolve,5=up:resolve,6=up:replay} > > 2010-11-23 14:37:25.398455 log 2010-11-23 14:37:25.236187 mon0 172.17.40.34:6789/0 26 : [INF] mds0 172.17.40.40:6800/7567 up:resolve > > 2010-11-23 14:37:25.592960 mds e35: 7/7/7 up {0=up:resolve,1=up:replay,2=up:resolve,3=up:replay,4=up:resolve,5=up:resolve,6=up:resolve} > > 2010-11-23 14:37:25.774941 log 2010-11-23 14:37:25.603505 mon0 172.17.40.34:6789/0 27 : [INF] mds6 172.17.40.35:6800/7935 
up:resolve > > 2010-11-23 14:37:29.273125 mds e36: 7/7/7 up {0=up:resolve,1=up:replay,2=up:resolve,3=up:resolve,4=up:resolve,5=up:resolve,6=up:resolve} > > 2010-11-23 14:37:29.410528 log 2010-11-23 14:37:29.247681 mon0 172.17.40.34:6789/0 28 : [INF] mds3 172.17.40.37:6800/996 up:resolve > > 2010-11-23 14:37:29.612511 mds e37: 7/7/7 up {0=up:resolve,1=up:resolve,2=up:resolve,3=up:resolve,4=up:resolve,5=up:resolve,6=up:resolve} > > 2010-11-23 14:37:29.799629 mds e38: 7/7/7 up {0=up:reconnect,1=up:resolve,2=up:resolve,3=up:resolve,4=up:resolve,5=up:resolve,6=up:resolve} > > 2010-11-23 14:37:29.824094 log 2010-11-23 14:37:29.618299 mon0 172.17.40.34:6789/0 29 : [INF] mds1 172.17.40.39:6800/8550 up:resolve > > 2010-11-23 14:37:30.119613 log 2010-11-23 14:37:29.760648 mon0 172.17.40.34:6789/0 30 : [INF] mds0 172.17.40.40:6800/7567 up:reconnect > > 2010-11-23 14:37:30.216568 mds e39: 7/7/7 up {0=up:rejoin,1=up:resolve,2=up:resolve,3=up:resolve,4=up:resolve,5=up:resolve,6=up:resolve} > > 2010-11-23 14:37:30.434163 log 2010-11-23 14:37:30.208227 mon0 172.17.40.34:6789/0 31 : [INF] mds0 172.17.40.40:6800/7567 up:rejoin > > 2010-11-23 14:37:46.274801 mds e40: 7/7/7 up {0=up:rejoin,1=up:resolve,2=up:resolve(laggy or crashed),3=up:resolve(laggy or crashed),4=up:resolve(laggy or crashed),5=up:resolve,6=up:resolve} > > 2010-11-23 14:37:51.303591 mds e41: 7/7/7 up {0=up:rejoin,1=up:resolve(laggy or crashed),2=up:resolve(laggy or crashed),3=up:resolve(laggy or crashed),4=up:resolve(laggy or crashed),5=up:resolve(laggy or crashed),6=up:resolve(laggy or crashed)} > > > > The last few lines of the disappearing mds instance logs > > look like this; in particular 5 of the 6 that disappeared > > were all doing the "handle resolve from mds1" > > > > 2010-11-23 14:37:29.621030 41b7c940 -- 172.17.40.36:6800/7413 <== mds1 172.17.40.39:6800/8550 2 ==== mds_resolve(1+0 subtrees +0 slave requests) v1 ==== 28+0+0 (1837742466 0 0) 0x1a6eb60 > > 2010-11-23 14:37:29.621046 41b7c940 mds4.cache 
handle_resolve from mds1 > > 2010-11-23 14:37:29.621055 41b7c940 mds4.cache show_subtrees > > 2010-11-23 14:37:29.621065 41b7c940 mds4.cache |__ 4 auth [dir 104 ~mds4/ [2,head] auth v=1 cv=0/0 dir_auth=4 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1a76f10] > > 2010-11-23 14:37:29.621087 41b7c940 mds4.cache maybe_resolve_finish got all resolves+resolve_acks, done. > > 2010-11-23 14:37:29.621099 41b7c940 mds4.cache disambiguate_imports > > 2010-11-23 14:37:29.621109 41b7c940 mds4.cache show_subtrees > > 2010-11-23 14:37:29.621118 41b7c940 mds4.cache |__ 4 auth [dir 104 ~mds4/ [2,head] auth v=1 cv=0/0 dir_auth=4 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1a76f10] > > 2010-11-23 14:37:29.621136 41b7c940 mds4.cache trim_unlinked_inodes > > 2010-11-23 14:37:29.621147 41b7c940 mds4.cache recalc_auth_bits > > 2010-11-23 14:37:29.621157 41b7c940 mds4.cache subtree auth=1 for [dir 104 ~mds4/ [2,head] auth v=1 cv=0/0 dir_auth=4 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1a76f10] > > 2010-11-23 14:37:29.621172 41b7c940 mds4.cache show_subtrees > > 2010-11-23 14:37:29.621182 41b7c940 mds4.cache |__ 4 auth [dir 104 ~mds4/ [2,head] auth v=1 cv=0/0 dir_auth=4 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1a76f10] > > 2010-11-23 14:37:29.621198 41b7c940 mds4.cache show_cache > > 2010-11-23 14:37:29.621205 41b7c940 mds4.cache unlinked [inode 1 [...2,head] / rep@xxx v1 snaprealm=0x7fa750010d90 f() n() (iversion lock) 0x7fa750015270] > > 2010-11-23 14:37:29.621222 41b7c940 mds4.cache unlinked [inode 104 [...2,head] ~mds4/ auth v1 snaprealm=0x1a6f450 f() n() (iversion lock) | dirfrag 0x7fa750015b00] > > 2010-11-23 14:37:29.621239 41b7c940 mds4.cache dirfrag [dir 104 ~mds4/ [2,head] auth v=1 cv=0/0 dir_auth=4 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1a76f10] > > 2010-11-23 14:37:29.621256 41b7c940 mds4.cache trim_non_auth > > 2010-11-23 14:37:29.621283 41b7c940 
mds4.cache ... [inode 1 [...2,head] / rep@xxx v1 snaprealm=0x7fa750010d90 f() n() (iversion lock) 0x7fa750015270]
> >
> >
> > The above behavior is repeatable, except usually it's just
> > 5 of 7 mds instances that die, and the last thing in their
> > logs is always a handle_resolve from the same peer mds.
> >
> > What's new in this case is that 6th instance disappearing;
> > its log had this to say:
> >
> > 2010-11-23 14:37:29.621961 42aea940 mds1.cache handle_resolve from mds0
> > 2010-11-23 14:37:29.621975 42aea940 mds1.cache show_subtrees
> > 2010-11-23 14:37:29.621985 42aea940 mds1.cache |__ 1 auth [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140]
> > 2010-11-23 14:37:29.622003 42aea940 mds1.cache maybe_resolve_finish still waiting for more resolves, got (0,6), need (0,2,3,4,5,6)
> > 2010-11-23 14:37:29.622016 42aea940 -- 172.17.40.39:6800/8550 dispatch_throttle_release 44 to dispatch throttler 156/104857600
> > 2010-11-23 14:37:29.622028 42aea940 -- 172.17.40.39:6800/8550 <== mds2 172.17.40.34:6800/7571 4 ==== mds_resolve(1+0 subtrees +0 slave requests) v1 ==== 28+0+0 (2796914913 0 0) 0x1265200
> > 2010-11-23 14:37:29.622043 42aea940 mds1.cache handle_resolve from mds2
> > 2010-11-23 14:37:29.622052 42aea940 mds1.cache show_subtrees
> > 2010-11-23 14:37:29.622061 42aea940 mds1.cache |__ 1 auth [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140]
> > 2010-11-23 14:37:29.622077 42aea940 mds1.cache maybe_resolve_finish still waiting for more resolves, got (0,2,6), need (0,2,3,4,5,6)
> > 2010-11-23 14:37:29.622090 42aea940 -- 172.17.40.39:6800/8550 dispatch_throttle_release 28 to dispatch throttler 112/104857600
> > 2010-11-23 14:37:29.622101 42aea940 -- 172.17.40.39:6800/8550 <== mds4 172.17.40.36:6800/7413 3 ==== mds_resolve(1+0 subtrees +0 slave requests) v1 ==== 28+0+0 (891395286 0 0) 0x1265c50
> > 2010-11-23
14:37:29.622116 42aea940 mds1.cache handle_resolve from mds4 > > 2010-11-23 14:37:29.622124 42aea940 mds1.cache show_subtrees > > 2010-11-23 14:37:29.622134 42aea940 mds1.cache |__ 1 auth [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140] > > 2010-11-23 14:37:29.622150 42aea940 mds1.cache maybe_resolve_finish still waiting for more resolves, got (0,2,4,6), need (0,2,3,4,5,6) > > 2010-11-23 14:37:29.622177 42aea940 -- 172.17.40.39:6800/8550 dispatch_throttle_release 28 to dispatch throttler 84/104857600 > > 2010-11-23 14:37:29.622207 42aea940 -- 172.17.40.39:6800/8550 <== mds5 172.17.40.38:6800/7395 3 ==== mds_resolve(1+0 subtrees +0 slave requests) v1 ==== 28+0+0 (2406375000 0 0) 0x1264150 > > 2010-11-23 14:37:29.622226 42aea940 mds1.cache handle_resolve from mds5 > > 2010-11-23 14:37:29.622238 42aea940 mds1.cache show_subtrees > > 2010-11-23 14:37:29.622250 411d3940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=1 l=0).reader couldn't read tag, Success > > 2010-11-23 14:37:29.622272 411d3940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=1 l=0).fault 0: Success > > 2010-11-23 14:37:29.622290 452f6940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=1 l=0).writer: state = 2 policy.server=0 > > 2010-11-23 14:37:29.622309 452f6940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=1 l=0).write_ack 3 > > 2010-11-23 14:37:29.622326 411d3940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=1 l=0).requeue_sent mds_resolve(1+0 subtrees +0 slave requests) v1 for resend seq 2 (2) > > 2010-11-23 14:37:29.622347 411d3940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=1 l=0).fault initiating reconnect > > 2010-11-23 14:37:29.622364 452f6940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 
pipe(0x1263980 sd=19 pgs=11 cs=1 l=0).writer: state = 2 policy.server=0 > > 2010-11-23 14:37:29.622380 411d3940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).reader done > > 2010-11-23 14:37:29.622400 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).writer: state = 1 policy.server=0 > > 2010-11-23 14:37:29.622419 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).connect 2 > > 2010-11-23 14:37:29.622437 42aea940 mds1.cache |__ 1 auth [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140] > > 2010-11-23 14:37:29.622461 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).connecting to 172.17.40.34:6800/7571 > > 2010-11-23 14:37:29.622482 42aea940 mds1.cache maybe_resolve_finish still waiting for more resolves, got (0,2,4,5,6), need (0,2,3,4,5,6) > > 2010-11-23 14:37:29.622499 42aea940 -- 172.17.40.39:6800/8550 dispatch_throttle_release 28 to dispatch throttler 56/104857600 > > 2010-11-23 14:37:29.622512 42aea940 -- 172.17.40.39:6800/8550 <== mds3 172.17.40.37:6800/996 3 ==== mds_resolve(1+0 subtrees +0 slave requests) v1 ==== 28+0+0 (486165103 0 0) 0x1264390 > > 2010-11-23 14:37:29.622528 42aea940 mds1.cache handle_resolve from mds3 > > 2010-11-23 14:37:29.622537 42aea940 mds1.cache show_subtrees > > 2010-11-23 14:37:29.622547 42aea940 mds1.cache |__ 1 auth [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140] > > 2010-11-23 14:37:29.622566 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).connect error 172.17.40.34:6800/7571, 111: Connection refused > > 2010-11-23 14:37:29.622593 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).fault 111: 
Connection refused > > 2010-11-23 14:37:29.622611 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).fault first fault > > 2010-11-23 14:37:29.622626 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).writer: state = 1 policy.server=0 > > 2010-11-23 14:37:29.622642 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).connect 2 > > 2010-11-23 14:37:29.622662 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).connecting to 172.17.40.34:6800/7571 > > 2010-11-23 14:37:29.622681 4033e940 -- 172.17.40.39:6800/8550 >> 172.17.40.36:6800/7413 pipe(0x12508b0 sd=15 pgs=8 cs=1 l=0).reader couldn't read tag, Success > > 2010-11-23 14:37:29.622717 42aea940 mds1.cache maybe_resolve_finish got all resolves+resolve_acks, done. > > 2010-11-23 14:37:29.622733 4033e940 -- 172.17.40.39:6800/8550 >> 172.17.40.36:6800/7413 pipe(0x12508b0 sd=15 pgs=8 cs=1 l=0).fault 0: Success > > 2010-11-23 14:37:29.622768 4033e940 -- 172.17.40.39:6800/8550 >> 172.17.40.36:6800/7413 pipe(0x12508b0 sd=15 pgs=8 cs=1 l=0).fault with nothing to send, going to standby > > 2010-11-23 14:37:29.622785 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).connect error 172.17.40.34:6800/7571, 111: Connection refused > > 2010-11-23 14:37:29.622806 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).fault 111: Connection refused > > 2010-11-23 14:37:29.622823 4053e940 -- 172.17.40.39:6800/8550 >> 172.17.40.34:6800/7571 pipe(0x1261690 sd=16 pgs=10 cs=2 l=0).fault waiting 0.200000 > > 2010-11-23 14:37:29.622840 42aea940 mds1.cache disambiguate_imports > > 2010-11-23 14:37:29.622855 410d2940 -- 172.17.40.39:6800/8550 >> 172.17.40.36:6800/7413 pipe(0x12508b0 sd=15 pgs=8 cs=1 l=0).writer: state = 3 policy.server=0 > > 2010-11-23 
14:37:29.622909 44ff3940 -- 172.17.40.39:6800/8550 >> 172.17.40.35:6800/7935 pipe(0x1262680 sd=18 pgs=10 cs=1 l=0).reader couldn't read tag, Success > > 2010-11-23 14:37:29.622929 44ff3940 -- 172.17.40.39:6800/8550 >> 172.17.40.35:6800/7935 pipe(0x1262680 sd=18 pgs=10 cs=1 l=0).fault 0: Success > > 2010-11-23 14:37:29.622947 451f5940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=1 l=0).reader couldn't read tag, Success > > 2010-11-23 14:37:29.622965 44ff3940 -- 172.17.40.39:6800/8550 >> 172.17.40.35:6800/7935 pipe(0x1262680 sd=18 pgs=10 cs=1 l=0).fault with nothing to send, going to standby > > 2010-11-23 14:37:29.622988 40fd1940 -- 172.17.40.39:6800/8550 >> 172.17.40.38:6800/7395 pipe(0x124f070 sd=13 pgs=7 cs=1 l=0).reader couldn't read tag, Success > > 2010-11-23 14:37:29.623008 450f4940 -- 172.17.40.39:6800/8550 >> 172.17.40.35:6800/7935 pipe(0x1262680 sd=18 pgs=10 cs=1 l=0).writer: state = 3 policy.server=0 > > 2010-11-23 14:37:29.623041 42aea940 mds1.cache show_subtrees > > 2010-11-23 14:37:29.623056 40fd1940 -- 172.17.40.39:6800/8550 >> 172.17.40.38:6800/7395 pipe(0x124f070 sd=13 pgs=7 cs=1 l=0).fault 0: Success > > 2010-11-23 14:37:29.623076 40fd1940 -- 172.17.40.39:6800/8550 >> 172.17.40.38:6800/7395 pipe(0x124f070 sd=13 pgs=7 cs=1 l=0).fault with nothing to send, going to standby > > 2010-11-23 14:37:29.623092 42aea940 mds1.cache |__ 1 auth [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140] > > 2010-11-23 14:37:29.623115 451f5940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=1 l=0).fault 0: Success > > 2010-11-23 14:37:29.623134 42aea940 mds1.cache trim_unlinked_inodes > > 2010-11-23 14:37:29.623149 40a03940 -- 172.17.40.39:6800/8550 >> 172.17.40.38:6800/7395 pipe(0x124f070 sd=13 pgs=7 cs=1 l=0).writer: state = 3 policy.server=0 > > 2010-11-23 14:37:29.623168 451f5940 -- 172.17.40.39:6800/8550 >> 
172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=1 l=0).requeue_sent mds_resolve(1+0 subtrees +0 slave requests) v1 for resend seq 2 (2) > > 2010-11-23 14:37:29.623188 451f5940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=1 l=0).fault initiating reconnect > > 2010-11-23 14:37:29.623205 42aea940 mds1.cache recalc_auth_bits > > 2010-11-23 14:37:29.623219 42aea940 mds1.cache subtree auth=1 for [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140] > > 2010-11-23 14:37:29.623239 42aea940 mds1.cache show_subtrees > > 2010-11-23 14:37:29.623251 451f5940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=2 l=0).reader done > > 2010-11-23 14:37:29.623270 42aea940 mds1.cache |__ 1 auth [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140] > > 2010-11-23 14:37:29.623305 42aea940 mds1.cache show_cache > > 2010-11-23 14:37:29.623316 42aea940 mds1.cache unlinked [inode 1 [...2,head] / rep@xxx v1 snaprealm=0x7f5ec80084d0 f() n() (iversion lock) 0x7f5ec800a3b0] > > 2010-11-23 14:37:29.623336 42aea940 mds1.cache unlinked [inode 101 [...2,head] ~mds1/ auth v1 snaprealm=0x1254f30 f() n() (iversion lock) | dirfrag 0x7f5ec800ac40] > > 2010-11-23 14:37:29.623356 42aea940 mds1.cache dirfrag [dir 101 ~mds1/ [2,head] auth v=1 cv=0/0 dir_auth=1 state=1073741824 f(v0 2=1+1) n(v0 2=1+1) hs=0+0,ss=0+0 | subtree 0x1255140] > > 2010-11-23 14:37:29.623376 452f6940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=2 l=0).writer: state = 1 policy.server=0 > > 2010-11-23 14:37:29.623395 452f6940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 pipe(0x1263980 sd=19 pgs=11 cs=2 l=0).connect 2 > > 2010-11-23 14:37:29.623412 42aea940 mds1.cache trim_non_auth > > 2010-11-23 14:37:29.623426 452f6940 -- 172.17.40.39:6800/8550 >> 172.17.40.37:6800/996 
pipe(0x1263980 sd=19 pgs=11 cs=2 l=0).connecting to 172.17.40.37:6800/996
> > 2010-11-23 14:37:29.623448 42aea940 mds1.cache ... [inode 1 [...2,head] / rep@xxx v1 snaprealm=0x7f5ec80084d0 f() n() (iversion lock) 0x7f5ec800a3b0]
> >
> > -- Jim
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
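
For anyone trying to reproduce this, here is a rough sketch of how one might pick out the ranks flagged "(laggy or crashed)" from a single mds map line of ceph -w output, in the format quoted in this thread. The function name laggy_ranks and the regular expression are made up for illustration and are not part of Ceph; the sketch assumes the line looks like the "mds e40: ..." lines above and may well need adjusting for other versions.

```python
import re

# One map entry looks like "2=up:resolve(laggy or crashed)" or "0=up:rejoin".
# Assumption: state names are lowercase letters only, as in the quoted output.
_ENTRY = re.compile(r"(\d+)=up:([a-z]+)(\(laggy or crashed\))?")


def laggy_ranks(mds_line):
    """Return the sorted MDS ranks marked '(laggy or crashed)' in one line.

    Hypothetical helper: only the text between the outermost braces is
    scanned, so the leading timestamp and "7/7/7 up" summary are ignored.
    """
    start = mds_line.find("{")
    end = mds_line.rfind("}")
    if start == -1 or end == -1:
        return []
    body = mds_line[start + 1:end]
    return sorted(
        int(m.group(1)) for m in _ENTRY.finditer(body) if m.group(3)
    )


if __name__ == "__main__":
    line = (
        "2010-11-23 14:37:46.274801 mds e40: 7/7/7 up "
        "{0=up:rejoin,1=up:resolve,2=up:resolve(laggy or crashed),"
        "3=up:resolve(laggy or crashed),4=up:resolve(laggy or crashed),"
        "5=up:resolve,6=up:resolve}"
    )
    print(laggy_ranks(line))
```

Fed the e40 line from the report, this should list ranks 2, 3 and 4; on the e41 line it should list 1 through 6, which matches the "6 of the 7 mds instances disappear" observation.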