Hi Aaron,

The key question is why osd.21 is crashing. Can you attach the last few
hundred lines of the log after the crash (which should include a stack trace
and a bit of context)?

Thanks!
sage


On Mon, 10 Feb 2014, Aaron Ten Clay wrote:
> Hi everyone,
>
> I've run into a problem with my cluster: 1 pg is incomplete, and that is
> blocking reads for a 100TiB RBD volume. (The VM actually halts execution;
> it's not like a normal I/O problem where the virtual controller times out
> and tries to reset the bus, etc.)
>
> I've read several threads about the problem with incomplete pgs and have dug
> around quite a bit, but I suspect I don't quite know where to look for the
> information I need.
>
> The pool holding this volume has a size of 2, with min_size of 2. I suspect
> the problem began when two osds, in separate hosts, failed within a short
> time of each other (osd.2 and osd.21 in this case). The physical disk for
> osd.2 is dead, but osd.21's disk seems okay and the XFS filesystem behind it
> doesn't have any problems that xfs_repair can find.
>
> In an attempt to resolve the issue, I've restarted the individual osds,
> restarted the entire cluster, rebooted all the cluster hosts, and upgraded
> to the latest devel build to rule out having hit a known and fixed bug. Most
> recently, I tried marking osd.8 'out', since I believe osd.5 has the data
> that is missing. I can provide logfiles from osds 5 and 8 with increased
> debugging if that'll help.
>
> I can restart osd.21, and that makes the "incomplete" pg go away for a few
> minutes, but osd.21 crashes within 30 seconds of being started, and the
> incomplete pg comes back when that happens.
>
> Any suggestions on how to troubleshoot this further would be helpful. I
> thought I could chronicle my attempts to date to avoid duplication of
> effort, but I have tried too many things to clearly reconstruct the path.
>
> Thanks in advance!
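A minimal sketch of one way to collect the log tail Sage asks for, run on the host that carries osd.21. It assumes the default Ceph log location, /var/log/ceph/ceph-osd.<id>.log (set LOG_DIR if the cluster logs elsewhere); the function name grab_osd_log_tail and the osd.<id>-tail.log output file are hypothetical, not part of Ceph's tooling.

```shell
# Hypothetical helper, not part of Ceph: copy the last N lines of an OSD's log
# into ./osd.<id>-tail.log so they can be attached to a reply.
# Assumes the default log path /var/log/ceph/ceph-osd.<id>.log; set LOG_DIR to
# use a different directory.
grab_osd_log_tail() {
    osd_id="$1"
    lines="${2:-500}"                    # "a few hundred lines" by default
    log_dir="${LOG_DIR:-/var/log/ceph}"
    src="$log_dir/ceph-osd.$osd_id.log"
    out="osd.$osd_id-tail.log"

    if [ -z "$osd_id" ] || [ ! -r "$src" ]; then
        echo "usage: grab_osd_log_tail <osd-id> [lines]; cannot read $src" >&2
        return 1
    fi

    # An OSD that dies on an assert or signal normally logs its backtrace just
    # before exiting, so the end of the file should contain it (provided the
    # daemon has not been restarted and written more since).
    tail -n "$lines" "$src" > "$out"
    echo "$out"
}
```

For example, running grab_osd_log_tail 21 right after osd.21 crashes writes its last 500 lines to osd.21-tail.log in the current directory; pass a second argument to keep more or fewer lines.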
>
>
> Here are some various stats that might help:
>
> aaron@seven ~ $ ceph -v
> ceph version 0.76 (3b990136bfab74249f166dd742fd8e61637e63d9)
>
>
> aaron@seven ~ $ ceph pg stat
> v7286520: 2200 pgs: 2048 active+clean, 78 active+remapped+wait_backfill, 72
> active+remapped+backfilling, 1 incomplete, 1 active+clean+inconsistent;
> 21577 GB data, 41396 GB used, 27946 GB / 69343 GB avail; 461896/11663686
> objects degraded (3.960%); 210 MB/s, 53 objects/s recovering
>
>
> aaron@seven ~ $ ceph health detail
> HEALTH_ERR 78 pgs backfill; 72 pgs backfilling; 1 pgs incomplete; 1 pgs
> inconsistent; 1 pgs stuck inactive; 151 pgs stuck unclean; recovery
> 459868/11663686 objects degraded (3.943%); 1 scrub errors; mds picard is
> laggy
> pg 2.28b is stuck inactive since forever, current state incomplete, last acting [6,5]
> pg 2.6b is stuck unclean for 1161.425227, current state active+remapped+backfilling, last acting [19,8,4]
> pg 3.1e4 is stuck unclean for 888.661689, current state active+remapped+wait_backfill, last acting [8,20,22]
> pg 2.1e5 is stuck unclean for 888.661578, current state active+remapped+wait_backfill, last acting [8,20,22]
> pg 2.127 is stuck unclean for 889.325675, current state active+remapped+backfilling, last acting [5,8,11]
> pg 2.63 is stuck unclean for 888.661689, current state active+remapped+wait_backfill, last acting [8,16,3]
> pg 3.62 is stuck unclean for 888.661682, current state active+remapped+wait_backfill, last acting [8,16,3]
> pg 2.1e0 is stuck unclean for 888.661675, current state active+remapped+wait_backfill, last acting [8,19,1]
> pg 2.420 is stuck unclean for 888.662537, current state active+remapped+wait_backfill, last acting [8,17,1]
> pg 3.1df is stuck unclean for 888.661671, current state active+remapped+wait_backfill, last acting [8,19,1]
> pg 2.118 is stuck unclean for 888.662593, current state active+remapped+wait_backfill, last acting [8,19,3]
> pg 2.29e is stuck unclean for 219581.715114, current state active+remapped+backfilling, last acting [14,8,7]
> pg 3.114 is stuck unclean for 888.662518, current state active+remapped+wait_backfill, last acting [8,14,3]
> pg 2.115 is stuck unclean for 888.662537, current state active+remapped+backfilling, last acting [8,14,3]
> pg 2.57 is stuck unclean for 159935.373946, current state active+remapped+backfilling, last acting [6,8,0]
> pg 3.56 is stuck unclean for 1643032.146070, current state active+remapped+wait_backfill, last acting [6,8,0]
> pg 2.1d4 is stuck unclean for 888.661671, current state active+remapped+backfilling, last acting [8,13,3]
> pg 3.117 is stuck unclean for 888.662503, current state active+remapped+wait_backfill, last acting [8,19,3]
> pg 2.294 is stuck unclean for 888.662490, current state active+remapped+wait_backfill, last acting [8,17,20]
> pg 2.41a is stuck unclean for 1161.455746, current state active+remapped+wait_backfill, last acting [20,8,0]
> pg 2.1d0 is stuck unclean for 888.661674, current state active+remapped+wait_backfill, last acting [8,16,14]
> pg 3.351 is stuck unclean for 888.661636, current state active+remapped+wait_backfill, last acting [8,11,17]
> pg 3.293 is stuck unclean for 888.662448, current state active+remapped+wait_backfill, last acting [8,17,20]
> pg 2.352 is stuck unclean for 888.661623, current state active+remapped+wait_backfill, last acting [8,11,17]
> pg 3.1cf is stuck unclean for 888.661638, current state active+remapped+wait_backfill, last acting [8,16,14]
> pg 2.40d is stuck unclean for 889.663590, current state active+remapped+backfilling, last acting [16,8,3]
> pg 2.34f is stuck unclean for 888.661581, current state active+remapped+wait_backfill, last acting [8,18,9]
> pg 3.34e is stuck unclean for 888.661644, current state active+remapped+wait_backfill, last acting [8,18,9]
> pg 2.4a is stuck unclean for 146084.212967, current state active+remapped+backfilling, last acting [16,8,4]
> pg 2.1cb is stuck unclean for 81410.959695, current state active+remapped+backfilling, last acting [19,8,9]
> pg 2.1ca is stuck unclean for 160231.487694, current state active+remapped+backfilling, last acting [15,8,7]
> pg 2.28b is stuck unclean since forever, current state incomplete, last acting [6,5]
> pg 2.409 is stuck unclean for 160095.566719, current state active+remapped+backfilling, last acting [4,8,1]
> pg 2.345 is stuck unclean for 219637.448262, current state active+remapped+backfilling, last acting [6,8,15]
> pg 2.405 is stuck unclean for 160095.566704, current state active+remapped+backfilling, last acting [14,8,6]
> pg 2.fc is stuck unclean for 888.662298, current state active+remapped+wait_backfill, last acting [8,20,3]
> pg 2.342 is stuck unclean for 219394.652780, current state active+remapped+backfilling, last acting [6,8,22]
> pg 2.1bf is stuck unclean for 889.325454, current state active+remapped+backfilling, last acting [11,8,7]
> pg 2.1b8 is stuck unclean for 889.326218, current state active+remapped+backfilling, last acting [15,8,5]
> pg 3.fb is stuck unclean for 888.662289, current state active+remapped+wait_backfill, last acting [8,20,3]
> pg 2.338 is stuck unclean for 1161.424960, current state active+remapped+backfilling, last acting [19,8,13]
> pg 2.3f9 is stuck unclean for 889.326078, current state active+remapped+backfilling, last acting [14,8,10]
> pg 3.1b6 is stuck unclean for 888.661612, current state active+remapped+wait_backfill, last acting [8,17,0]
> pg 2.1b7 is stuck unclean for 888.661625, current state active+remapped+wait_backfill, last acting [8,17,0]
> pg 2.3f7 is stuck unclean for 888.662259, current state active+remapped+backfilling, last acting [8,11,0]
> pg 2.270 is stuck unclean for 889.669800, current state active+remapped+backfilling, last acting [6,8,7]
> pg 2.2c is stuck unclean for 888.661599, current state active+remapped+wait_backfill, last acting [8,15,9]
> pg 3.2d is stuck unclean for 888.661593, current state active+remapped+wait_backfill, last acting [8,15,3]
> pg 2.273 is stuck unclean for 1161.773478, current state active+remapped+backfilling, last acting [18,8,1]
> pg 2.3f0 is stuck unclean for 85410.265493, current state active+remapped+backfilling, last acting [5,8,10]
> pg 2.2e is stuck unclean for 888.661595, current state active+remapped+wait_backfill, last acting [8,15,3]
> pg 3.2b is stuck unclean for 888.661580, current state active+remapped+wait_backfill, last acting [8,15,9]
> pg 3.1a9 is stuck unclean for 888.661574, current state active+remapped+wait_backfill, last acting [8,18,7]
> pg 2.3ef is stuck unclean for 1161.426416, current state active+remapped+backfilling, last acting [19,8,7]
> pg 2.1aa is stuck unclean for 888.661661, current state active+remapped+wait_backfill, last acting [8,18,7]
> pg 2.32b is stuck unclean for 156663.886748, current state active+remapped+backfilling, last acting [14,8,17]
> pg 2.21 is stuck unclean for 1161.769888, current state active+remapped+backfilling, last acting [18,8,13]
> pg 3.1c is stuck unclean for 888.661636, current state active+remapped+wait_backfill, last acting [8,16,0]
> pg 2.1d is stuck unclean for 888.661648, current state active+remapped+wait_backfill, last acting [8,16,0]
> pg 2.1c is stuck unclean for 888.661632, current state active+remapped+wait_backfill, last acting [8,20,7]
> pg 2.323 is stuck unclean for 302501.945292, current state active+remapped+backfilling, last acting [13,8,10]
> pg 2.4a0 is stuck unclean for 888.661565, current state active+remapped+wait_backfill, last acting [8,16,11]
> pg 3.19 is stuck unclean for 888.661636, current state active+remapped+wait_backfill, last acting [8,13,0]
> pg 3.1a is stuck unclean for 888.661619, current state active+remapped+wait_backfill, last acting [8,10,5]
> pg 2.1b is stuck unclean for 888.661633, current state active+remapped+wait_backfill, last acting [8,10,5]
> pg 2.25e is stuck unclean for 888.662278, current state active+remapped+backfilling, last acting [8,14,22]
> pg 3.1b is stuck unclean for 888.661612, current state active+remapped+wait_backfill, last acting [8,20,7]
> pg 2.1a is stuck unclean for 888.661626, current state active+remapped+wait_backfill, last acting [8,13,0]
> pg 2.3d9 is stuck unclean for 1161.773372, current state active+remapped+backfilling, last acting [18,8,16]
> pg 3.12 is stuck unclean for 888.661578, current state active+remapped+wait_backfill, last acting [8,17,19]
> pg 2.13 is stuck unclean for 888.661591, current state active+remapped+wait_backfill, last acting [8,17,19]
> pg 2.317 is stuck unclean for 159979.814667, current state active+remapped+backfilling, last acting [4,8,1]
> pg 2.3d7 is stuck unclean for 889.327661, current state active+remapped+backfilling, last acting [11,8,0]
> pg 2.30c is stuck unclean for 888.663342, current state active+remapped+backfilling, last acting [8,15,0]
> pg 2.492 is stuck unclean for 889.664908, current state active+remapped+backfilling, last acting [16,8,22]
> pg 2.b is stuck unclean for 888.663383, current state active+remapped+backfilling, last acting [8,11,22]
> pg 2.30f is stuck unclean for 889.327852, current state active+remapped+backfilling, last acting [4,8,9]
> pg 3.3cd is stuck unclean for 888.662282, current state active+remapped+wait_backfill, last acting [8,11,14]
> pg 2.3ce is stuck unclean for 888.662337, current state active+remapped+wait_backfill, last acting [8,11,14]
> pg 2.3c9 is stuck unclean for 888.662309, current state active+remapped+wait_backfill, last acting [8,19,0]
> pg 3.3c8 is stuck unclean for 888.662329, current state active+remapped+wait_backfill, last acting [8,19,0]
> pg 2.184 is stuck unclean for 209128.087843, current state active+remapped+backfilling, last acting [13,8,9]
> pg 2.3cb is stuck unclean for 888.662278, current state active+remapped+wait_backfill, last acting [8,19,22]
> pg 3.3ca is stuck unclean for 888.662310, current state active+remapped+wait_backfill, last acting [8,19,22]
> pg 2.3ca is stuck unclean for 1161.455302, current state active+remapped+backfilling, last acting [20,8,15]
> pg 2.c1 is stuck unclean for 888.662429, current state active+remapped+wait_backfill, last acting [8,15,13]
> pg 3.c0 is stuck unclean for 888.662398, current state active+remapped+wait_backfill, last acting [8,15,13]
> pg 2.2 is stuck unclean for 232941.977076, current state active+remapped+backfilling, last acting [6,8,3]
> pg 2.183 is stuck unclean for 160014.724114, current state active+remapped+backfilling, last acting [5,8,1]
> pg 3.241 is stuck unclean for 888.662292, current state active+remapped+wait_backfill, last acting [8,18,9]
> pg 3.301 is stuck unclean for 888.663258, current state active+remapped+wait_backfill, last acting [8,19,9]
> pg 2.242 is stuck unclean for 888.662284, current state active+remapped+wait_backfill, last acting [8,18,9]
> pg 2.302 is stuck unclean for 888.663248, current state active+remapped+wait_backfill, last acting [8,19,9]
> pg 2.23c is stuck unclean for 889.325420, current state active+remapped+backfilling, last acting [14,8,0]
> pg 2.ba is stuck unclean for 889.327554, current state active+remapped+backfilling, last acting [12,8,1]
> pg 2.175 is stuck unclean for 219770.886373, current state active+remapped+backfilling, last acting [4,8,9]
> pg 2.3b8 is stuck unclean for 888.662229, current state active+remapped+wait_backfill, last acting [8,14,22]
> pg 2.174 is stuck unclean for 888.663214, current state active+remapped+wait_backfill, last acting [8,15,16]
> pg 2.478 is stuck unclean for 1161.774401, current state active+remapped+backfilling, last acting [18,8,1]
> pg 2.2f5 is stuck unclean for 888.663146, current state active+remapped+backfilling, last acting [8,12,7]
> pg 2.176 is stuck unclean for 1161.427407, current state active+remapped+backfilling, last acting [19,8,1]
> pg 3.3b7 is stuck unclean for 888.662341, current state active+remapped+wait_backfill, last acting [8,14,22]
> pg 3.173 is stuck unclean for 888.663097, current state active+remapped+wait_backfill, last acting [8,15,16]
> pg 3.232 is stuck unclean for 888.662195, current state active+remapped+wait_backfill, last acting [8,20,3]
> pg 2.233 is stuck unclean for 888.662216, current state active+remapped+wait_backfill, last acting [8,20,3]
> pg 2.af is stuck unclean for 85410.264417, current state active+remapped+backfilling, last acting [6,8,3]
> pg 3.168 is stuck unclean for 888.663251, current state active+remapped+wait_backfill, last acting [8,20,0]
> pg 2.169 is stuck unclean for 888.663291, current state active+remapped+wait_backfill, last acting [8,20,0]
> pg 2.ab is stuck unclean for 888.662356, current state active+remapped+wait_backfill, last acting [8,18,17]
> pg 3.aa is stuck unclean for 888.662351, current state active+remapped+wait_backfill, last acting [8,18,17]
> pg 3.228 is stuck unclean for 888.662337, current state active+remapped+wait_backfill, last acting [8,14,0]
> pg 2.229 is stuck unclean for 888.662350, current state active+remapped+wait_backfill, last acting [8,14,0]
> pg 2.22b is stuck unclean for 1161.770522, current state active+remapped+backfilling, last acting [18,8,0]
> pg 2.161 is stuck unclean for 1161.839067, current state active+remapped+backfilling, last acting [17,8,7]
> pg 2.3a4 is stuck unclean for 160095.566258, current state active+remapped+backfilling, last acting [18,8,0]
> pg 2.3a7 is stuck unclean for 888.662256, current state active+remapped+wait_backfill, last acting [8,18,22]
> pg 3.3a6 is stuck unclean for 888.662272, current state active+remapped+wait_backfill, last acting [8,18,22]
> pg 2.a2 is stuck unclean for 889.669714, current state active+remapped+backfilling, last acting [6,8,7]
> pg 2.467 is stuck unclean for 889.327384, current state active+remapped+backfilling, last acting [13,8,3]
> pg 2.3a1 is stuck unclean for 159994.542551, current state active+remapped+backfilling, last acting [19,8,5]
> pg 3.3a3 is stuck unclean for 1643146.397267, current state active+remapped+backfilling, last acting [18,8,0]
> pg 2.158 is stuck unclean for 889.331607, current state active+remapped+backfilling, last acting [15,8,9]
> pg 2.218 is stuck unclean for 1161.455144, current state active+remapped+backfilling, last acting [20,8,16]
> pg 2.95 is stuck unclean for 888.662241, current state active+remapped+wait_backfill, last acting [8,13,0]
> pg 3.94 is stuck unclean for 888.662236, current state active+remapped+wait_backfill, last acting [8,13,0]
> pg 2.399 is stuck unclean for 1161.424873, current state active+remapped+backfilling, last acting [19,8,11]
> pg 2.2da is stuck unclean for 889.325694, current state active+remapped+backfilling, last acting [10,8,22]
> pg 2.458 is stuck unclean for 889.664588, current state active+remapped+backfilling, last acting [16,8,19]
> pg 2.455 is stuck unclean for 888.663123, current state active+remapped+wait_backfill, last acting [8,18,17]
> pg 2.397 is stuck unclean for 888.662229, current state active+remapped+wait_backfill, last acting [8,17,1]
> pg 3.396 is stuck unclean for 888.662244, current state active+remapped+wait_backfill, last acting [8,17,1]
> pg 2.152 is stuck unclean for 889.664670, current state active+remapped+backfilling, last acting [16,8,3]
> pg 2.20c is stuck unclean for 889.325056, current state active+remapped+backfilling, last acting [14,8,20]
> pg 2.44d is stuck unclean for 888.663004, current state active+remapped+wait_backfill, last acting [8,11,13]
> pg 2.2ce is stuck unclean for 1161.838375, current state active+remapped+backfilling, last acting [17,8,13]
> pg 3.20a is stuck unclean for 1161.425993, current state active+remapped+wait_backfill, last acting [19,8,0]
> pg 2.20b is stuck unclean for 1161.424832, current state active+remapped+wait_backfill, last acting [19,8,0]
> pg 2.2c8 is stuck unclean for 1161.427151, current state active+remapped+backfilling, last acting [19,8,9]
> pg 2.20a is stuck unclean for 889.326613, current state active+remapped+backfilling, last acting [4,8,6]
> pg 2.87 is stuck unclean for 1161.837786, current state active+remapped+backfilling, last acting [17,8,15]
> pg 2.384 is stuck unclean for 889.324992, current state active+remapped+backfilling, last acting [14,8,5]
> pg 2.2c0 is stuck unclean for 888.662914, current state active+remapped+wait_backfill, last acting [8,11,18]
> pg 2.446 is stuck unclean for 1161.774298, current state active+remapped+backfilling, last acting [18,8,0]
> pg 3.13c is stuck unclean for 888.662984, current state active+remapped+wait_backfill, last acting [8,16,14]
> pg 2.13d is stuck unclean for 888.663018, current state active+remapped+backfilling, last acting [8,16,14]
> pg 2.1fd is stuck unclean for 888.662197, current state active+remapped+backfilling, last acting [8,12,22]
> pg 2.440 is stuck unclean for 195955.637808, current state active+remapped+backfilling, last acting [17,8,11]
> pg 3.2bf is stuck unclean for 888.662780, current state active+remapped+wait_backfill, last acting [8,11,18]
> pg 2.1f8 is stuck unclean for 889.326608, current state active+remapped+backfilling, last acting [11,8,22]
> pg 2.135 is stuck unclean for 888.662853, current state active+remapped+backfilling, last acting [8,11,0]
> pg 2.2b5 is stuck unclean for 889.325574, current state active+remapped+backfilling, last acting [10,8,7]
> pg 2.436 is stuck unclean for 160015.918251, current state active+remapped+backfilling, last acting [17,8,3]
> pg 2.4a0 is active+remapped+wait_backfill, acting [8,16,11]
> pg 2.492 is active+remapped+backfilling, acting [16,8,22]
> pg 2.478 is active+remapped+backfilling, acting [18,8,1]
> pg 2.467 is active+remapped+backfilling, acting [13,8,3]
> pg 2.458 is active+remapped+backfilling, acting [16,8,19]
> pg 2.455 is active+remapped+wait_backfill, acting [8,18,17]
> pg 2.44d is active+remapped+wait_backfill, acting [8,11,13]
> pg 2.446 is active+remapped+backfilling, acting [18,8,0]
> pg 2.440 is active+remapped+backfilling, acting [17,8,11]
> pg 2.436 is active+remapped+backfilling, acting [17,8,3]
> pg 2.420 is active+remapped+wait_backfill, acting [8,17,1]
> pg 2.41a is active+remapped+wait_backfill, acting [20,8,0]
> pg 2.40d is active+remapped+backfilling, acting [16,8,3]
> pg 2.409 is active+remapped+backfilling, acting [4,8,1]
> pg 2.405 is active+remapped+backfilling, acting [14,8,6]
> pg 2.3f9 is active+remapped+backfilling, acting [14,8,10]
> pg 2.3f7 is active+remapped+backfilling, acting [8,11,0]
> pg 2.3f0 is active+remapped+backfilling, acting [5,8,10]
> pg 2.3ef is active+remapped+backfilling, acting [19,8,7]
> pg 2.3d9 is active+remapped+backfilling, acting [18,8,16]
> pg 2.3d7 is active+remapped+backfilling, acting [11,8,0]
> pg 3.3cd is active+remapped+wait_backfill, acting [8,11,14]
> pg 2.3ce is active+remapped+wait_backfill, acting [8,11,14]
> pg 3.3c8 is active+remapped+wait_backfill, acting [8,19,0]
> pg 2.3c9 is active+remapped+wait_backfill, acting [8,19,0]
> pg 3.3ca is active+remapped+wait_backfill, acting [8,19,22]
> pg 2.3cb is active+remapped+wait_backfill, acting [8,19,22]
> pg 2.3ca is active+remapped+backfilling, acting [20,8,15]
> pg 2.3b8 is active+remapped+wait_backfill, acting [8,14,22]
> pg 3.3b7 is active+remapped+wait_backfill, acting [8,14,22]
> pg 2.3a4 is active+remapped+backfilling, acting [18,8,0]
> pg 3.3a6 is active+remapped+wait_backfill, acting [8,18,22]
> pg 2.3a7 is active+remapped+wait_backfill, acting [8,18,22]
> pg 2.3a1 is active+remapped+backfilling, acting [19,8,5]
> pg 3.3a3 is active+remapped+backfilling, acting [18,8,0]
> pg 2.399 is active+remapped+backfilling, acting [19,8,11]
> pg 3.396 is active+remapped+wait_backfill, acting [8,17,1]
> pg 2.397 is active+remapped+wait_backfill, acting [8,17,1]
> pg 2.384 is active+remapped+backfilling, acting [14,8,5]
> pg 3.351 is active+remapped+wait_backfill, acting [8,11,17]
> pg 2.352 is active+remapped+wait_backfill, acting [8,11,17]
> pg 3.34e is active+remapped+wait_backfill, acting [8,18,9]
> pg 2.34f is active+remapped+wait_backfill, acting [8,18,9]
> pg 2.345 is active+remapped+backfilling, acting [6,8,15]
> pg 2.342 is active+remapped+backfilling, acting [6,8,22]
> pg 2.338 is active+remapped+backfilling, acting [19,8,13]
> pg 2.32b is active+remapped+backfilling, acting [14,8,17]
> pg 2.323 is active+remapped+backfilling, acting [13,8,10]
> pg 2.317 is active+remapped+backfilling, acting [4,8,1]
> pg 2.30c is active+remapped+backfilling, acting [8,15,0]
> pg 2.30f is active+remapped+backfilling, acting [4,8,9]
> pg 3.301 is active+remapped+wait_backfill, acting [8,19,9]
> pg 2.302 is active+remapped+wait_backfill, acting [8,19,9]
> pg 2.2f5 is active+remapped+backfilling, acting [8,12,7]
> pg 2.2da is active+remapped+backfilling, acting [10,8,22]
> pg 2.2ce is active+remapped+backfilling, acting [17,8,13]
> pg 2.2c8 is active+remapped+backfilling, acting [19,8,9]
> pg 2.2c0 is active+remapped+wait_backfill, acting [8,11,18]
> pg 3.2bf is active+remapped+wait_backfill, acting [8,11,18]
> pg 2.2b5 is active+remapped+backfilling, acting [10,8,7]
> pg 2.29e is active+remapped+backfilling, acting [14,8,7]
> pg 2.294 is active+remapped+wait_backfill, acting [8,17,20]
> pg 3.293 is active+remapped+wait_backfill, acting [8,17,20]
> pg 2.28b is incomplete, acting [6,5]
> pg 2.270 is active+remapped+backfilling, acting [6,8,7]
> pg 2.273 is active+remapped+backfilling, acting [18,8,1]
> pg 2.25e is active+remapped+backfilling, acting [8,14,22]
> pg 3.241 is active+remapped+wait_backfill, acting [8,18,9]
> pg 2.242 is active+remapped+wait_backfill, acting [8,18,9]
> pg 2.23c is active+remapped+backfilling, acting [14,8,0]
> pg 2.233 is active+remapped+wait_backfill, acting [8,20,3]
> pg 3.232 is active+remapped+wait_backfill, acting [8,20,3]
> pg 2.229 is active+remapped+wait_backfill, acting [8,14,0]
> pg 3.228 is active+remapped+wait_backfill, acting [8,14,0]
> pg 2.22b is active+remapped+backfilling, acting [18,8,0]
> pg 2.218 is active+remapped+backfilling, acting [20,8,16]
> pg 2.20c is active+remapped+backfilling, acting [14,8,20]
> pg 2.20b is active+remapped+wait_backfill, acting [19,8,0]
> pg 3.20a is active+remapped+wait_backfill, acting [19,8,0]
> pg 2.20a is active+remapped+backfilling, acting [4,8,6]
> pg 2.1fd is active+remapped+backfilling, acting [8,12,22]
> pg 2.1f8 is active+remapped+backfilling, acting [11,8,22]
> pg 2.1e5 is active+remapped+wait_backfill, acting [8,20,22]
> pg 3.1e4 is active+remapped+wait_backfill, acting [8,20,22]
> pg 2.1e0 is active+remapped+wait_backfill, acting [8,19,1]
> pg 3.1df is active+remapped+wait_backfill, acting [8,19,1]
> pg 2.1d4 is active+remapped+backfilling, acting [8,13,3]
> pg 2.1d0 is active+remapped+wait_backfill, acting [8,16,14]
> pg 3.1cf is active+remapped+wait_backfill, acting [8,16,14]
> pg 2.1cb is active+remapped+backfilling, acting [19,8,9]
> pg 2.1ca is active+remapped+backfilling, acting [15,8,7]
> pg 2.1bf is active+remapped+backfilling, acting [11,8,7]
> pg 2.1b8 is active+remapped+backfilling, acting [15,8,5]
> pg 2.1b7 is active+remapped+wait_backfill, acting [8,17,0]
> pg 3.1b6 is active+remapped+wait_backfill, acting [8,17,0]
> pg 3.1a9 is active+remapped+wait_backfill, acting [8,18,7]
> pg 2.1aa is active+remapped+wait_backfill, acting [8,18,7]
> pg 2.184 is active+remapped+backfilling, acting [13,8,9]
> pg 2.183 is active+remapped+backfilling, acting [5,8,1]
> pg 2.175 is active+remapped+backfilling, acting [4,8,9]
> pg 2.174 is active+remapped+wait_backfill, acting [8,15,16]
> pg 2.176 is active+remapped+backfilling, acting [19,8,1]
> pg 3.173 is active+remapped+wait_backfill, acting [8,15,16]
> pg 2.169 is active+remapped+wait_backfill, acting [8,20,0]
> pg 3.168 is active+remapped+wait_backfill, acting [8,20,0]
> pg 2.161 is active+remapped+backfilling, acting [17,8,7]
> pg 2.158 is active+remapped+backfilling, acting [15,8,9]
> pg 2.152 is active+remapped+backfilling, acting [16,8,3]
> pg 2.13d is active+remapped+backfilling, acting [8,16,14]
> pg 3.13c is active+remapped+wait_backfill, acting [8,16,14]
> pg 2.135 is active+remapped+backfilling, acting [8,11,0]
> pg 2.127 is active+remapped+backfilling, acting [5,8,11]
> pg 2.118 is active+remapped+wait_backfill, acting [8,19,3]
> pg 2.115 is active+remapped+backfilling, acting [8,14,3]
> pg 3.114 is active+remapped+wait_backfill, acting [8,14,3]
> pg 3.117 is active+remapped+wait_backfill, acting [8,19,3]
> pg 2.fc is active+remapped+wait_backfill, acting [8,20,3]
> pg 3.fb is active+remapped+wait_backfill, acting [8,20,3]
> pg 2.cc is active+clean+inconsistent, acting [20,6]
> pg 3.c0 is active+remapped+wait_backfill, acting [8,15,13]
> pg 2.c1 is active+remapped+wait_backfill, acting [8,15,13]
> pg 2.ba is active+remapped+backfilling, acting [12,8,1]
> pg 2.af is active+remapped+backfilling, acting [6,8,3]
> pg 3.aa is active+remapped+wait_backfill, acting [8,18,17]
> pg 2.ab is active+remapped+wait_backfill, acting [8,18,17]
> pg 2.a2 is active+remapped+backfilling, acting [6,8,7]
> pg 3.94 is active+remapped+wait_backfill, acting [8,13,0]
> pg 2.95 is active+remapped+wait_backfill, acting [8,13,0]
> pg 2.87 is active+remapped+backfilling, acting [17,8,15]
> pg 2.6b is active+remapped+backfilling, acting [19,8,4]
> pg 3.62 is active+remapped+wait_backfill, acting [8,16,3]
> pg 2.63 is active+remapped+wait_backfill, acting [8,16,3]
> pg 3.56 is active+remapped+wait_backfill, acting [6,8,0]
> pg 2.57 is active+remapped+backfilling, acting [6,8,0]
> pg 2.4a is active+remapped+backfilling, acting [16,8,4]
> pg 3.2d is active+remapped+wait_backfill, acting [8,15,3]
> pg 2.2c is active+remapped+wait_backfill, acting [8,15,9]
> pg 2.2e is active+remapped+wait_backfill, acting [8,15,3]
> pg 3.2b is active+remapped+wait_backfill, acting [8,15,9]
> pg 2.21 is active+remapped+backfilling, acting [18,8,13]
> pg 2.1d is active+remapped+wait_backfill, acting [8,16,0]
> pg 3.1c is active+remapped+wait_backfill, acting [8,16,0]
> pg 2.1c is active+remapped+wait_backfill, acting [8,20,7]
> pg 3.19 is active+remapped+wait_backfill, acting [8,13,0]
> pg 2.1b is active+remapped+wait_backfill, acting [8,10,5]
> pg 3.1a is active+remapped+wait_backfill, acting [8,10,5]
> pg 2.1a is active+remapped+wait_backfill, acting [8,13,0]
> pg 3.1b is active+remapped+wait_backfill, acting [8,20,7]
> pg 2.13 is active+remapped+wait_backfill, acting [8,17,19]
> pg 3.12 is active+remapped+wait_backfill, acting [8,17,19]
> pg 2.b is active+remapped+backfilling, acting [8,11,22]
> pg 2.2 is active+remapped+backfilling, acting [6,8,3]
> recovery 459868/11663686 objects degraded (3.943%)
> 1 scrub errors
> mds.picard at 10.42.6.21:6800/13626 is laggy/unresponsive
>
>
> aaron@seven ~ $ ceph pg 2.28b query
> { "state": "incomplete",
> "epoch": 36361,
> "up": [
> 6,
> 5],
> "acting": [
> 6,
> 5],
> "info": { "pgid": "2.28b",
> "last_update": "35256'44286",
> "last_complete": "35256'44286",
> "log_tail": "34732'41286",
> "last_user_version": 0,
> "last_backfill": "84ed7a8b\/rbd_data.a623c2ae8944a.0000000000052a3a\/head\/\/2",
> "purged_snaps": "[]",
> "history": { "epoch_created": 1,
> "last_epoch_started": 36252,
> "last_epoch_clean": 34760,
> "last_epoch_split": 0,
> "same_up_since": 35405,
> "same_interval_since": 36276,
> "same_primary_since": 36274,
> "last_scrub": "34757'44284",
> "last_scrub_stamp": "2014-02-08 11:33:51.835956",
> "last_deep_scrub": "34757'44284",
> "last_deep_scrub_stamp": "2014-02-08 11:33:45.299503",
> "last_clean_scrub_stamp": "2014-02-08 11:33:51.835956"},
> "stats": { "version": "35256'44286",
> "reported_seq": "727",
> "reported_epoch": "36361",
> "state": "incomplete",
> "last_fresh": "2014-02-10 19:35:37.361600",
> "last_change": "2014-02-10 19:22:15.856289",
> "last_active": "0.000000",
> "last_clean": "0.000000",
> "last_became_active": "0.000000",
> "last_unstale": "2014-02-10 19:35:37.361600",
> "mapping_epoch": 36274,
> "log_start": "34732'41286",
> "ondisk_log_start": "34732'41286",
> "created": 1,
> "last_epoch_clean": 34760,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "34757'44284",
> "last_scrub_stamp": "2014-02-08 11:33:51.835956",
> "last_deep_scrub": "34757'44284",
> "last_deep_scrub_stamp": "2014-02-08 11:33:45.299503",
> "last_clean_scrub_stamp": "2014-02-08 11:33:51.835956",
> "log_size": 3000,
> "ondisk_log_size": 3000,
> "stats_invalid": "0",
> "stat_sum": { "num_bytes": 13767208960,
> "num_objects": 3306,
> "num_object_clones": 0,
> "num_object_copies": 6612,
> "num_objects_missing_on_primary": 0,
> "num_objects_degraded": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 3300,
> "num_whiteouts": 0,
> "num_read": 0,
> "num_read_kb": 0,
> "num_write": 0,
> "num_write_kb": 0,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0},
> "stat_cat_sum": {},
> "up": [
> 6,
> 5],
> "acting": [
> 6,
> 5]},
> "empty": 0,
> "dne": 0,
> "incomplete": 1,
> "last_epoch_started": 36252,
> "hit_set_history": { "current_last_update": "0'0",
> "current_last_stamp": "0.000000",
> "current_info": { "begin": "0.000000",
> "end": "0.000000",
> "version": "0'0"},
> "history": []}},
> "peer_info": [
> { "peer": 5,
> "pgid": "2.28b",
> "last_update": "34757'44284",
> "last_complete": "34757'44284",
> "log_tail": "34732'41284",
> "last_user_version": 0,
> "last_backfill": "84ed7a8b\/rbd_data.a623c2ae8944a.0000000000052a3a\/head\/\/2",
> "purged_snaps": "[]",
> "history": { "epoch_created": 1,
> "last_epoch_started": 36252,
> "last_epoch_clean": 34760,
> "last_epoch_split": 0,
> "same_up_since": 35405,
> "same_interval_since": 36276,
> "same_primary_since": 36274,
> "last_scrub": "34757'44284",
> "last_scrub_stamp": "2014-02-08 11:33:51.835956",
> "last_deep_scrub": "34757'44284",
> "last_deep_scrub_stamp": "2014-02-08 11:33:45.299503",
> "last_clean_scrub_stamp": "2014-02-08 11:33:51.835956"},
> "stats": { "version": "34757'44284",
> "reported_seq": "247",
> "reported_epoch": "35404",
> "state": "down+peering",
> "last_fresh": "2014-02-09 21:05:56.090968",
> "last_change": "2014-02-09 21:05:33.224591",
> "last_active": "0.000000",
> "last_clean": "0.000000",
> "last_became_active": "0.000000",
> "last_unstale": "2014-02-09 21:05:56.090968",
> "mapping_epoch": 36274,
> "log_start": "34732'41284",
> "ondisk_log_start": "34732'41284",
> "created": 1,
> "last_epoch_clean": 34760,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "34757'44284",
> "last_scrub_stamp": "2014-02-08 11:33:51.835956",
> "last_deep_scrub": "34757'44284",
> "last_deep_scrub_stamp": "2014-02-08 11:33:45.299503",
> "last_clean_scrub_stamp": "2014-02-08 11:33:51.835956",
> "log_size": 3000,
> "ondisk_log_size": 3000,
> "stats_invalid": "0",
> "stat_sum": { "num_bytes": 13771403264,
> "num_objects": 3307,
> "num_object_clones": 0,
> "num_object_copies": 6614,
> "num_objects_missing_on_primary": 0,
> "num_objects_degraded": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 0,
> "num_whiteouts": 0,
> "num_read": 0,
> "num_read_kb": 0,
> "num_write": 0,
> "num_write_kb": 0,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0},
> "stat_cat_sum": {},
> "up": [
> 6,
> 5],
> "acting": [
> 6,
> 5]},
> "empty": 0,
> "dne": 0,
> "incomplete": 1,
> "last_epoch_started": 35110,
> "hit_set_history": { "current_last_update": "0'0",
> "current_last_stamp": "0.000000",
> "current_info": { "begin": "0.000000",
> "end": "0.000000",
> "version": "0'0"},
> "history": []}},
> { "peer": 8,
> "pgid": "2.28b",
> "last_update": "35256'44286",
> "last_complete": "35256'44286",
> "log_tail": "34732'41284",
> "last_user_version": 44286,
> "last_backfill": "a8dd7a8b\/benchmark_data_seven_910_object168\/head\/\/2",
> "purged_snaps": "[]",
> "history": { "epoch_created": 1,
> "last_epoch_started": 35225,
> "last_epoch_clean": 34760,
> "last_epoch_split": 0,
> "same_up_since": 35405,
> "same_interval_since": 36276,
> "same_primary_since": 36274,
> "last_scrub": "34757'44284",
> "last_scrub_stamp": "2014-02-08 11:33:51.835956",
> "last_deep_scrub": "34757'44284",
> "last_deep_scrub_stamp": "2014-02-08 11:33:45.299503",
> "last_clean_scrub_stamp": "2014-02-08 11:33:51.835956"},
> "stats": { "version": "35256'44286",
> "reported_seq": "109",
> "reported_epoch": "35310",
> "state": "peering",
> "last_fresh": "2014-02-09 19:52:07.683337",
> "last_change": "2014-02-09 19:52:07.683337",
> "last_active": "0.000000",
> "last_clean": "0.000000",
> "last_became_active": "0.000000",
> "last_unstale": "2014-02-09 19:52:07.683337",
> "mapping_epoch": 36274,
> "log_start": "34732'41284",
> "ondisk_log_start": "34732'41284",
> "created": 1,
> "last_epoch_clean": 34760,
> "parent": "0.0",
> "parent_split_bits": 0,
> "last_scrub": "34757'44284",
> "last_scrub_stamp": "2014-02-08 11:33:51.835956",
> "last_deep_scrub": "34757'44284",
> "last_deep_scrub_stamp": "2014-02-08 11:33:45.299503",
> "last_clean_scrub_stamp": "2014-02-08 11:33:51.835956",
> "log_size": 3002,
> "ondisk_log_size": 3002,
> "stats_invalid": "0",
> "stat_sum": { "num_bytes": 13763014656,
> "num_objects": 3305,
> "num_object_clones": 0,
> "num_object_copies": 0,
> "num_objects_missing_on_primary": 0,
> "num_objects_degraded": 0,
> "num_objects_unfound": 0,
> "num_objects_dirty": 0,
> "num_whiteouts": 0,
> "num_read": 0,
> "num_read_kb": 0,
> "num_write": 0,
> "num_write_kb": 0,
> "num_scrub_errors": 0,
> "num_shallow_scrub_errors": 0,
> "num_deep_scrub_errors": 0,
> "num_objects_recovered": 0,
> "num_bytes_recovered": 0,
> "num_keys_recovered": 0},
> "stat_cat_sum": {},
> "up": [
> 6,
> 5],
> "acting": [
> 6,
> 5]},
> "empty": 0,
> "dne": 0,
> "incomplete": 1,
> "last_epoch_started": 35225,
> "hit_set_history": { "current_last_update": "0'0",
> "current_last_stamp": "0.000000",
> "current_info": { "begin": "0.000000",
> "end": "0.000000",
> "version": "0'0"},
> "history": []}}],
> "recovery_state": [
> { "name": "Started\/Primary\/Peering",
> "enter_time": "2014-02-10 19:22:15.855010",
> "past_intervals": [
> { "first": 34758,
> "last": 34796,
> "maybe_went_rw": 1,
> "up": [
> 21],
> "acting": [
> 21]},
> { "first": 34797,
> "last": 34899,
> "maybe_went_rw": 1,
> "up": [
> 21,
> 5],
> "acting": [
> 21,
> 5]},
> { "first": 34900,
> "last": 34946,
> "maybe_went_rw": 1,
> "up": [
> 5],
> "acting": [
> 5]},
> { "first": 34947,
> "last": 34952,
> "maybe_went_rw": 1,
> "up": [
> 21,
> 5],
> "acting": [
> 21,
> 5]},
> { "first": 34953,
> "last": 34957,
> "maybe_went_rw": 1,
> "up": [
> 5],
> "acting": [
> 5]},
> { "first": 34958,
> "last": 34959,
> "maybe_went_rw": 1,
> "up": [
> 21,
> 5],
> "acting": [
> 21,
> 5]},
> { "first": 34960,
> "last": 35053,
> "maybe_went_rw": 1,
> "up": [
> 5],
> "acting": [
> 5]},
> { "first": 35054,
> "last": 35055,
> "maybe_went_rw": 1,
> "up": [
> 21,
> 5],
> "acting": [
> 21,
> 5]},
> { "first": 35056,
> "last": 35062,
> "maybe_went_rw": 1,
> "up": [
> 5],
> "acting": [
> 5]},
> { "first": 35063,
> "last": 35065,
> "maybe_went_rw": 1,
> "up": [
> 21,
> 5],
> "acting": [
> 21,
> 5]},
> { "first": 35066,
> "last": 35068,
> "maybe_went_rw": 1,
> "up": [
> 5],
> "acting": [
> 5]},
> { "first": 35069,
> "last": 35071,
> "maybe_went_rw": 1,
> "up": [
> 21,
> 5],
> "acting": [
> 21,
> 5]},
> { "first": 35072,
> "last": 35108,
> "maybe_went_rw": 1,
> "up": [
> 5],
> "acting": [
> 5]},
> { "first": 35109,
> "last": 35112,
> "maybe_went_rw": 1,
> "up": [
> 21,
> 5],
> "acting": [
> 21,
> 5]},
> { "first": 35113,
> "last": 35120,
> "maybe_went_rw": 1,
> "up": [
> 5],
> "acting": [
> 5]},
> {
"first": 35121, > "last": 35160, > "maybe_went_rw": 1, > "up": [ > 8, > 5], > "acting": [ > 8, > 5]}, > { "first": 35161, > "last": 35174, > "maybe_went_rw": 1, > "up": [ > 8], > "acting": [ > 8]}, > { "first": 35175, > "last": 35181, > "maybe_went_rw": 1, > "up": [ > 8, > 5], > "acting": [ > 8, > 5]}, > { "first": 35182, > "last": 35194, > "maybe_went_rw": 0, > "up": [ > 8], > "acting": [ > 8]}, > { "first": 35195, > "last": 35214, > "maybe_went_rw": 0, > "up": [ > 8, > 5], > "acting": [ > 8, > 5]}, > { "first": 35215, > "last": 35222, > "maybe_went_rw": 1, > "up": [ > 5], > "acting": [ > 5]}, > { "first": 35223, > "last": 35223, > "maybe_went_rw": 0, > "up": [ > 8, > 5], > "acting": [ > 8, > 5]}, > { "first": 35224, > "last": 35264, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 21, > 8]}, > { "first": 35265, > "last": 35265, > "maybe_went_rw": 0, > "up": [ > 6, > 5], > "acting": [ > 8]}, > { "first": 35266, > "last": 35287, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 6, > 5]}, > { "first": 35288, > "last": 35299, > "maybe_went_rw": 1, > "up": [ > 6], > "acting": [ > 6]}, > { "first": 35300, > "last": 35303, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 6, > 5]}, > { "first": 35304, > "last": 35305, > "maybe_went_rw": 1, > "up": [ > 6], > "acting": [ > 6]}, > { "first": 35306, > "last": 35376, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 6, > 5]}, > { "first": 35377, > "last": 35386, > "maybe_went_rw": 0, > "up": [ > 5], > "acting": [ > 5]}, > { "first": 35387, > "last": 35396, > "maybe_went_rw": 0, > "up": [], > "acting": []}, > { "first": 35397, > "last": 35404, > "maybe_went_rw": 1, > "up": [ > 5], > "acting": [ > 5]}, > { "first": 35405, > "last": 35407, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 6, > 5]}, > { "first": 35408, > "last": 35616, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 21, > 6]}, > { "first": 35617, > "last": 35618, > "maybe_went_rw": 0, > "up": [ > 6, > 5], 
> "acting": [ > 6]}, > { "first": 35619, > "last": 36246, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 6, > 5]}, > { "first": 36247, > "last": 36248, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 21, > 6]}, > { "first": 36249, > "last": 36249, > "maybe_went_rw": 0, > "up": [ > 6, > 5], > "acting": [ > 6]}, > { "first": 36250, > "last": 36250, > "maybe_went_rw": 0, > "up": [ > 6, > 5], > "acting": [ > 6, > 5]}, > { "first": 36251, > "last": 36273, > "maybe_went_rw": 1, > "up": [ > 6, > 5], > "acting": [ > 21, > 6]}, > { "first": 36274, > "last": 36275, > "maybe_went_rw": 0, > "up": [ > 6, > 5], > "acting": [ > 6]}], > "probing_osds": [ > 5, > 6], > "down_osds_we_would_probe": [ > 21], > "peering_blocked_by": []}, > { "name": "Started", > "enter_time": "2014-02-10 19:22:15.854966"}]} > > > -- > Aaron Ten Clay > http://www.aarontc.com/ > >
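The `recovery_state` block above is hard to read by eye. Three things stand out in it:

- osd.21 is listed in `down_osds_we_would_probe`.
- osd.21 was in the acting set of several past intervals flagged `maybe_went_rw: 1`, for example 35408-35616 and 36251-36273, both with acting [21, 6].
- Peers 5 and 8 both report a `last_backfill` that names a specific object rather than a finished backfill.

Below is a minimal Python sketch that pulls those fields out of a saved `ceph pg 2.28b query` dump. The script name and its helper functions are hypothetical, and it reads only keys that appear in the output above. It assumes that a fully backfilled peer would report `last_backfill` as `MAX`, and it considers only `peer_info`, not the primary's own `info` block.

```python
#!/usr/bin/env python3
"""Summarize saved JSON from `ceph pg <pgid> query` (hypothetical helper, not part of Ceph)."""
import json
import sys


def parse_eversion(s):
    """Split an "epoch'version" string such as "34757'44284" into (epoch, version)."""
    epoch, version = s.split("'")
    return int(epoch), int(version)


def newest_peer(query):
    """Peer in peer_info with the highest last_update (epoch compared first, then version).

    Only peer_info is examined; the primary's own "info" block is ignored here.
    """
    best = max(query["peer_info"], key=lambda p: parse_eversion(p["last_update"]))
    return best["peer"], best["last_update"]


def incomplete_backfill_peers(query):
    """Peers whose last_backfill is not the end-of-backfill marker.

    ASSUMPTION: a fully backfilled peer reports last_backfill as "MAX".
    """
    return sorted(p["peer"] for p in query["peer_info"] if p.get("last_backfill") != "MAX")


def rw_intervals_with(query, osd):
    """Past intervals flagged maybe_went_rw whose acting set includes `osd`."""
    hits = []
    for state in query.get("recovery_state", []):
        for iv in state.get("past_intervals", []):
            if iv["maybe_went_rw"] and osd in iv["acting"]:
                hits.append((iv["first"], iv["last"], iv["acting"]))
    return hits


def down_osds_blocking(query):
    """OSDs that peering would like to probe but that are currently down."""
    down = set()
    for state in query.get("recovery_state", []):
        down.update(state.get("down_osds_we_would_probe", []))
    return sorted(down)


def main(path):
    with open(path) as f:
        query = json.load(f)  # json handles the escaped "\/" in object names
    print("newest peer (peer_info only):", newest_peer(query))
    print("peers with incomplete backfill:", incomplete_backfill_peers(query))
    for osd in down_osds_blocking(query):
        spans = rw_intervals_with(query, osd)
        print(f"osd.{osd} is down; acting in {len(spans)} possibly-written interval(s)")
        for first, last, acting in spans:
            print(f"  epochs {first}-{last}: acting {acting}")


# Only run when a file path is given, so importing the module has no side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

To use it with an assumed file name, run `ceph pg 2.28b query > pg-2.28b.json` and then `python3 pg_summary.py pg-2.28b.json`. If osd.21 is the only OSD that was acting in possibly-written intervals and still holds a complete copy, that would fit the question of why osd.21 is crashing.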
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com