Pal,

Are you still seeing this problem? It looks like you have a bad crushmap. If you changed it, can you post it to the list?

-slang
[developer @ http://inktank.com | http://ceph.com]
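
For anyone following along, a common way to pull the crushmap out of a running cluster in a form that can be posted is with ceph osd getcrushmap plus crushtool; the file names below are only placeholders, not anything referenced in this thread:

    # save the compiled crushmap currently in use
    ceph osd getcrushmap -o crushmap.bin
    # decompile it into human-readable text suitable for posting
    crushtool -d crushmap.bin -o crushmap.txt
    cat crushmap.txt
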
On Wed, Mar 20, 2013 at 11:41 AM, "Gergely Pál - night[w]" <nightw@xxxxxxx> wrote:
> Hello!
>
> I've deployed a test ceph cluster according to this guide:
> http://ceph.com/docs/master/start/quick-start/
>
> The problem is that the cluster will never go to a clean state by itself.
>
> The corresponding outputs are the following:
>
> root@test-4:~# ceph health
> HEALTH_WARN 3 pgs degraded; 38 pgs stuck unclean; recovery 2/44 degraded (4.545%)
>
> root@test-4:~# ceph -s
>    health HEALTH_WARN 3 pgs degraded; 38 pgs stuck unclean; recovery 2/44 degraded (4.545%)
>    monmap e1: 1 mons at {a=10.0.0.3:6789/0}, election epoch 1, quorum 0 a
>    osdmap e45: 2 osds: 2 up, 2 in
>    pgmap v344: 384 pgs: 346 active+clean, 35 active+remapped, 3 active+degraded; 6387 KB data, 2025 MB used, 193 GB / 200 GB avail; 2/44 degraded (4.545%)
>    mdsmap e29: 1/1/1 up {0=a=up:active}
>
> root@test-4:~# ceph pg dump_stuck unclean
> ok
> pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
> 1.6b 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.056155 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:50:19.699765 0'0 2013-03-20 16:50:19.699765
> 2.6a 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.062933 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:53:22.749668 0'0 2013-03-20 16:53:22.749668
> 0.62 0 0 0 0 0 2584 2584 active+remapped 2013-03-20 17:15:10.953654 17'19 39'63 [1] [1,0] 11'12 2013-03-20 16:46:48.646752 11'12 2013-03-20 16:46:48.646752
> 2.60 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.331682 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:53:04.744990 0'0 2013-03-20 16:53:04.744990
> 1.61 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.345445 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:49:58.694300 0'0 2013-03-20 16:49:58.694300
> 2.45 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.179279 0'0 39'75 [1] [1,0] 0'0 2013-03-20 16:49:43.649700 0'0 2013-03-20 16:49:43.649700
> 1.46 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.179239 0'0 39'75 [1] [1,0] 0'0 2013-03-20 16:47:10.610772 0'0 2013-03-20 16:47:10.610772
> 0.47 0 0 0 0 0 3808 3808 active+remapped 2013-03-20 17:15:10.953601 17'28 39'93 [1] [1,0] 11'19 2013-03-20 16:44:31.572090 11'19 2013-03-20 16:44:31.572090
> 0.3c 0 0 0 0 0 3128 3128 active+remapped 2013-03-20 17:14:08.006824 17'23 36'53 [0] [0,1] 11'14 2013-03-20 16:46:13.639052 11'14 2013-03-20 16:46:13.639052
> 1.3b 1 0 0 0 2338546 4224 4224 active+remapped 2013-03-20 17:13:22.018020 41'33 36'87 [0] [0,1] 0'0 2013-03-20 16:49:01.678543 0'0 2013-03-20 16:49:01.678543
> 2.3a 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.022849 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:52:06.728006 0'0 2013-03-20 16:52:06.728006
> 0.35 0 0 0 0 0 4216 4216 active+remapped 2013-03-20 17:14:08.006831 17'31 36'47 [0] [0,1] 11'23 2013-03-20 16:46:05.636185 11'23 2013-03-20 16:46:05.636185
> 1.34 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.036661 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:48:46.674504 0'0 2013-03-20 16:48:46.674504
> 2.33 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.048476 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:51:49.724215 0'0 2013-03-20 16:51:49.724215
> 0.21 0 0 0 0 0 1360 1360 active+remapped 2013-03-20 17:15:10.953645 17'10 39'20 [1] [1,0] 0'0 0.000000 0'0 0.000000
> 1.20 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.290933 0'0 39'19 [1] [1,0] 0'0 0.000000 0'0 0.000000
> 2.1f 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.309581 0'0 39'19 [1] [1,0] 0'0 0.000000 0'0 0.000000
> 0.1d 0 0 0 0 0 4080 4080 active+remapped 2013-03-20 17:14:08.006880 17'30 36'124 [0] [0,1] 11'20 2013-03-20 16:43:51.560375 11'20 2013-03-20 16:43:51.560375
> 1.1c 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.131767 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:46:06.593051 0'0 2013-03-20 16:46:06.593051
> 2.1b 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.148274 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:48:39.633091 0'0 2013-03-20 16:48:39.633091
> 0.15 0 0 0 0 0 1768 1768 active+degraded 2013-03-20 17:14:04.005586 17'13 36'80 [0] [0] 0'0 0.000000 0'0 0.000000
> 1.14 2 0 2 0 512 2308 2308 active+degraded 2013-03-20 17:13:18.967086 41'18 36'89 [0] [0] 0'0 0.000000 0'0 0.000000
> 0.14 0 0 0 0 0 2448 2448 active+remapped 2013-03-20 17:15:10.953657 17'18 39'83 [1] [1,0] 11'9 2013-03-20 16:43:37.556698 11'9 2013-03-20 16:43:37.556698
> 1.13 1 0 0 0 29 129 129 active+remapped 2013-03-20 17:14:25.350437 3'1 39'53 [1] [1,0] 3'1 2013-03-20 16:45:55.590867 3'1 2013-03-20 16:45:55.590867
> 2.13 0 0 0 0 0 0 0 active+degraded 2013-03-20 17:13:18.968930 0'0 36'66 [0] [0] 0'0 0.000000 0'0 0.000000
> 2.12 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.396528 0'0 39'75 [1] [1,0] 0'0 2013-03-20 16:48:35.632422 0'0 2013-03-20 16:48:35.632422
> 2.c 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.400472 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:51:13.713841 0'0 2013-03-20 16:51:13.713841
> 0.e 0 0 0 0 0 1360 1360 active+remapped 2013-03-20 17:15:10.953677 17'10 39'60 [1] [1,0] 11'5 2013-03-20 16:45:03.617117 11'5 2013-03-20 16:45:03.617117
> 1.d 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.177681 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:47:42.657407 0'0 2013-03-20 16:47:42.657407
> 1.6 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.964134 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:47:26.654077 0'0 2013-03-20 16:47:26.654077
> 0.7 0 0 0 0 0 1088 1088 active+remapped 2013-03-20 17:14:07.006487 17'8 36'44 [0] [0,1] 11'4 2013-03-20 16:44:49.613165 11'4 2013-03-20 16:44:49.613165
> 2.5 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.964021 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:50:43.706250 0'0 2013-03-20 16:50:43.706250
> 1.7f 2 0 0 0 956 260 260 active+remapped 2013-03-20 17:13:21.965290 3'2 36'92 [0] [0,1] 3'2 2013-03-20 16:48:20.628887 3'2 2013-03-20 16:48:20.628887
> 2.7e 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.965519 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:51:00.670188 0'0 2013-03-20 16:51:00.670188
> 0.7b 0 0 0 0 0 1360 1360 active+remapped 2013-03-20 17:14:07.006510 17'10 36'100 [0] [0,1] 11'5 2013-03-20 16:45:34.587342 11'5 2013-03-20 16:45:34.587342
> 1.7a 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.986318 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:48:13.626641 0'0 2013-03-20 16:48:13.626641
> 2.79 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.977287 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:50:52.668407 0'0 2013-03-20 16:50:52.668407
> 0.6c 0 0 0 0 0 1904 1904 active+remapped 2013-03-20 17:14:08.006843 17'14 36'50 [0] [0,1] 11'7 2013-03-20 16:46:58.650771 11'7 2013-03-20 16:46:58.650771
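
A general way to dig deeper into any one of the PGs listed above is to query it directly; 1.14 is used here simply because it shows up as active+degraded in the dump. The query output includes the PG's up and acting sets and its recovery state, which usually makes it clear whether CRUSH is failing to map a second OSD:

    # inspect a single stuck PG in detail
    ceph pg 1.14 query
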
> root@test-4:~# ceph osd dump
> epoch 45
> fsid d28ad054-a66e-4150-9026-ca1301661d9a
> created 2013-03-20 15:03:04.547114
> modified 2013-03-20 17:37:32.256578
> flags
>
> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0
>
> max_osd 2
> osd.0 up in weight 1 up_from 36 up_thru 39 down_at 35 last_clean_interval [26,34) 10.0.0.3:6801/7996 10.0.0.3:6802/7996 10.0.0.3:6803/7996 exists,up d3f9a96e-864b-427e-941e-ca2442b190b5
> osd.1 up in weight 1 up_from 39 up_thru 39 down_at 38 last_clean_interval [29,34) 10.0.0.4:6800/4696 10.0.0.4:6801/4696 10.0.0.4:6802/4696 exists,up 758490a1-8726-4acf-9d2c-b8c33a9f5528
>
> pg_temp 0.7 [0,1]
> pg_temp 0.e [1,0]
> pg_temp 0.14 [1,0]
> pg_temp 0.1d [0,1]
> pg_temp 0.21 [1,0]
> pg_temp 0.35 [0,1]
> pg_temp 0.3c [0,1]
> pg_temp 0.47 [1,0]
> pg_temp 0.62 [1,0]
> pg_temp 0.6c [0,1]
> pg_temp 0.7b [0,1]
> pg_temp 1.6 [0,1]
> pg_temp 1.d [1,0]
> pg_temp 1.13 [1,0]
> pg_temp 1.1c [0,1]
> pg_temp 1.20 [1,0]
> pg_temp 1.34 [0,1]
> pg_temp 1.3b [0,1]
> pg_temp 1.46 [1,0]
> pg_temp 1.61 [1,0]
> pg_temp 1.6b [0,1]
> pg_temp 1.7a [0,1]
> pg_temp 1.7f [0,1]
> pg_temp 2.5 [0,1]
> pg_temp 2.c [1,0]
> pg_temp 2.12 [1,0]
> pg_temp 2.1b [0,1]
> pg_temp 2.1f [1,0]
> pg_temp 2.33 [0,1]
> pg_temp 2.3a [0,1]
> pg_temp 2.45 [1,0]
> pg_temp 2.60 [1,0]
> pg_temp 2.6a [0,1]
> pg_temp 2.79 [0,1]
> pg_temp 2.7e [0,1]
>
> root@test-4:~# df -h
> Filesystem                                     Size  Used Avail Use% Mounted on
> /dev/mapper/debian-root                        1.8G  851M  858M  50% /
> tmpfs                                          502M     0  502M   0% /lib/init/rw
> udev                                           496M  136K  496M   1% /dev
> tmpfs                                          502M     0  502M   0% /dev/shm
> /dev/mapper/3600000e00d1100000011049200210000  100G 1014M   97G   2% /var/lib/ceph/osd/ceph-1
>
> Can you help me to get this sorted out?
>
> Regards,
> Pal

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
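
If the crushmap turns out to have been customized, crushtool can also test it offline and report any inputs that do not map to the expected number of OSDs. This is only a generic sketch: "crushmap.bin" is the same placeholder file name from the earlier example, and rule 0 with two replicas is chosen to match the 'data' pool shown in the osd dump above:

    # simulate placements for rule 0 with 2 replicas and list any bad mappings
    crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-bad-mappings
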