Pal,

Are you still seeing this problem? It looks like you have a bad crushmap. If you changed it, can you post it to the list?

-slang
[developer @ http://inktank.com | http://ceph.com]
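
For anyone following along, a common way to pull the crushmap out of a running cluster in a form that can be posted is with ceph osd getcrushmap plus crushtool; the file names below are only placeholders, not anything referenced in this thread:

    # save the compiled crushmap currently in use
    ceph osd getcrushmap -o crushmap.bin
    # decompile it into human-readable text suitable for posting
    crushtool -d crushmap.bin -o crushmap.txt
    cat crushmap.txt
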
On Wed, Mar 20, 2013 at 11:41 AM, "Gergely Pál - night[w]" <nightw@xxxxxxx> wrote:
> Hello!
>
> I've deployed a test ceph cluster according to this guide:
> http://ceph.com/docs/master/start/quick-start/
>
> The problem is that the cluster will never go to a clean state by itself.
>
> The corresponding outputs are the following:
>
> root@test-4:~# ceph health
> HEALTH_WARN 3 pgs degraded; 38 pgs stuck unclean; recovery 2/44 degraded (4.545%)
>
> root@test-4:~# ceph -s
>    health HEALTH_WARN 3 pgs degraded; 38 pgs stuck unclean; recovery 2/44 degraded (4.545%)
>    monmap e1: 1 mons at {a=10.0.0.3:6789/0}, election epoch 1, quorum 0 a
>    osdmap e45: 2 osds: 2 up, 2 in
>    pgmap v344: 384 pgs: 346 active+clean, 35 active+remapped, 3 active+degraded; 6387 KB data, 2025 MB used, 193 GB / 200 GB avail; 2/44 degraded (4.545%)
>    mdsmap e29: 1/1/1 up {0=a=up:active}
>
> root@test-4:~# ceph pg dump_stuck unclean
> ok
> pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
> 1.6b 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.056155 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:50:19.699765 0'0 2013-03-20 16:50:19.699765
> 2.6a 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.062933 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:53:22.749668 0'0 2013-03-20 16:53:22.749668
> 0.62 0 0 0 0 0 2584 2584 active+remapped 2013-03-20 17:15:10.953654 17'19 39'63 [1] [1,0] 11'12 2013-03-20 16:46:48.646752 11'12 2013-03-20 16:46:48.646752
> 2.60 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.331682 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:53:04.744990 0'0 2013-03-20 16:53:04.744990
> 1.61 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.345445 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:49:58.694300 0'0 2013-03-20 16:49:58.694300
> 2.45 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.179279 0'0 39'75 [1] [1,0] 0'0 2013-03-20 16:49:43.649700 0'0 2013-03-20 16:49:43.649700
> 1.46 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.179239 0'0 39'75 [1] [1,0] 0'0 2013-03-20 16:47:10.610772 0'0 2013-03-20 16:47:10.610772
> 0.47 0 0 0 0 0 3808 3808 active+remapped 2013-03-20 17:15:10.953601 17'28 39'93 [1] [1,0] 11'19 2013-03-20 16:44:31.572090 11'19 2013-03-20 16:44:31.572090
> 0.3c 0 0 0 0 0 3128 3128 active+remapped 2013-03-20 17:14:08.006824 17'23 36'53 [0] [0,1] 11'14 2013-03-20 16:46:13.639052 11'14 2013-03-20 16:46:13.639052
> 1.3b 1 0 0 0 2338546 4224 4224 active+remapped 2013-03-20 17:13:22.018020 41'33 36'87 [0] [0,1] 0'0 2013-03-20 16:49:01.678543 0'0 2013-03-20 16:49:01.678543
> 2.3a 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.022849 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:52:06.728006 0'0 2013-03-20 16:52:06.728006
> 0.35 0 0 0 0 0 4216 4216 active+remapped 2013-03-20 17:14:08.006831 17'31 36'47 [0] [0,1] 11'23 2013-03-20 16:46:05.636185 11'23 2013-03-20 16:46:05.636185
> 1.34 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.036661 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:48:46.674504 0'0 2013-03-20 16:48:46.674504
> 2.33 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.048476 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:51:49.724215 0'0 2013-03-20 16:51:49.724215
> 0.21 0 0 0 0 0 1360 1360 active+remapped 2013-03-20 17:15:10.953645 17'10 39'20 [1] [1,0] 0'0 0.000000 0'0 0.000000
> 1.20 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.290933 0'0 39'19 [1] [1,0] 0'0 0.000000 0'0 0.000000
> 2.1f 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.309581 0'0 39'19 [1] [1,0] 0'0 0.000000 0'0 0.000000
> 0.1d 0 0 0 0 0 4080 4080 active+remapped 2013-03-20 17:14:08.006880 17'30 36'124 [0] [0,1] 11'20 2013-03-20 16:43:51.560375 11'20 2013-03-20 16:43:51.560375
> 1.1c 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.131767 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:46:06.593051 0'0 2013-03-20 16:46:06.593051
> 2.1b 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:22.148274 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:48:39.633091 0'0 2013-03-20 16:48:39.633091
> 0.15 0 0 0 0 0 1768 1768 active+degraded 2013-03-20 17:14:04.005586 17'13 36'80 [0] [0] 0'0 0.000000 0'0 0.000000
> 1.14 2 0 2 0 512 2308 2308 active+degraded 2013-03-20 17:13:18.967086 41'18 36'89 [0] [0] 0'0 0.000000 0'0 0.000000
> 0.14 0 0 0 0 0 2448 2448 active+remapped 2013-03-20 17:15:10.953657 17'18 39'83 [1] [1,0] 11'9 2013-03-20 16:43:37.556698 11'9 2013-03-20 16:43:37.556698
> 1.13 1 0 0 0 29 129 129 active+remapped 2013-03-20 17:14:25.350437 3'1 39'53 [1] [1,0] 3'1 2013-03-20 16:45:55.590867 3'1 2013-03-20 16:45:55.590867
> 2.13 0 0 0 0 0 0 0 active+degraded 2013-03-20 17:13:18.968930 0'0 36'66 [0] [0] 0'0 0.000000 0'0 0.000000
> 2.12 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.396528 0'0 39'75 [1] [1,0] 0'0 2013-03-20 16:48:35.632422 0'0 2013-03-20 16:48:35.632422
> 2.c 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.400472 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:51:13.713841 0'0 2013-03-20 16:51:13.713841
> 0.e 0 0 0 0 0 1360 1360 active+remapped 2013-03-20 17:15:10.953677 17'10 39'60 [1] [1,0] 11'5 2013-03-20 16:45:03.617117 11'5 2013-03-20 16:45:03.617117
> 1.d 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:14:25.177681 0'0 39'47 [1] [1,0] 0'0 2013-03-20 16:47:42.657407 0'0 2013-03-20 16:47:42.657407
> 1.6 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.964134 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:47:26.654077 0'0 2013-03-20 16:47:26.654077
> 0.7 0 0 0 0 0 1088 1088 active+remapped 2013-03-20 17:14:07.006487 17'8 36'44 [0] [0,1] 11'4 2013-03-20 16:44:49.613165 11'4 2013-03-20 16:44:49.613165
> 2.5 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.964021 0'0 36'45 [0] [0,1] 0'0 2013-03-20 16:50:43.706250 0'0 2013-03-20 16:50:43.706250
> 1.7f 2 0 0 0 956 260 260 active+remapped 2013-03-20 17:13:21.965290 3'2 36'92 [0] [0,1] 3'2 2013-03-20 16:48:20.628887 3'2 2013-03-20 16:48:20.628887
> 2.7e 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.965519 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:51:00.670188 0'0 2013-03-20 16:51:00.670188
> 0.7b 0 0 0 0 0 1360 1360 active+remapped 2013-03-20 17:14:07.006510 17'10 36'100 [0] [0,1] 11'5 2013-03-20 16:45:34.587342 11'5 2013-03-20 16:45:34.587342
> 1.7a 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.986318 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:48:13.626641 0'0 2013-03-20 16:48:13.626641
> 2.79 0 0 0 0 0 0 0 active+remapped 2013-03-20 17:13:21.977287 0'0 36'83 [0] [0,1] 0'0 2013-03-20 16:50:52.668407 0'0 2013-03-20 16:50:52.668407
> 0.6c 0 0 0 0 0 1904 1904 active+remapped 2013-03-20 17:14:08.006843 17'14 36'50 [0] [0,1] 11'7 2013-03-20 16:46:58.650771 11'7 2013-03-20 16:46:58.650771
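
A general way to dig deeper into any one of the PGs listed above is to query it directly; 1.14 is used here simply because it shows up as active+degraded in the dump. The query output includes the PG's up and acting sets and its recovery state, which usually makes it clear whether CRUSH is failing to map a second OSD:

    # inspect a single stuck PG in detail
    ceph pg 1.14 query
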
> root@test-4:~# ceph osd dump
> epoch 45
> fsid d28ad054-a66e-4150-9026-ca1301661d9a
> created 2013-03-20 15:03:04.547114
> modified 2013-03-20 17:37:32.256578
> flags
>
> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 1 owner 0
>
> max_osd 2
> osd.0 up in weight 1 up_from 36 up_thru 39 down_at 35 last_clean_interval [26,34) 10.0.0.3:6801/7996 10.0.0.3:6802/7996 10.0.0.3:6803/7996 exists,up d3f9a96e-864b-427e-941e-ca2442b190b5
> osd.1 up in weight 1 up_from 39 up_thru 39 down_at 38 last_clean_interval [29,34) 10.0.0.4:6800/4696 10.0.0.4:6801/4696 10.0.0.4:6802/4696 exists,up 758490a1-8726-4acf-9d2c-b8c33a9f5528
>
> pg_temp 0.7 [0,1]
> pg_temp 0.e [1,0]
> pg_temp 0.14 [1,0]
> pg_temp 0.1d [0,1]
> pg_temp 0.21 [1,0]
> pg_temp 0.35 [0,1]
> pg_temp 0.3c [0,1]
> pg_temp 0.47 [1,0]
> pg_temp 0.62 [1,0]
> pg_temp 0.6c [0,1]
> pg_temp 0.7b [0,1]
> pg_temp 1.6 [0,1]
> pg_temp 1.d [1,0]
> pg_temp 1.13 [1,0]
> pg_temp 1.1c [0,1]
> pg_temp 1.20 [1,0]
> pg_temp 1.34 [0,1]
> pg_temp 1.3b [0,1]
> pg_temp 1.46 [1,0]
> pg_temp 1.61 [1,0]
> pg_temp 1.6b [0,1]
> pg_temp 1.7a [0,1]
> pg_temp 1.7f [0,1]
> pg_temp 2.5 [0,1]
> pg_temp 2.c [1,0]
> pg_temp 2.12 [1,0]
> pg_temp 2.1b [0,1]
> pg_temp 2.1f [1,0]
> pg_temp 2.33 [0,1]
> pg_temp 2.3a [0,1]
> pg_temp 2.45 [1,0]
> pg_temp 2.60 [1,0]
> pg_temp 2.6a [0,1]
> pg_temp 2.79 [0,1]
> pg_temp 2.7e [0,1]
>
> root@test-4:~# df -h
> Filesystem                                     Size  Used Avail Use% Mounted on
> /dev/mapper/debian-root                        1.8G  851M  858M  50% /
> tmpfs                                          502M     0  502M   0% /lib/init/rw
> udev                                           496M  136K  496M   1% /dev
> tmpfs                                          502M     0  502M   0% /dev/shm
> /dev/mapper/3600000e00d1100000011049200210000  100G 1014M   97G   2% /var/lib/ceph/osd/ceph-1
>
> Can you help me to get this sorted out?
>
> Regards,
> Pal

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
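
If the crushmap turns out to have been customized, crushtool can also test it offline and report any inputs that do not map to the expected number of OSDs. This is only a generic sketch: "crushmap.bin" is the same placeholder file name from the earlier example, and rule 0 with two replicas is chosen to match the 'data' pool shown in the osd dump above:

    # simulate placements for rule 0 with 2 replicas and list any bad mappings
    crushtool -i crushmap.bin --test --rule 0 --num-rep 2 --show-bad-mappings
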