Re: pgs stuck unclean

Hi,

Yes, the problem still persists.

I've changed the crushmap because I started with a single OSD and added a second one later, and it did not replicate at all with the original crushmap. The current crushmap is:

root@test-4:~# ceph osd getcrushmap -o /tmp/crushmap
got crush map from osdmap epoch 65
root@test-4:~# crushtool -d /tmp/crushmap
# begin crush map

# devices
device 0 osd.0
device 1 osd.1

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 root

# buckets
host test-3 {
	id -2		# do not change unnecessarily
	# weight 1.000
	alg straw
	hash 0	# rjenkins1
	item osd.0 weight 1.000
}
host test-4 {
	id -4		# do not change unnecessarily
	# weight 1.000
	alg straw
	hash 0	# rjenkins1
	item osd.1 weight 1.000
}
rack unknownrack {
	id -3		# do not change unnecessarily
	# weight 2.000
	alg straw
	hash 0	# rjenkins1
	item test-3 weight 1.000
	item test-4 weight 1.000
}
root default {
	id -1		# do not change unnecessarily
	# weight 1.000
	alg straw
	hash 0	# rjenkins1
	item unknownrack weight 1.000
}

# rules
rule data {
	ruleset 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}
rule metadata {
	ruleset 1
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}
rule rbd {
	ruleset 2
	type replicated
	min_size 1
	max_size 10
	step take default
	step choose firstn 0 type osd
	step emit
}

# end crush map

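For what it's worth, all three rules place replicas with "step choose firstn 0 type osd", so CRUSH only separates copies at the OSD level, not across the two hosts. If host-level separation is what's wanted, I assume (untested sketch, not verified on this cluster) the rule steps would instead look roughly like:

	step take default
	step chooseleaf firstn 0 type host
	step emit

The edited map could then be recompiled and loaded back into the cluster along these lines:

root@test-4:~# crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
root@test-4:~# crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
root@test-4:~# ceph osd setcrushmap -i /tmp/crushmap.new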
Thank you in advance,
Pal


On 04/01/2013 10:42 PM, Sam Lang wrote:
Pal,

Are you still seeing this problem?  It looks like you have a bad
crushmap.  Can you post that to the list if you changed it?

-slang [developer @ http://inktank.com | http://ceph.com]

On Wed, Mar 20, 2013 at 11:41 AM, "Gergely Pál - night[w]"
<nightw@xxxxxxx> wrote:
Hello!

I've deployed a test ceph cluster according to this guide:
http://ceph.com/docs/master/start/quick-start/

The problem is that the cluster will never go to a clean state by itself.

The corresponding outputs are the following:
root@test-4:~# ceph health
HEALTH_WARN 3 pgs degraded; 38 pgs stuck unclean; recovery 2/44 degraded
(4.545%)



root@test-4:~# ceph -s
    health HEALTH_WARN 3 pgs degraded; 38 pgs stuck unclean; recovery 2/44
degraded (4.545%)
    monmap e1: 1 mons at {a=10.0.0.3:6789/0}, election epoch 1, quorum 0 a
    osdmap e45: 2 osds: 2 up, 2 in
     pgmap v344: 384 pgs: 346 active+clean, 35 active+remapped, 3
active+degraded; 6387 KB data, 2025 MB used, 193 GB / 200 GB avail; 2/44
degraded (4.545%)
    mdsmap e29: 1/1/1 up {0=a=up:active}



root@test-4:~# ceph pg dump_stuck unclean
ok
pg_stat objects mip     degr    unf     bytes   log     disklog state
state_stamp     v reported      up      acting  last_scrub      scrub_stamp
last_deep_scrub deep_scrub_stamp
1.6b    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:22.056155      0'0 36'45       [0]
[0,1]   0'0     2013-03-20 16:50:19.699765      0'0     2013-03-20
16:50:19.699765
2.6a    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:22.062933      0'0 36'45       [0]
[0,1]   0'0     2013-03-20 16:53:22.749668      0'0     2013-03-20
16:53:22.749668
0.62    0       0       0       0       0       2584    2584
active+remapped 2013-03-20 17:15:10.953654 17'19        39'63   [1]
[1,0]   11'12   2013-03-20 16:46:48.646752      11'12   2013-03-20
16:46:48.646752
2.60    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.331682      0'0 39'47       [1]
[1,0]   0'0     2013-03-20 16:53:04.744990      0'0     2013-03-20
16:53:04.744990
1.61    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.345445      0'0 39'47       [1]
[1,0]   0'0     2013-03-20 16:49:58.694300      0'0     2013-03-20
16:49:58.694300
2.45    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.179279      0'0 39'75       [1]
[1,0]   0'0     2013-03-20 16:49:43.649700      0'0     2013-03-20
16:49:43.649700
1.46    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.179239      0'0 39'75       [1]
[1,0]   0'0     2013-03-20 16:47:10.610772      0'0     2013-03-20
16:47:10.610772
0.47    0       0       0       0       0       3808    3808
active+remapped 2013-03-20 17:15:10.953601 17'28        39'93   [1]
[1,0]   11'19   2013-03-20 16:44:31.572090      11'19   2013-03-20
16:44:31.572090
0.3c    0       0       0       0       0       3128    3128
active+remapped 2013-03-20 17:14:08.006824 17'23        36'53   [0]
[0,1]   11'14   2013-03-20 16:46:13.639052      11'14   2013-03-20
16:46:13.639052
1.3b    1       0       0       0       2338546 4224    4224
active+remapped 2013-03-20 17:13:22.018020      41'33   36'87   [0]
[0,1]   0'0     2013-03-20 16:49:01.678543 0'0  2013-03-20 16:49:01.678543
2.3a    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:22.022849      0'0 36'45       [0]
[0,1]   0'0     2013-03-20 16:52:06.728006      0'0     2013-03-20
16:52:06.728006
0.35    0       0       0       0       0       4216    4216
active+remapped 2013-03-20 17:14:08.006831 17'31        36'47   [0]
[0,1]   11'23   2013-03-20 16:46:05.636185      11'23   2013-03-20
16:46:05.636185
1.34    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:22.036661      0'0 36'45       [0]
[0,1]   0'0     2013-03-20 16:48:46.674504      0'0     2013-03-20
16:48:46.674504
2.33    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:22.048476      0'0 36'45       [0]
[0,1]   0'0     2013-03-20 16:51:49.724215      0'0     2013-03-20
16:51:49.724215
0.21    0       0       0       0       0       1360    1360
active+remapped 2013-03-20 17:15:10.953645 17'10        39'20   [1]
[1,0]   0'0     0.000000        0'0     0.000000
1.20    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.290933      0'0 39'19       [1]
[1,0]   0'0     0.000000        0'0     0.000000
2.1f    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.309581      0'0 39'19       [1]
[1,0]   0'0     0.000000        0'0     0.000000
0.1d    0       0       0       0       0       4080    4080
active+remapped 2013-03-20 17:14:08.006880 17'30        36'124  [0]
[0,1]   11'20   2013-03-20 16:43:51.560375      11'20   2013-03-20
16:43:51.560375
1.1c    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:22.131767      0'0 36'83       [0]
[0,1]   0'0     2013-03-20 16:46:06.593051      0'0     2013-03-20
16:46:06.593051
2.1b    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:22.148274      0'0 36'83       [0]
[0,1]   0'0     2013-03-20 16:48:39.633091      0'0     2013-03-20
16:48:39.633091
0.15    0       0       0       0       0       1768    1768
active+degraded 2013-03-20 17:14:04.005586 17'13        36'80   [0]     [0]
0'0     0.000000        0'0     0.000000
1.14    2       0       2       0       512     2308    2308
active+degraded 2013-03-20 17:13:18.967086      41'18   36'89   [0]     [0]
0'0     0.000000        0'0     0.000000
0.14    0       0       0       0       0       2448    2448
active+remapped 2013-03-20 17:15:10.953657 17'18        39'83   [1]
[1,0]   11'9    2013-03-20 16:43:37.556698      11'9    2013-03-20
16:43:37.556698
1.13    1       0       0       0       29      129     129
active+remapped 2013-03-20 17:14:25.350437 3'1  39'53   [1]     [1,0]   3'1
2013-03-20 16:45:55.590867      3'1     2013-03-20 16:45:55.590867
2.13    0       0       0       0       0       0       0
active+degraded 2013-03-20 17:13:18.968930      0'0 36'66       [0]     [0]
0'0     0.000000        0'0     0.000000
2.12    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.396528      0'0 39'75       [1]
[1,0]   0'0     2013-03-20 16:48:35.632422      0'0     2013-03-20
16:48:35.632422
2.c     0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.400472      0'0 39'47       [1]
[1,0]   0'0     2013-03-20 16:51:13.713841      0'0     2013-03-20
16:51:13.713841
0.e     0       0       0       0       0       1360    1360
active+remapped 2013-03-20 17:15:10.953677 17'10        39'60   [1]
[1,0]   11'5    2013-03-20 16:45:03.617117      11'5    2013-03-20
16:45:03.617117
1.d     0       0       0       0       0       0       0
active+remapped 2013-03-20 17:14:25.177681      0'0 39'47       [1]
[1,0]   0'0     2013-03-20 16:47:42.657407      0'0     2013-03-20
16:47:42.657407
1.6     0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:21.964134      0'0 36'45       [0]
[0,1]   0'0     2013-03-20 16:47:26.654077      0'0     2013-03-20
16:47:26.654077
0.7     0       0       0       0       0       1088    1088
active+remapped 2013-03-20 17:14:07.006487 17'8 36'44   [0]     [0,1]   11'4
2013-03-20 16:44:49.613165      11'4    2013-03-20 16:44:49.613165
2.5     0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:21.964021      0'0 36'45       [0]
[0,1]   0'0     2013-03-20 16:50:43.706250      0'0     2013-03-20
16:50:43.706250
1.7f    2       0       0       0       956     260     260
active+remapped 2013-03-20 17:13:21.965290 3'2  36'92   [0]     [0,1]   3'2
2013-03-20 16:48:20.628887      3'2     2013-03-20 16:48:20.628887
2.7e    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:21.965519      0'0 36'83       [0]
[0,1]   0'0     2013-03-20 16:51:00.670188      0'0     2013-03-20
16:51:00.670188
0.7b    0       0       0       0       0       1360    1360
active+remapped 2013-03-20 17:14:07.006510 17'10        36'100  [0]
[0,1]   11'5    2013-03-20 16:45:34.587342      11'5    2013-03-20
16:45:34.587342
1.7a    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:21.986318      0'0 36'83       [0]
[0,1]   0'0     2013-03-20 16:48:13.626641      0'0     2013-03-20
16:48:13.626641
2.79    0       0       0       0       0       0       0
active+remapped 2013-03-20 17:13:21.977287      0'0 36'83       [0]
[0,1]   0'0     2013-03-20 16:50:52.668407      0'0     2013-03-20
16:50:52.668407
0.6c    0       0       0       0       0       1904    1904
active+remapped 2013-03-20 17:14:08.006843 17'14        36'50   [0]
[0,1]   11'7    2013-03-20 16:46:58.650771      11'7    2013-03-20
16:46:58.650771



root@test-4:~# ceph osd dump

epoch 45
fsid d28ad054-a66e-4150-9026-ca1301661d9a
created 2013-03-20 15:03:04.547114
modified 2013-03-20 17:37:32.256578
flags

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 128
pgp_num 128 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 128
pgp_num 128 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 128
pgp_num 128 last_change 1 owner 0

max_osd 2
osd.0 up   in  weight 1 up_from 36 up_thru 39 down_at 35 last_clean_interval
[26,34) 10.0.0.3:6801/7996 10.0.0.3:6802/7996 10.0.0.3:6803/7996 exists,up
d3f9a96e-864b-427e-941e-ca2442b190b5
osd.1 up   in  weight 1 up_from 39 up_thru 39 down_at 38 last_clean_interval
[29,34) 10.0.0.4:6800/4696 10.0.0.4:6801/4696 10.0.0.4:6802/4696 exists,up
758490a1-8726-4acf-9d2c-b8c33a9f5528

pg_temp 0.7 [0,1]
pg_temp 0.e [1,0]
pg_temp 0.14 [1,0]
pg_temp 0.1d [0,1]
pg_temp 0.21 [1,0]
pg_temp 0.35 [0,1]
pg_temp 0.3c [0,1]
pg_temp 0.47 [1,0]
pg_temp 0.62 [1,0]
pg_temp 0.6c [0,1]
pg_temp 0.7b [0,1]
pg_temp 1.6 [0,1]
pg_temp 1.d [1,0]
pg_temp 1.13 [1,0]
pg_temp 1.1c [0,1]
pg_temp 1.20 [1,0]
pg_temp 1.34 [0,1]
pg_temp 1.3b [0,1]
pg_temp 1.46 [1,0]
pg_temp 1.61 [1,0]
pg_temp 1.6b [0,1]
pg_temp 1.7a [0,1]
pg_temp 1.7f [0,1]
pg_temp 2.5 [0,1]
pg_temp 2.c [1,0]
pg_temp 2.12 [1,0]
pg_temp 2.1b [0,1]
pg_temp 2.1f [1,0]
pg_temp 2.33 [0,1]
pg_temp 2.3a [0,1]
pg_temp 2.45 [1,0]
pg_temp 2.60 [1,0]
pg_temp 2.6a [0,1]
pg_temp 2.79 [0,1]
pg_temp 2.7e [0,1]



root@test-4:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/debian-root
                       1.8G  851M  858M  50% /
tmpfs                 502M     0  502M   0% /lib/init/rw
udev                  496M  136K  496M   1% /dev
tmpfs                 502M     0  502M   0% /dev/shm
/dev/mapper/3600000e00d1100000011049200210000
                       100G 1014M   97G   2% /var/lib/ceph/osd/ceph-1

Can you help me to get this sorted out?

Regards,
Pal
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




