I have a 3-node Ceph cluster with 6 disks in each node. I upgraded from Bobtail 0.56.3 to 0.56.4 last night. Before I started the upgrade, ceph status reported HEALTH_OK. After upgrading and restarting the first node, the status ended up at:

HEALTH_WARN 133 pgs stale; 133 pgs stuck stale

After checking ceph health detail, I queried a few random stuck pgs, and all of them said:

# ceph pg 3.8 query
pgid currently maps to no osd

I decided to continue with the upgrade. After upgrading the second node there were 200 stuck pgs in total, and after the third, 300. The cluster is now at 0.56.4 but still reports 300 pgs stuck stale after 12 hours:

# ceph status
   health HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
   monmap e1: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 8668, quorum 0,1,2 a,b,c
   osdmap e976: 18 osds: 18 up, 18 in
   pgmap v428986: 5148 pgs: 4848 active+clean, 300 stale+active+clean; 5643 GB data, 11305 GB used, 35831 GB / 47137 GB avail; 0B/s rd, 1136KB/s wr, 146op/s
   mdsmap e1: 0/0/1 up

Strangely, the IDs of the stuck pgs all start with 3, e.g.:

HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
pg 3.f is stuck stale for 41062.735988, current state stale+active+clean, last acting [11,13]
pg 3.8 is stuck stale for 46905.375678, current state stale+active+clean, last acting [21,35]
pg 3.9 is stuck stale for 46905.375680, current state stale+active+clean, last acting [21,14]
pg 3.a is stuck stale for 46905.375681, current state stale+active+clean, last acting [21,24]
pg 3.b is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,33]
pg 3.4 is stuck stale for 46905.375683, current state stale+active+clean, last acting [21,33]
pg 3.5 is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,34]
pg 3.6 is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,35]
pg 3.7 is stuck stale for 46905.375680, current state stale+active+clean, last acting [21,13]
pg 3.0 is stuck stale for 46905.375681, current state stale+active+clean, last acting [20,32]
pg 3.1 is stuck stale for 46905.375683, current state stale+active+clean, last acting [20,35]
pg 3.2 is stuck stale for 41965.928295, current state stale+active+clean, last acting [31,13]
pg 3.3 is stuck stale for 46905.375685, current state stale+active+clean, last acting [20,34]
pg 3.128 is stuck stale for 41965.928924, current state stale+active+clean, last acting [31,22]
pg 3.129 is stuck stale for 41062.736776, current state stale+active+clean, last acting [11,32]
pg 3.12a is stuck stale for 41062.736779, current state stale+active+clean, last acting [10,34]
pg 3.12b is stuck stale for 46905.376313, current state stale+active+clean, last acting [21,15]
pg 3.124 is stuck stale for 46905.376315, current state stale+active+clean, last acting [21,14]
pg 3.125 is stuck stale for 41062.736787, current state stale+active+clean, last acting [11,34]
pg 3.126 is stuck stale for 41062.736788, current state stale+active+clean, last acting [10,15]
pg 3.127 is stuck stale for 41965.928942, current state stale+active+clean, last acting [31,35]
pg 3.120 is stuck stale for 41965.928944, current state stale+active+clean, last acting [30,35]
pg 3.121 is stuck stale for 41062.736795, current state stale+active+clean, last acting [10,33]
pg 3.122 is stuck stale for 41062.736796, current state stale+active+clean, last acting [10,12]
pg 3.123 is stuck stale for 41965.928918, current state stale+active+clean, last acting [30,13]
pg 3.11c is stuck stale for 41965.928921, current state stale+active+clean, last acting [30,33]
pg 3.11d is stuck stale for 41965.928921, current state stale+active+clean, last acting [30,24]
pg 3.11e is stuck stale for 46905.376347, current state stale+active+clean, last acting [21,32]
pg 3.11f is stuck stale for 41965.928927, current state stale+active+clean, last acting [31,33]
pg 3.118 is stuck stale for 41062.736804, current state stale+active+clean, last acting [10,14]
pg 3.119 is stuck stale for 41062.736804, current state stale+active+clean, last acting [10,15]
etc

[root@ceph1 ~]# ceph pg dump_stuck stale
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
3.f 25 0 0 0 104857600 8162 8162 stale+active+clean 2013-03-21 10:08:15.888399 57'53 46'1250 [11,13] [11,13] 57'53 2013-03-21 10:08:15.888347 0'0 2013-03-20 10:08:04.434172
3.8 21 0 0 0 88080384 6314 6314 stale+active+clean 2013-03-21 16:09:05.501311 57'41 67'1544 [21,35] [21,35] 57'41 2013-03-21 11:56:20.557416 0'0 2013-03-20 10:08:10.204040
3.9 19 0 0 0 79691776 5852 5852 stale+active+clean 2013-03-21 16:09:05.769642 50'38 67'1343 [21,14] [21,14] 50'38 2013-03-21 11:55:35.535342 0'0 2013-03-20 10:07:53.825743
3.a 17 0 0 0 71303168 2772 2772 stale+active+clean 2013-03-21 16:09:05.530051 50'18 67'888 [21,24] [21,24] 50'18 2013-03-21 11:56:21.547257 0'0 2013-03-20 10:08:12.203190
3.b 19 0 0 0 79691776 14800 14800 stale+active+clean 2013-03-21 16:09:05.827664 50'96 67'1869 [21,33] [21,33] 50'96 2013-03-21 11:55:53.535469 0'0 2013-03-20 10:07:57.442050
3.4 22 0 0 0 92274688 11258 11258 stale+active+clean 2013-03-21 16:09:05.828011 57'73 67'1005 [21,33] [21,33] 57'73 2013-03-21 11:55:38.541522 0'0 2013-03-20 10:07:55.441505
3.5 13 0 0 0 54525952 9256 9256 stale+active+clean 2013-03-21 16:09:05.885105 57'60 67'1157 [21,34] [21,34] 57'60 2013-03-21 11:55:39.544341 0'0 2013-03-20 10:07:56.169993
3.6 19 0 0 0 79691776 4774 4774 stale+active+clean 2013-03-21 16:09:05.501393 57'31 67'1517 [21,35] [21,35] 57'31 2013-03-21 11:55:54.524017 0'0 2013-03-20 10:08:03.202763
3.7 20 0 0 0 83886080 6622 6622 stale+active+clean 2013-03-21 16:09:06.320045 50'43 67'1537 [21,13] [21,13] 50'43 2013-03-21 11:56:13.529507 0'0 2013-03-20 10:08:05.203137
3.0 26 0 0 0 109051904 6930 6930 stale+active+clean 2013-03-21 11:55:53.298871 57'45 46'1406 [20,32] [20,32] 57'45 2013-03-21 11:55:53.298830 0'0 2013-03-20 10:08:00.790042
3.1 14 0 0 0 58720256 2772 2772 stale+active+clean 2013-03-21 11:55:38.303601 57'18 46'1571 [20,35] [20,35] 57'18 2013-03-21 11:55:38.303561 0'0 2013-03-20 10:07:54.482538
3.2 18 0 0 0 75497472 6808 6808 stale+active+clean 2013-03-21 11:15:28.011664 57'44 46'1114 [31,13] [31,13] 57'44 2013-03-21 11:15:28.011635 0'0 2013-03-20 10:08:04.857520
3.3 18 0 0 0 75497472 4158 4158 stale+active+clean 2013-03-21 11:55:37.330826 57'27 46'1693 [20,34] [20,34] 57'27 2013-03-21 11:55:37.330787 0'0 2013-03-20 10:07:54.190884
3.128 19 0 0 0 79691776 6468 6468 stale+active+clean 2013-03-21 11:48:37.303563 50'42 46'1488 [31,22] [31,22] 50'42 2013-03-21 11:48:37.303477 0'0 2013-03-20 10:10:42.857644
3.129 23 0 0 0 96468992 10164 10164 stale+active+clean 2013-03-21 10:13:02.920146 50'66 46'1007 [11,32] [11,32] 50'66 2013-03-21 10:13:02.920104 0'0 2013-03-20 10:09:38.296236
3.12a 18 0 0 0 75497472 6622 6622 stale+active+clean 2013-03-21 10:13:18.986584 57'43 46'1544 [10,34] [10,34] 57'43 2013-03-21 10:13:18.986545 48'1 2013-03-20 10:10:58.194099
3.12b 19 0 0 0 79691776 6006 6006 stale+active+clean 2013-03-21 16:09:05.500935 57'39 67'1192 [21,15] [21,15] 57'39 2013-03-21 12:14:57.670099 48'1 2013-03-20 10:10:28.233369
3.124 22 0 0 0 92274688 16802 16802 stale+active+clean 2013-03-21 16:09:05.771001 57'109 67'1325 [21,14] [21,14] 57'109 2013-03-21 12:14:55.689713 48'3 2013-03-20 10:10:20.239677

Notice that they have been stuck for some time, yet the cluster reported HEALTH_OK on 0.56.3.

I have tried scrubbing the pgs, but that does not remove them from the list.

Darryl

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com