Re: Upgrade stale PG

Ping,
Any ideas? A week later and it is still the same: 300 pgs stuck stale.
Since posting, I have seen a few references recommending that there be no gaps in the OSD numbering. Mine has gaps. Might this be the cause of my problem?
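
To make the "gaps in the OSD numbers" question concrete, here is a minimal Python sketch that lists the IDs missing from a set of OSD IDs. The function name and the example IDs are hypothetical and only for illustration; the real IDs would have to be read from the cluster (for example from the output of ceph osd tree). It only reports gaps and says nothing about whether gaps cause stale pgs.

```python
def find_osd_id_gaps(osd_ids):
    """Return the sorted list of IDs missing from the range 0..max(osd_ids)."""
    present = set(osd_ids)
    if not present:
        return []
    return [i for i in range(max(present) + 1) if i not in present]


# Illustrative IDs only, not taken from the cluster in this thread.
example_ids = [0, 1, 2, 4, 5, 7]
print(find_osd_id_gaps(example_ids))  # [3, 6]
```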

Darryl

On 04/05/13 07:27, Darryl Bond wrote:
I have a 3-node ceph cluster with 6 disks in each node.
I upgraded from Bobtail 0.56.3 to 0.56.4 last night.
Before I started the upgrade, ceph status reported HEALTH_OK.
After upgrading and restarting the first node, the status ended up at
HEALTH_WARN 133 pgs stale; 133 pgs stuck stale
After checking ceph health detail, I queried a few random stuck pgs; all said
# ceph pg 3.8 query
pgid currently maps to no osd

I decided to continue with the upgrade. After upgrading the second
node there were 200 stuck pgs in total, and after the third, 300.

The cluster is now at 0.56.4 but still reports 300 pgs stuck stale after
12 hours
# ceph status
     health HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
     monmap e1: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 8668, quorum 0,1,2 a,b,c
     osdmap e976: 18 osds: 18 up, 18 in
      pgmap v428986: 5148 pgs: 4848 active+clean, 300 stale+active+clean; 5643 GB data, 11305 GB used, 35831 GB / 47137 GB avail; 0B/s rd, 1136KB/s wr, 146op/s
     mdsmap e1: 0/0/1 up

Strangely, the IDs of the stuck pgs all start with 3, i.e. they all belong to pool 3,
e.g.
HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
pg 3.f is stuck stale for 41062.735988, current state stale+active+clean, last acting [11,13]
pg 3.8 is stuck stale for 46905.375678, current state stale+active+clean, last acting [21,35]
pg 3.9 is stuck stale for 46905.375680, current state stale+active+clean, last acting [21,14]
pg 3.a is stuck stale for 46905.375681, current state stale+active+clean, last acting [21,24]
pg 3.b is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,33]
pg 3.4 is stuck stale for 46905.375683, current state stale+active+clean, last acting [21,33]
pg 3.5 is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,34]
pg 3.6 is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,35]
pg 3.7 is stuck stale for 46905.375680, current state stale+active+clean, last acting [21,13]
pg 3.0 is stuck stale for 46905.375681, current state stale+active+clean, last acting [20,32]
pg 3.1 is stuck stale for 46905.375683, current state stale+active+clean, last acting [20,35]
pg 3.2 is stuck stale for 41965.928295, current state stale+active+clean, last acting [31,13]
pg 3.3 is stuck stale for 46905.375685, current state stale+active+clean, last acting [20,34]
pg 3.128 is stuck stale for 41965.928924, current state stale+active+clean, last acting [31,22]
pg 3.129 is stuck stale for 41062.736776, current state stale+active+clean, last acting [11,32]
pg 3.12a is stuck stale for 41062.736779, current state stale+active+clean, last acting [10,34]
pg 3.12b is stuck stale for 46905.376313, current state stale+active+clean, last acting [21,15]
pg 3.124 is stuck stale for 46905.376315, current state stale+active+clean, last acting [21,14]
pg 3.125 is stuck stale for 41062.736787, current state stale+active+clean, last acting [11,34]
pg 3.126 is stuck stale for 41062.736788, current state stale+active+clean, last acting [10,15]
pg 3.127 is stuck stale for 41965.928942, current state stale+active+clean, last acting [31,35]
pg 3.120 is stuck stale for 41965.928944, current state stale+active+clean, last acting [30,35]
pg 3.121 is stuck stale for 41062.736795, current state stale+active+clean, last acting [10,33]
pg 3.122 is stuck stale for 41062.736796, current state stale+active+clean, last acting [10,12]
pg 3.123 is stuck stale for 41965.928918, current state stale+active+clean, last acting [30,13]
pg 3.11c is stuck stale for 41965.928921, current state stale+active+clean, last acting [30,33]
pg 3.11d is stuck stale for 41965.928921, current state stale+active+clean, last acting [30,24]
pg 3.11e is stuck stale for 46905.376347, current state stale+active+clean, last acting [21,32]
pg 3.11f is stuck stale for 41965.928927, current state stale+active+clean, last acting [31,33]
pg 3.118 is stuck stale for 41062.736804, current state stale+active+clean, last acting [10,14]
pg 3.119 is stuck stale for 41062.736804, current state stale+active+clean, last acting [10,15]
etc
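
The observation that every stuck pg ID starts with 3 can be checked across the full listing rather than by eye. The following is a minimal Python sketch, assuming the ceph health detail output has been saved as text; the function name is hypothetical. The regular expression tolerates lines wrapped by a mail client, and the number before the dot in a pg ID is the pool ID.

```python
import re
from collections import Counter

# Matches e.g. "pg 3.12a is stuck stale"; \s+ also matches a wrapped line break.
PG_STALE_RE = re.compile(r"pg\s+(\d+)\.([0-9a-f]+)\s+is\s+stuck\s+stale")


def count_stale_by_pool(health_detail_text):
    """Count stuck-stale pgs per pool ID (the part of the pg ID before the dot)."""
    return Counter(int(m.group(1)) for m in PG_STALE_RE.finditer(health_detail_text))


# Two entries copied from the listing above.
sample = """\
HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
pg 3.f is stuck stale for 41062.735988, current state
stale+active+clean, last acting [11,13]
pg 3.8 is stuck stale for 46905.375678, current state
stale+active+clean, last acting [21,35]
"""
print(count_stale_by_pool(sample))  # Counter({3: 2})
```

If the counter has a single key, all stale pgs belong to one pool; ceph osd lspools can then be used to map the pool ID to a name.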
[root@ceph1 ~]# ceph pg dump_stuck stale
ok
pg_stat    objects    mip    degr    unf    bytes    log    disklog    state    state_stamp    v    reported    up    acting    last_scrub    scrub_stamp    last_deep_scrub    deep_scrub_stamp
3.f    25    0    0    0    104857600    8162    8162    stale+active+clean    2013-03-21 10:08:15.888399    57'53    46'1250    [11,13]    [11,13]    57'53    2013-03-21 10:08:15.888347    0'0    2013-03-20 10:08:04.434172
3.8    21    0    0    0    88080384    6314    6314    stale+active+clean    2013-03-21 16:09:05.501311    57'41    67'1544    [21,35]    [21,35]    57'41    2013-03-21 11:56:20.557416    0'0    2013-03-20 10:08:10.204040
3.9    19    0    0    0    79691776    5852    5852    stale+active+clean    2013-03-21 16:09:05.769642    50'38    67'1343    [21,14]    [21,14]    50'38    2013-03-21 11:55:35.535342    0'0    2013-03-20 10:07:53.825743
3.a    17    0    0    0    71303168    2772    2772    stale+active+clean    2013-03-21 16:09:05.530051    50'18    67'888    [21,24]    [21,24]    50'18    2013-03-21 11:56:21.547257    0'0    2013-03-20 10:08:12.203190
3.b    19    0    0    0    79691776    14800    14800    stale+active+clean    2013-03-21 16:09:05.827664    50'96    67'1869    [21,33]    [21,33]    50'96    2013-03-21 11:55:53.535469    0'0    2013-03-20 10:07:57.442050
3.4    22    0    0    0    92274688    11258    11258    stale+active+clean    2013-03-21 16:09:05.828011    57'73    67'1005    [21,33]    [21,33]    57'73    2013-03-21 11:55:38.541522    0'0    2013-03-20 10:07:55.441505
3.5    13    0    0    0    54525952    9256    9256    stale+active+clean    2013-03-21 16:09:05.885105    57'60    67'1157    [21,34]    [21,34]    57'60    2013-03-21 11:55:39.544341    0'0    2013-03-20 10:07:56.169993
3.6    19    0    0    0    79691776    4774    4774    stale+active+clean    2013-03-21 16:09:05.501393    57'31    67'1517    [21,35]    [21,35]    57'31    2013-03-21 11:55:54.524017    0'0    2013-03-20 10:08:03.202763
3.7    20    0    0    0    83886080    6622    6622    stale+active+clean    2013-03-21 16:09:06.320045    50'43    67'1537    [21,13]    [21,13]    50'43    2013-03-21 11:56:13.529507    0'0    2013-03-20 10:08:05.203137
3.0    26    0    0    0    109051904    6930    6930    stale+active+clean    2013-03-21 11:55:53.298871    57'45    46'1406    [20,32]    [20,32]    57'45    2013-03-21 11:55:53.298830    0'0    2013-03-20 10:08:00.790042
3.1    14    0    0    0    58720256    2772    2772    stale+active+clean    2013-03-21 11:55:38.303601    57'18    46'1571    [20,35]    [20,35]    57'18    2013-03-21 11:55:38.303561    0'0    2013-03-20 10:07:54.482538
3.2    18    0    0    0    75497472    6808    6808    stale+active+clean    2013-03-21 11:15:28.011664    57'44    46'1114    [31,13]    [31,13]    57'44    2013-03-21 11:15:28.011635    0'0    2013-03-20 10:08:04.857520
3.3    18    0    0    0    75497472    4158    4158    stale+active+clean    2013-03-21 11:55:37.330826    57'27    46'1693    [20,34]    [20,34]    57'27    2013-03-21 11:55:37.330787    0'0    2013-03-20 10:07:54.190884
3.128    19    0    0    0    79691776    6468    6468    stale+active+clean    2013-03-21 11:48:37.303563    50'42    46'1488    [31,22]    [31,22]    50'42    2013-03-21 11:48:37.303477    0'0    2013-03-20 10:10:42.857644
3.129    23    0    0    0    96468992    10164    10164    stale+active+clean    2013-03-21 10:13:02.920146    50'66    46'1007    [11,32]    [11,32]    50'66    2013-03-21 10:13:02.920104    0'0    2013-03-20 10:09:38.296236
3.12a    18    0    0    0    75497472    6622    6622    stale+active+clean    2013-03-21 10:13:18.986584    57'43    46'1544    [10,34]    [10,34]    57'43    2013-03-21 10:13:18.986545    48'1    2013-03-20 10:10:58.194099
3.12b    19    0    0    0    79691776    6006    6006    stale+active+clean    2013-03-21 16:09:05.500935    57'39    67'1192    [21,15]    [21,15]    57'39    2013-03-21 12:14:57.670099    48'1    2013-03-20 10:10:28.233369
3.124    22    0    0    0    92274688    16802    16802    stale+active+clean    2013-03-21 16:09:05.771001    57'109    67'1325    [21,14]    [21,14]    57'109    2013-03-21 12:14:55.689713    48'3    2013-03-20 10:10:20.239677

Notice that they have been stuck for some time, yet the cluster reported
HEALTH_OK on 0.56.3.
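
How long a pg has been in its current state can be worked out from the state_stamp column of the dump. The following is a minimal Python sketch, assuming the stamps use the format shown in the output above; the function name and the reference time are hypothetical and chosen only for illustration, so substitute the time at which the dump was actually taken.

```python
from datetime import datetime

# Format of the state_stamp column as it appears in the dump above.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stamp_age(state_stamp, reference):
    """Return the time elapsed between a state_stamp string and a reference datetime."""
    return reference - datetime.strptime(state_stamp, STAMP_FORMAT)


# Hypothetical reference time, exactly one day after the stamp of pg 3.8.
reference = datetime(2013, 3, 22, 16, 9, 5, 501311)
age = stamp_age("2013-03-21 16:09:05.501311", reference)
print(age)  # 1 day, 0:00:00
```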

I have tried scrubbing pgs but that does not remove them from the list.

Darryl


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
