Re: one pg stuck with 2 unfound pieces

Jens-Christian Fischer <jens-christian.fischer@xxxxxxxxx> · Mon, 23 Sep 2013 10:20:50 +0200

Hi Sam

In the meantime, the output of ceph pg 0.cfa query has become quite a bit longer (for better or worse), see: http://pastebin.com/0Jxmm353

I have restarted osd.23 with the debug log settings and extracted the 0.cfa-related log lines (http://pastebin.com/dYsihsx4), but I can't interpret them. There might be more; I can provide the complete log file if you need it.

0.cfa has been out for so long that it now shows up as stuck "since forever":

HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs stuck unclean; 1 mons down, quorum 0,1,2,4 h1,h5,s2,s4
pg 0.cfa is stuck inactive since forever, current state incomplete, last acting [23,50,18]
pg 0.cfa is stuck unclean since forever, current state incomplete, last acting [23,50,18]
pg 0.cfa is incomplete, acting [23,50,18]
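
For anyone who wants to track this across many pgs, here is a minimal Python sketch for pulling the "is stuck" lines out of health output like the above. It assumes the lines keep exactly the shape shown here; the name parse_stuck_pgs and the regular expression are illustrative only, not part of Ceph, and the pattern may need adjusting for other Ceph versions.

```python
import re

# Matches lines such as:
#   pg 0.cfa is stuck inactive since forever, current state incomplete, last acting [23,50,18]
# The layout is an assumption based on the output quoted in this thread.
STUCK_RE = re.compile(
    r"^pg (?P<pgid>\S+) is stuck (?P<kind>\w+) (?:since|for) (?P<duration>\S+), "
    r"current state (?P<state>\S+), last acting \[(?P<acting>[\d,]*)\]$"
)


def parse_stuck_pgs(text):
    """Return one dict per 'pg ... is stuck ...' line found in text."""
    results = []
    for line in text.splitlines():
        match = STUCK_RE.match(line.strip())
        if match is None:
            continue  # summary lines and non-stuck pg lines are skipped
        acting = [int(osd) for osd in match.group("acting").split(",") if osd]
        results.append(
            {
                "pgid": match.group("pgid"),
                "kind": match.group("kind"),
                "duration": match.group("duration"),
                "state": match.group("state"),
                "acting": acting,
            }
        )
    return results
```

Feeding it the four lines above should yield two entries for 0.cfa (one for "inactive", one for "unclean"), each with the acting set 23, 50, 18.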

Also, we can't revert 0.cfa:

root@h0:~# ceph pg 0.cfa mark_unfound_lost revert
pg has no unfound objects

This stuck pg seems to fill up our mons (they need to keep old data, right?) which makes starting a new mon a task of seemingly herculean proportions.

Any ideas on how to proceed?

thanks

Jens-Christian

-- 
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fischer@xxxxxxxxx
http://www.switch.ch

http://www.switch.ch/socialmedia

On 14.08.2013, at 20:53, Samuel Just <sam.just@xxxxxxxxxxx> wrote:

Try restarting the two osd processes with debug osd = 20, debug ms =
1, debug filestore = 20.  Restarting the osds may clear the problem,
but if it recurs, the logs should help explain what's going on.
-Sam
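
One way to apply these settings before a restart is to put them into per-daemon sections of ceph.conf. The Python sketch below only renders such sections as text; the function name render_debug_sections is hypothetical, and the OSD ids in the usage note (9 and 18) are an assumption taken from the peers shown as "querying" in the pg query output quoted further down, since the message above does not name the two OSDs.

```python
import configparser
import io


def render_debug_sections(osd_ids, osd_level=20, ms_level=1, filestore_level=20):
    """Render one [osd.N] section per OSD id with the debug levels from the thread."""
    parser = configparser.ConfigParser()
    for osd_id in osd_ids:
        parser[f"osd.{osd_id}"] = {
            "debug osd": str(osd_level),
            "debug ms": str(ms_level),
            "debug filestore": str(filestore_level),
        }
    buffer = io.StringIO()
    parser.write(buffer)  # writes "key = value" lines under each section header
    return buffer.getvalue()
```

Calling render_debug_sections([9, 18]) returns text that could be merged by hand into ceph.conf on the hosts carrying those OSDs; the sketch deliberately does not touch any file or restart any daemon.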

On Wed, Aug 14, 2013 at 12:17 AM, Jens-Christian Fischer
<jens-christian.fischer@xxxxxxxxx> wrote:
On 13.08.2013, at 21:09, Samuel Just <sam.just@xxxxxxxxxxx> wrote:

You can run 'ceph pg 0.cfa mark_unfound_lost revert'. (Revert Lost
section of http://ceph.com/docs/master/rados/operations/placement-groups/).
-Sam

As I wrote further down in my original message, ceph wouldn't let me do that:

root@ineri ~$ ceph pg 0.cfa  mark_unfound_lost revert
pg has 2 objects but we haven't probed all sources, not marking lost

I'm looking for a way to force the (re)probing of the sources…

cheers
jc

On Tue, Aug 13, 2013 at 6:50 AM, Jens-Christian Fischer
<jens-christian.fischer@xxxxxxxxx> wrote:
We have a cluster with 10 servers, 64 OSDs and 5 Mons on them. The OSDs are
3 TB disks formatted with btrfs, and the servers run either Ubuntu 12.10 or
13.04.

Recently one of the servers (13.04) froze (due to problems with btrfs,
something we have seen a few times). I decided not to try to recover the
disks, but to reformat them with XFS. I removed the OSDs, reformatted, and
re-created them (they got the same OSD numbers).

I redid this twice (because I wrongly partitioned the disks in the first
place) and I ended up with 2 unfound "pieces" in one pg:

root@s2:~# ceph health details
HEALTH_WARN 1 pgs degraded; 1 pgs recovering; 1 pgs stuck unclean; recovery
4448/28915270 degraded (0.015%); 2/9854766 unfound (0.000%)
pg 0.cfa is stuck unclean for 1004252.309704, current state
active+recovering+degraded+remapped, last acting [23,50]
pg 0.cfa is active+recovering+degraded+remapped, acting [23,50], 2 unfound
recovery 4448/28915270 degraded (0.015%); 2/9854766 unfound (0.000%)

root@s2:~# ceph pg 0.cfa query

{ "state": "active+recovering+degraded+remapped",
 "epoch": 28197,
 "up": [
       23,
       50,
       18],
 "acting": [
       23,
       50],
 "info": { "pgid": "0.cfa",
     "last_update": "28082'7774",
     "last_complete": "23686'7083",
     "log_tail": "14360'4061",
     "last_backfill": "MAX",
     "purged_snaps": "[]",
     "history": { "epoch_created": 1,
         "last_epoch_started": 28197,
         "last_epoch_clean": 24810,
         "last_epoch_split": 0,
         "same_up_since": 28195,
         "same_interval_since": 28196,
         "same_primary_since": 26036,
         "last_scrub": "20585'6801",
         "last_scrub_stamp": "2013-07-28 15:40:53.298786",
         "last_deep_scrub": "20585'6801",
         "last_deep_scrub_stamp": "2013-07-28 15:40:53.298786",
         "last_clean_scrub_stamp": "2013-07-28 15:40:53.298786"},
     "stats": { "version": "28082'7774",
         "reported": "28197'41950",
         "state": "active+recovering+degraded+remapped",
         "last_fresh": "2013-08-13 14:34:33.057271",
         "last_change": "2013-08-13 14:34:33.057271",
         "last_active": "2013-08-13 14:34:33.057271",
         "last_clean": "2013-08-01 23:50:18.414082",
         "last_became_active": "2013-05-29 13:10:51.366237",
         "last_unstale": "2013-08-13 14:34:33.057271",
         "mapping_epoch": 28195,
         "log_start": "14360'4061",
         "ondisk_log_start": "14360'4061",
         "created": 1,
         "last_epoch_clean": 24810,
         "parent": "0.0",
         "parent_split_bits": 0,
         "last_scrub": "20585'6801",
         "last_scrub_stamp": "2013-07-28 15:40:53.298786",
         "last_deep_scrub": "20585'6801",
         "last_deep_scrub_stamp": "2013-07-28 15:40:53.298786",
         "last_clean_scrub_stamp": "2013-07-28 15:40:53.298786",
         "log_size": 0,
         "ondisk_log_size": 0,
         "stats_invalid": "0",
         "stat_sum": { "num_bytes": 145307402,
             "num_objects": 2234,
             "num_object_clones": 0,
             "num_object_copies": 0,
             "num_objects_missing_on_primary": 0,
             "num_objects_degraded": 0,
             "num_objects_unfound": 0,
             "num_read": 744,
             "num_read_kb": 410184,
             "num_write": 7774,
             "num_write_kb": 1155438,
             "num_scrub_errors": 0,
             "num_shallow_scrub_errors": 0,
             "num_deep_scrub_errors": 0,
             "num_objects_recovered": 3998,
             "num_bytes_recovered": 278803622,
             "num_keys_recovered": 0},
         "stat_cat_sum": {},
         "up": [
               23,
               50,
               18],
         "acting": [
               23,
               50]},
     "empty": 0,
     "dne": 0,
     "incomplete": 0,
     "last_epoch_started": 28197},
 "recovery_state": [
       { "name": "Started\/Primary\/Active",
         "enter_time": "2013-08-13 14:34:33.026698",
         "might_have_unfound": [
               { "osd": 9,
                 "status": "querying"},
               { "osd": 18,
                 "status": "querying"},
               { "osd": 50,
                 "status": "already probed"}],
         "recovery_progress": { "backfill_target": 50,
             "waiting_on_backfill": 0,
             "backfill_pos": "96220cfa\/10000799e82.00000000\/head\/\/0",
             "backfill_info": { "begin": "0\/\/0\/\/-1",
                 "end": "0\/\/0\/\/-1",
                 "objects": []},
             "peer_backfill_info": { "begin": "0\/\/0\/\/-1",
                 "end": "0\/\/0\/\/-1",
                 "objects": []},
             "backfills_in_flight": [],
             "pull_from_peer": [],
             "pushing": []},
         "scrub": { "scrubber.epoch_start": "0",
             "scrubber.active": 0,
             "scrubber.block_writes": 0,
             "scrubber.finalizing": 0,
             "scrubber.waiting_on": 0,
             "scrubber.waiting_on_whom": []}},
       { "name": "Started",
         "enter_time": "2013-08-13 14:34:32.024282"}]}

I have tried to mark those two pieces as lost, but ceph wouldn't let me
(because the pg is still in the "querying" state for osd 9 and 18). I have
restarted the OSDs, but I can't force any other status change.
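
To check which peers are still holding things up without reading the whole dump, here is a minimal Python sketch that lists the might_have_unfound entries not yet marked "already probed". It assumes the JSON layout of the pg query output shown above; the function name unprobed_peers is illustrative and not part of Ceph.

```python
import json


def unprobed_peers(pg_query_json):
    """Return (osd, status) pairs for peers that are not yet 'already probed'."""
    data = json.loads(pg_query_json)
    pending = []
    # might_have_unfound appears inside the entries of recovery_state
    for state in data.get("recovery_state", []):
        for peer in state.get("might_have_unfound", []):
            if peer.get("status") != "already probed":
                pending.append((peer["osd"], peer["status"]))
    return pending
```

Given the output above as a string, this should return osd 9 and osd 18, both with status "querying", which are the two peers blocking mark_unfound_lost here.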

What next? Take the OSDs (9, 18) out again and rebuild them?

thanks for your help
Jens-Christian


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
