Re: PGs inconsistent, do I fear data loss?

David Turner <drakonstein@xxxxxxxxx> · Wed, 01 Nov 2017 18:51:58 +0000

In that thread, I really like how Wido puts it.  He takes out any bit of code paths, bugs, etc...  In reference to size=3 min_size=1 he says, "Loosing two disks at the same time is something which doesn't happen that much, but if it happens you don't want to modify any data on the only copy which you still have left.  Setting min_size to 1 should be a manual action imho when size = 3 and you loose two copies. In that case YOU decide at that moment if it is the right course of action."

On Wed, Nov 1, 2017 at 2:40 PM Denes Dolhay <denke@xxxxxxxxxxxx> wrote:

    Thanks!

    On 11/01/2017 07:30 PM, Gregory Farnum
      wrote:

      On Wed, Nov 1, 2017 at 11:27 AM Denes Dolhay <denke@xxxxxxxxxxxx>
        wrote:

              Hello,
              I have a trick question for Mr. Turner's scenario:

              Let's assume size=2, min_size=1

              -We are looking at pg "A" acting [1, 2]

              -osd 1 goes down, OK

              -osd 1 comes back up, backfill of pg "A" commences from
              osd 2 to osd 1, OK

              -osd 2 goes down (and therefore pg "A" 's backfill to osd
              1 is incomplete and stopped) not OK, but this is the
              case...

              --> In this event, why does osd 1 accept IO to pg "A"
              knowing full well, that it's data is outdated and will
              cause an inconsistent state?

              Wouldn't it be prudent to deny io to pg "A" until either

              -osd 2 comes back (therefore we have a clean osd in the
              acting group) ... backfill would continue to osd 1 of
              course

              -or data in pg "A" is manually marked as lost, and then
              continues operation from osd 1 's (outdated) copy?

          It does deny IO in that case. I think David was pointing
            out that if OSD 2 is actually dead and gone, you've got data
            loss despite having only lost one OSD.
          -Greg

              Thanks in advance, I'm really curious!

              Denes.

              On
                11/01/2017 06:33 PM, Mario Giammarco wrote:

                I have read your post then read the
                  thread you suggested, very interesting.
                  Then I read again your post and understood
                    better.
                  The most important thing is that even with
                    min_size=1 writes are acknowledged after ceph wrote
                    size=2 copies.
                  In the thread above there is: 

                    As David already said, when all OSDs are up and in for a PG Ceph will wait for ALL OSDs to Ack the write. Writes in RADOS are always synchronous.

Only when OSDs go down you need at least min_size OSDs up before writes or reads are accepted.

So if min_size = 2 and size = 3 you need at least 2 OSDs online for I/O to take place.

                  You then show me a sequence of events that may
                    happen in some use cases.
                  I tell you my use case which is quite different.
                    We use ceph under proxmox. The servers have disks on
                    raid 5 (I agree that it is better to expose single
                    disks to Ceph but it is late). 
                  So it is unlikely that a ceph disk fails because
                    of raid. If a disks fail probabably is because the
                    entire server has failed (and we need to provide
                    business availability in this case) and so it will
                    never come up again so in my situation your sequence
                    of events will never happen.
                  What shocked me is that I did not expect to see
                    so many inconsistencies.
                  Thanks,
                  Mario

                  2017-11-01 16:45 GMT+01:00
                    David Turner <drakonstein@xxxxxxxxx>:

                      It looks like you're running with a
                        size = 2 and min_size = 1 (the min_size is a
                        guess, the size is based on how many osds belong
                        to your problem PGs).  Here's some good reading
                        for you.  https://www.spinics.net/lists/ceph-users/msg32895.html

                        Basically the jist is that when running
                          with size = 2 you should assume that data loss
                          is an eventuality and choose that it is ok for
                          your use case.  This can be mitigated by using
                          min_size = 2, but then your pool will block
                          while an OSD is down and you'll have to
                          manually go in and change the min_size
                          temporarily to perform maintenance.

                        All it takes for data loss is that an osd
                          on server 1 is marked down and a write happens
                          to an osd on server 2.  Now the osd on server
                          2 goes down before the osd on server 1 has
                          finished backfilling and the first osd
                          receives a request to modify data in the
                          object that it doesn't know the current state
                          of.  Tada, you have data loss.

                        How likely is this to happen... eventually
                          it will.  PG subfolder splitting (if you're
                          using filestore) will occasionally take long
                          enough to perform the task that the osd is
                          marked down while it's still running, and this
                          usually happens for some time all over the
                          cluster when it does.  Another option is
                          something that causes segfaults in the osds;
                          another is restarting a node before all pgs
                          are done backfilling/recovering; OOM killer;
                          power outages; etc; etc.

                        Why does min_size = 2 prevent this? 
                          Because for a write to be acknowledged by the
                          cluster, it has to be written to every OSD
                          that is up as long as there are at least
                          min_size available.  This means that every
                          write is acknowledged by at least 2 osds every
                          time.  If you're running with size = 2, then
                          both copies of the data need to be online for
                          a write to happen and thus can never have a
                          write that the other does not.  If you're
                          running with size = 3, then you always have a
                          majority of the OSDs online receiving a write
                          and they can both agree on the correct data to
                          give to the third when it comes back up.

                        On Wed, Nov 1, 2017 at 3:31 AM
                          Mario Giammarco <mgiammarco@xxxxxxxxx>
                          wrote:

                          Sure here it is ceph -s:

                            cluster: 

                                   id:
                                    8bc45d9a-ef50-4038-8e1b-1f25ac46c945

                                   health: HEALTH_ERR 

                                           100 scrub errors 

                                           Possible data damage: 56 pgs
                                inconsistent 

                                 services: 

                                   mon: 3 daemons, quorum 0,1,pve3 

                                   mgr: pve3(active) 

                                   osd: 3 osds: 3 up, 3 in 

                                 data: 

                                   pools:   1 pools, 256 pgs 

                                   objects: 269k objects, 1007 GB 

                                   usage:   2050 GB used, 1386 GB / 3436
                                GB avail 

                                   pgs:     200 active+clean 

                                            56
                                 active+clean+inconsistent 

                            ---

                            ceph health
                                detail :

                            PG_DAMAGED
                                  Possible data damage: 56 pgs
                                  inconsistent 

                                   pg 2.6 is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.19 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.1e is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.1f is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.24 is active+clean+inconsistent,
                                acting [0,2] 

                                   pg 2.25 is active+clean+inconsistent,
                                acting [2,0] 

                                   pg 2.36 is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.3d is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.4b is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.4c is active+clean+inconsistent,
                                acting [0,2] 

                                   pg 2.4d is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.4f is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.50 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.52 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.56 is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.5b is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.5c is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.5d is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.5f is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.71 is active+clean+inconsistent,
                                acting [0,2] 

                                   pg 2.75 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.77 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.79 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.7e is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.83 is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.8a is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.92 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.98 is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.9a is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.9e is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.9f is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.c6 is active+clean+inconsistent,
                                acting [0,2] 

                                   pg 2.c7 is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.c8 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.cb is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.cd
                                is active+clean+inconsistent, acting
                                [1,2] 

                                   pg 2.ce is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.d2 is active+clean+inconsistent,
                                acting [2,1] 

                                   pg 2.da is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.de
                                is active+clean+inconsistent, acting
                                [1,2] 

                                   pg 2.e1 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.e4 is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.e6 is active+clean+inconsistent,
                                acting [0,2] 

                                   pg 2.e8 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.ee
                                is active+clean+inconsistent, acting
                                [1,0] 

                                   pg 2.f9 is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.fa is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.fb is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.fc is active+clean+inconsistent,
                                acting [1,2] 

                                   pg 2.fe is active+clean+inconsistent,
                                acting [1,0] 

                                   pg 2.ff is active+clean+inconsistent,
                                acting [1,0]

                            and
                                ceph pg 2.6 query: 

                            { 

                                   "state": "active+clean+inconsistent",

                                   "snap_trimq": "[]", 

                                   "epoch": 1513, 

                                   "up": [ 

                                       1, 

                                       0 

                                   ], 

                                   "acting": [ 

                                       1, 

                                       0 

                                   ], 

                                   "actingbackfill": [ 

                                       "0", 

                                       "1" 

                                   ], 

                                   "info": { 

                                       "pgid": "2.6", 

                                       "last_update": "1513'89145", 

                                       "last_complete": "1513'89145", 

                                       "log_tail": "1503'87586", 

                                       "last_user_version": 330583, 

                                       "last_backfill": "MAX", 

                                       "last_backfill_bitwise": 0, 

                                       "purged_snaps": [ 

                                           { 

                                               "start": "1", 

                                               "length": "178" 

                                           }, 

                                           { 

                                               "start": "17a", 

                                               "length": "3d" 

                                           }, 

                                           { 

                                               "start": "1b8", 

                                               "length": "1" 

                                           }, 

                                           { 

                                               "start": "1ba", 

                                               "length": "1" 

                                           }, 

                                           { 

                                               "start": "1bc", 

                                               "length": "1" 

                                           }, 

                                           { 

                                               "start": "1be", 

                                               "length": "44" 

                                           }, 

                                           { 

                                               "start": "205", 

                                               "length": "12c" 

                                           }, 

                                           { 

                                               "start": "332", 

                                               "length": "1" 

                                           }, 

                                           { 

                                               "start": "334", 

                                               "length": "1" 

                                           }, 

                                           { 

                                               "start": "336", 

                                               "length": "1" 

                                           }, 

                                           { 

                                               "start": "338", 

                                               "length": "1" 

                                           }, 

                                           { 

                                               "start": "33a", 

                                               "length": "1" 

                                           } 

                                       ], 

                                       "history": { 

                                           "epoch_created": 90, 

                                           "epoch_pool_created": 90, 

                                           "last_epoch_started": 1339, 

                                           "last_interval_started":
                                1338, 

                                           "last_epoch_clean": 1339, 

                                           "last_interval_clean": 1338,

                                           "last_epoch_split": 0, 

                                           "last_epoch_marked_full": 0,

                                           "same_up_since": 1338, 

                                           "same_interval_since": 1338,

                                           "same_primary_since": 1338, 

                                           "last_scrub": "1513'89112", 

                                           "last_scrub_stamp":
                                "2017-11-01 05:52:21.259654", 

                                           "last_deep_scrub":
                                "1513'89112", 

                                           "last_deep_scrub_stamp":
                                "2017-11-01 05:52:21.259654", 

                                           "last_clean_scrub_stamp":
                                "2017-10-25 04:25:09.830840" 

                                       }, 

                                       "stats": { 

                                           "version": "1513'89145", 

                                           "reported_seq": "422820", 

                                           "reported_epoch": "1513", 

                                           "state":
                                "active+clean+inconsistent", 

                                           "last_fresh": "2017-11-01
                                08:11:38.411784", 

                                           "last_change": "2017-11-01
                                05:52:21.259789", 

                                           "last_active": "2017-11-01
                                08:11:38.411784", 

                                           "last_peered": "2017-11-01
                                08:11:38.411784", 

                                           "last_clean": "2017-11-01
                                08:11:38.411784", 

                                           "last_became_active":
                                "2017-10-15 20:36:33.644567", 

                                           "last_became_peered":
                                "2017-10-15 20:36:33.644567", 

                                           "last_unstale": "2017-11-01
                                08:11:38.411784", 

                                           "last_undegraded":
                                "2017-11-01 08:11:38.411784", 

                                           "last_fullsized": "2017-11-01
                                08:11:38.411784", 

                                           "mapping_epoch": 1338, 

                                           "log_start": "1503'87586", 

                                           "ondisk_log_start":
                                "1503'87586", 

                                           "created": 90, 

                                           "last_epoch_clean": 1339, 

                                           "parent": "0.0", 

                                           "parent_split_bits": 0, 

                                           "last_scrub": "1513'89112", 

                                           "last_scrub_stamp":
                                "2017-11-01 05:52:21.259654", 

                                           "last_deep_scrub":
                                "1513'89112", 

                                           "last_deep_scrub_stamp":
                                "2017-11-01 05:52:21.259654", 

                                           "last_clean_scrub_stamp":
                                "2017-10-25 04:25:09.830840", 

                                           "log_size": 1559, 

                                           "ondisk_log_size": 1559, 

                                           "stats_invalid": false, 

                                           "dirty_stats_invalid": false,

                                           "omap_stats_invalid": false,

                                           "hitset_stats_invalid":
                                false, 

                                           "hitset_bytes_stats_invalid":
                                false, 

                                           "pin_stats_invalid": false, 

                                           "stat_sum": { 

                                               "num_bytes": 3747886080,

                                               "num_objects": 958, 

                                               "num_object_clones": 295,

                                               "num_object_copies":
                                1916, 

               "num_objects_missing_on_primary": 0, 

                                               "num_objects_missing": 0,

                                               "num_objects_degraded":
                                0, 

                                               "num_objects_misplaced":
                                0, 

                                               "num_objects_unfound": 0,

                                               "num_objects_dirty": 958,

                                               "num_whiteouts": 0, 

                                               "num_read": 333428, 

                                               "num_read_kb": 135550185,

                                               "num_write": 79221, 

                                               "num_write_kb": 13441239,

                                               "num_scrub_errors": 1, 

               "num_shallow_scrub_errors": 0, 

                                               "num_deep_scrub_errors":
                                1, 

                                               "num_objects_recovered":
                                245, 

                                               "num_bytes_recovered":
                                1012833792, 

                                               "num_keys_recovered": 6,

                                               "num_objects_omap": 0, 

               "num_objects_hit_set_archive": 0, 

               "num_bytes_hit_set_archive": 0, 

                                               "num_flush": 0, 

                                               "num_flush_kb": 0, 

                                               "num_evict": 0, 

                                               "num_evict_kb": 0, 

                                               "num_promote": 0, 

                                               "num_flush_mode_high": 0,

                                               "num_flush_mode_low": 0,

                                               "num_evict_mode_some": 0,

                                               "num_evict_mode_full": 0,

                                               "num_objects_pinned": 0,

                                               "num_legacy_snapsets": 0

                                           }, 

                                           "up": [ 

                                               1, 

                                               0 

                                           ], 

                                           "acting": [ 

                                               1, 

                                               0 

                                           ], 

                                           "blocked_by": [], 

                                           "up_primary": 1, 

                                           "acting_primary": 1 

                                       }, 

                                       "empty": 0, 

                                       "dne": 0, 

                                       "incomplete": 0, 

                                       "last_epoch_started": 1339, 

                                       "hit_set_history": { 

                                           "current_last_update": "0'0",

                                           "history": [] 

                                       } 

                                   }, 

                                   "peer_info": [ 

                                       { 

                                           "peer": "0", 

                                           "pgid": "2.6", 

                                           "last_update": "1513'89145",

                                           "last_complete":
                                "1513'89145", 

                                           "log_tail": "1274'68440", 

                                           "last_user_version": 315687,

                                           "last_backfill": "MAX", 

                                           "last_backfill_bitwise": 0, 

                                           "purged_snaps": [ 

                                               { 

                                                   "start": "1", 

                                                   "length": "178" 

                                               }, 

                                               { 

                                                   "start": "17a", 

                                                   "length": "3d" 

                                               }, 

                                               { 

                                                   "start": "1b8", 

                                                   "length": "1" 

                                               }, 

                                               { 

                                                   "start": "1ba", 

                                                   "length": "1" 

                                               }, 

                                               { 

                                                   "start": "1bc", 

                                                   "length": "1" 

                                               }, 

                                               { 

                                                   "start": "1be", 

                                                   "length": "44" 

                                               }, 

                                               { 

                                                   "start": "205", 

                                                   "length": "82" 

                                               }, 

                                               { 

                                                   "start": "288", 

                                                   "length": "1" 

                                               }, 

                                               { 

                                                   "start": "28a", 

                                                   "length": "1" 

                                               }, 

                                               { 

                                                   "start": "28c", 

                                                   "length": "1" 

                                               }, 

                                               { 

                                                   "start": "28e", 

                                                   "length": "1" 

                                               }, 

                                               { 

                                                   "start": "290", 

                                                   "length": "1" 

                                               } 

                                           ], 

                                           "history": { 

                                               "epoch_created": 90, 

                                               "epoch_pool_created": 90,

                                               "last_epoch_started":
                                1339, 

                                               "last_interval_started":
                                1338, 

                                               "last_epoch_clean": 1339,

                                               "last_interval_clean":
                                1338, 

                                               "last_epoch_split": 0, 

                                               "last_epoch_marked_full":
                                0, 

                                               "same_up_since": 1338, 

                                               "same_interval_since":
                                1338, 

                                               "same_primary_since":
                                1338, 

                                               "last_scrub":
                                "1513'89112", 

                                               "last_scrub_stamp":
                                "2017-11-01 05:52:21.259654", 

                                               "last_deep_scrub":
                                "1513'89112", 

                                               "last_deep_scrub_stamp":
                                "2017-11-01 05:52:21.259654", 

                                               "last_clean_scrub_stamp":
                                "2017-10-25 04:25:09.830840" 

                                           }, 

                                           "stats": { 

                                               "version": "1337'71465",

                                               "reported_seq": "347015",

                                               "reported_epoch": "1338",

                                               "state":
                                "active+undersized+degraded", 

                                               "last_fresh": "2017-10-15
                                20:35:36.930611", 

                                               "last_change":
                                "2017-10-15 20:30:35.752042", 

                                               "last_active":
                                "2017-10-15 20:35:36.930611", 

                                               "last_peered":
                                "2017-10-15 20:35:36.930611", 

                                               "last_clean": "2017-10-15
                                20:30:01.443288", 

                                               "last_became_active":
                                "2017-10-15 20:30:35.752042", 

                                               "last_became_peered":
                                "2017-10-15 20:30:35.752042", 

                                               "last_unstale":
                                "2017-10-15 20:35:36.930611", 

                                               "last_undegraded":
                                "2017-10-15 20:30:35.749043", 

                                               "last_fullsized":
                                "2017-10-15 20:30:35.749043", 

                                               "mapping_epoch": 1338, 

                                               "log_start":
                                "1274'68440", 

                                               "ondisk_log_start":
                                "1274'68440", 

                                               "created": 90, 

                                               "last_epoch_clean": 1331,

                                               "parent": "0.0", 

                                               "parent_split_bits": 0, 

                                               "last_scrub":
                                "1294'71370", 

                                               "last_scrub_stamp":
                                "2017-10-15 09:27:31.756027", 

                                               "last_deep_scrub":
                                "1284'70813", 

                                               "last_deep_scrub_stamp":
                                "2017-10-14 06:35:57.556773", 

                                               "last_clean_scrub_stamp":
                                "2017-10-15 09:27:31.756027", 

                                               "log_size": 3025, 

                                               "ondisk_log_size": 3025,

                                               "stats_invalid": false, 

                                               "dirty_stats_invalid":
                                false, 

                                               "omap_stats_invalid":
                                false, 

                                               "hitset_stats_invalid":
                                false, 

               "hitset_bytes_stats_invalid": false, 

                                               "pin_stats_invalid":
                                false, 

                                               "stat_sum": { 

                                                   "num_bytes": 3555027456,

                                                   "num_objects": 917, 

                                                   "num_object_clones":
                                255, 

                                                   "num_object_copies":
                                1834, 

                   "num_objects_missing_on_primary": 0, 

                   "num_objects_missing": 0, 

                   "num_objects_degraded": 917, 

                   "num_objects_misplaced": 0, 

                   "num_objects_unfound": 0, 

                                                   "num_objects_dirty":
                                917, 

                                                   "num_whiteouts": 0, 

                                                   "num_read": 275095, 

                                                   "num_read_kb":
                                111713846, 

                                                   "num_write": 64324, 

                                                   "num_write_kb":
                                11365374, 

                                                   "num_scrub_errors":
                                0, 

                   "num_shallow_scrub_errors": 0, 

                   "num_deep_scrub_errors": 0, 

                   "num_objects_recovered": 243, 

                   "num_bytes_recovered": 1008594432, 

                                                   "num_keys_recovered":
                                6, 

                                                   "num_objects_omap":
                                0, 

                   "num_objects_hit_set_archive": 0, 

                   "num_bytes_hit_set_archive": 0, 

                                                   "num_flush": 0, 

                                                   "num_flush_kb": 0, 

                                                   "num_evict": 0, 

                                                   "num_evict_kb": 0, 

                                                   "num_promote": 0, 

                   "num_flush_mode_high": 0, 

                                                   "num_flush_mode_low":
                                0, 

                   "num_evict_mode_some": 0, 

                   "num_evict_mode_full": 0, 

                                                   "num_objects_pinned":
                                0, 

                   "num_legacy_snapsets": 0 

                                               }, 

                                               "up": [ 

                                                   1, 

                                                   0 

                                               ], 

                                               "acting": [ 

                                                   1, 

                                                   0 

                                               ], 

                                               "blocked_by": [], 

                                               "up_primary": 1, 

                                               "acting_primary": 1 

                                           }, 

                                           "empty": 0, 

                                           "dne": 0, 

                                           "incomplete": 0, 

                                           "last_epoch_started": 1339, 

                                           "hit_set_history": { 

                                               "current_last_update":
                                "0'0", 

                                               "history": [] 

                                           } 

                                       } 

                                   ], 

                                   "recovery_state": [ 

                                       { 

                                           "name":
                                "Started/Primary/Active", 

                                           "enter_time": "2017-10-15
                                20:36:33.574915", 

                                           "might_have_unfound": [ 

                                               { 

                                                   "osd": "0", 

                                                   "status": "already
                                probed" 

                                               } 

                                           ], 

                                           "recovery_progress": { 

                                               "backfill_targets": [], 

                                               "waiting_on_backfill":
                                [], 

                                               "last_backfill_started":
                                "MIN", 

                                               "backfill_info": { 

                                                   "begin": "MIN", 

                                                   "end": "MIN", 

                                                   "objects": [] 

                                               }, 

                                               "peer_backfill_info": [],

                                               "backfills_in_flight":
                                [], 

                                               "recovering": [], 

                                               "pg_backend": { 

                                                   "pull_from_peer": [],

                                                   "pushing": [] 

                                               } 

                                           }, 

                                           "scrub": { 

                                               "scrubber.epoch_start":
                                "1338", 

                                               "scrubber.active": false,

                                               "scrubber.state":
                                "INACTIVE", 

                                               "scrubber.start": "MIN",

                                               "scrubber.end": "MIN", 

               "scrubber.subset_last_update": "0'0", 

                                               "scrubber.deep": false, 

                                               "scrubber.seed": 0, 

                                               "scrubber.waiting_on": 0,

               "scrubber.waiting_on_whom": [] 

                                           } 

                                       }, 

                                       { 

                                           "name": "Started", 

                                           "enter_time": "2017-10-15
                                20:36:32.592892" 

                                       } 

                                   ], 

                                   "agent_state": {} 

                                }

                            2017-10-30 23:30
                              GMT+01:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:

                                You'll need to tell us
                                  exactly what error messages you're
                                  seeing, what the output of ceph -s is,
                                  and the output of pg query for the
                                  relevant PGs.

                                  There's not a lot of documentation
                                  because much of this tooling is new,
                                  it's changing quickly, and most people
                                  don't have the kinds of problems that
                                  turn out to be unrepairable. We should
                                  do better about that, though.

                                  -Greg

                                  On Mon, Oct 30, 2017,
                                    11:40 AM Mario Giammarco <mgiammarco@xxxxxxxxx>
                                    wrote:

                                   >[Questions
                                    to the list]

                                     >How is it possible that the
                                    cluster cannot repair itself with
                                    ceph pg

                                    repair?

                                     >No good copies are remaining?

                                     >Cannot decide which copy is
                                    valid or up-to date?

                                     >If so, why not, when there is
                                    checksum, mtime for everything?

                                     >In this inconsistent state
                                    which object does the cluster serve
                                    when it

                                    doesn't know which one is the valid?

                                    I am asking the same questions too,
                                    it seems strange to me that in a

                                    fault tolerant clustered file
                                    storage like Ceph there is no

                                    documentation about this.

                                    I know that I am pedantic but please
                                    note that saying "to be sure use

                                    three copies" is not enough because
                                    I am not sure what Ceph really does

                                    when three copies are not matching.

_______________________________________________

                                    ceph-users mailing list

                                    ceph-users@xxxxxxxxxxxxxx

                                    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________

                          ceph-users mailing list

                          ceph-users@xxxxxxxxxxxxxx

                          http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

                _______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

            _______________________________________________

            ceph-users mailing list

            ceph-users@xxxxxxxxxxxxxx

            http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com