Re: Ceph pg active+clean+inconsistent

Sorry for the late reply.

Are you still seeing the inconsistent PG status?

On Wed, Jan 4, 2017 at 11:39 PM, Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxx> wrote:
> # ceph pg debug unfound_objects_exist
> FALSE
>
> Andras
>
>
> On 01/03/2017 11:38 PM, Shinobu Kinjo wrote:
>>
>> Would you run:
>>
>>   # ceph pg debug unfound_objects_exist
>>
>> On Wed, Jan 4, 2017 at 5:31 AM, Andras Pataki
>> <apataki@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Here is the output of ceph pg query for one of the
>>> active+clean+inconsistent PGs:
>>>
>>> {
>>>      "state": "active+clean+inconsistent",
>>>      "snap_trimq": "[]",
>>>      "epoch": 342982,
>>>      "up": [
>>>          319,
>>>          90,
>>>          51
>>>      ],
>>>      "acting": [
>>>          319,
>>>          90,
>>>          51
>>>      ],
>>>      "actingbackfill": [
>>>          "51",
>>>          "90",
>>>          "319"
>>>      ],
>>>      "info": {
>>>          "pgid": "6.92c",
>>>          "last_update": "342982'41304",
>>>          "last_complete": "342982'41304",
>>>          "log_tail": "342980'38259",
>>>          "last_user_version": 41304,
>>>          "last_backfill": "MAX",
>>>          "last_backfill_bitwise": 0,
>>>          "purged_snaps": "[]",
>>>          "history": {
>>>              "epoch_created": 262553,
>>>              "last_epoch_started": 342598,
>>>              "last_epoch_clean": 342613,
>>>              "last_epoch_split": 0,
>>>              "last_epoch_marked_full": 0,
>>>              "same_up_since": 342596,
>>>              "same_interval_since": 342597,
>>>              "same_primary_since": 342597,
>>>              "last_scrub": "342982'41177",
>>>              "last_scrub_stamp": "2017-01-02 18:19:48.081750",
>>>              "last_deep_scrub": "342965'37465",
>>>              "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
>>>              "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816"
>>>          },
>>>          "stats": {
>>>              "version": "342982'41304",
>>>              "reported_seq": "43600",
>>>              "reported_epoch": "342982",
>>>              "state": "active+clean+inconsistent",
>>>              "last_fresh": "2017-01-03 15:27:15.075176",
>>>              "last_change": "2017-01-02 18:19:48.081806",
>>>              "last_active": "2017-01-03 15:27:15.075176",
>>>              "last_peered": "2017-01-03 15:27:15.075176",
>>>              "last_clean": "2017-01-03 15:27:15.075176",
>>>              "last_became_active": "2016-11-01 16:21:23.328639",
>>>              "last_became_peered": "2016-11-01 16:21:23.328639",
>>>              "last_unstale": "2017-01-03 15:27:15.075176",
>>>              "last_undegraded": "2017-01-03 15:27:15.075176",
>>>              "last_fullsized": "2017-01-03 15:27:15.075176",
>>>              "mapping_epoch": 342596,
>>>              "log_start": "342980'38259",
>>>              "ondisk_log_start": "342980'38259",
>>>              "created": 262553,
>>>              "last_epoch_clean": 342613,
>>>              "parent": "0.0",
>>>              "parent_split_bits": 0,
>>>              "last_scrub": "342982'41177",
>>>              "last_scrub_stamp": "2017-01-02 18:19:48.081750",
>>>              "last_deep_scrub": "342965'37465",
>>>              "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
>>>              "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816",
>>>              "log_size": 3045,
>>>              "ondisk_log_size": 3045,
>>>              "stats_invalid": false,
>>>              "dirty_stats_invalid": false,
>>>              "omap_stats_invalid": false,
>>>              "hitset_stats_invalid": false,
>>>              "hitset_bytes_stats_invalid": false,
>>>              "pin_stats_invalid": true,
>>>              "stat_sum": {
>>>                  "num_bytes": 16929346269,
>>>                  "num_objects": 4881,
>>>                  "num_object_clones": 0,
>>>                  "num_object_copies": 14643,
>>>                  "num_objects_missing_on_primary": 0,
>>>                  "num_objects_missing": 0,
>>>                  "num_objects_degraded": 0,
>>>                  "num_objects_misplaced": 0,
>>>                  "num_objects_unfound": 0,
>>>                  "num_objects_dirty": 4881,
>>>                  "num_whiteouts": 0,
>>>                  "num_read": 7592,
>>>                  "num_read_kb": 19593996,
>>>                  "num_write": 42541,
>>>                  "num_write_kb": 47306915,
>>>                  "num_scrub_errors": 1,
>>>                  "num_shallow_scrub_errors": 1,
>>>                  "num_deep_scrub_errors": 0,
>>>                  "num_objects_recovered": 5807,
>>>                  "num_bytes_recovered": 22691211916,
>>>                  "num_keys_recovered": 0,
>>>                  "num_objects_omap": 0,
>>>                  "num_objects_hit_set_archive": 0,
>>>                  "num_bytes_hit_set_archive": 0,
>>>                  "num_flush": 0,
>>>                  "num_flush_kb": 0,
>>>                  "num_evict": 0,
>>>                  "num_evict_kb": 0,
>>>                  "num_promote": 0,
>>>                  "num_flush_mode_high": 0,
>>>                  "num_flush_mode_low": 0,
>>>                  "num_evict_mode_some": 0,
>>>                  "num_evict_mode_full": 0,
>>>                  "num_objects_pinned": 0
>>>              },
>>>              "up": [
>>>                  319,
>>>                  90,
>>>                  51
>>>              ],
>>>              "acting": [
>>>                  319,
>>>                  90,
>>>                  51
>>>              ],
>>>              "blocked_by": [],
>>>              "up_primary": 319,
>>>              "acting_primary": 319
>>>          },
>>>          "empty": 0,
>>>          "dne": 0,
>>>          "incomplete": 0,
>>>          "last_epoch_started": 342598,
>>>          "hit_set_history": {
>>>              "current_last_update": "0'0",
>>>              "history": []
>>>          }
>>>      },
>>>      "peer_info": [
>>>          {
>>>              "peer": "51",
>>>              "pgid": "6.92c",
>>>              "last_update": "342982'41304",
>>>              "last_complete": "342982'41304",
>>>              "log_tail": "341563'12014",
>>>              "last_user_version": 15033,
>>>              "last_backfill": "MAX",
>>>              "last_backfill_bitwise": 0,
>>>              "purged_snaps": "[]",
>>>              "history": {
>>>                  "epoch_created": 262553,
>>>                  "last_epoch_started": 342598,
>>>                  "last_epoch_clean": 342613,
>>>                  "last_epoch_split": 0,
>>>                  "last_epoch_marked_full": 0,
>>>                  "same_up_since": 342596,
>>>                  "same_interval_since": 342597,
>>>                  "same_primary_since": 342597,
>>>                  "last_scrub": "342982'41177",
>>>                  "last_scrub_stamp": "2017-01-02 18:19:48.081750",
>>>                  "last_deep_scrub": "342965'37465",
>>>                  "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
>>>                  "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816"
>>>              },
>>>              "stats": {
>>>                  "version": "342541'15032",
>>>                  "reported_seq": "21472",
>>>                  "reported_epoch": "342597",
>>>                  "state": "active+undersized+degraded",
>>>                  "last_fresh": "2016-11-01 16:05:44.991004",
>>>                  "last_change": "2016-11-01 16:05:44.990630",
>>>                  "last_active": "2016-11-01 16:05:44.991004",
>>>                  "last_peered": "2016-11-01 16:05:44.991004",
>>>                  "last_clean": "2016-11-01 15:26:23.393984",
>>>                  "last_became_active": "2016-11-01 16:05:44.990630",
>>>                  "last_became_peered": "2016-11-01 16:05:44.990630",
>>>                  "last_unstale": "2016-11-01 16:05:44.991004",
>>>                  "last_undegraded": "2016-11-01 16:05:44.021269",
>>>                  "last_fullsized": "2016-11-01 16:05:44.021269",
>>>                  "mapping_epoch": 342596,
>>>                  "log_start": "341563'12014",
>>>                  "ondisk_log_start": "341563'12014",
>>>                  "created": 262553,
>>>                  "last_epoch_clean": 342587,
>>>                  "parent": "0.0",
>>>                  "parent_split_bits": 0,
>>>                  "last_scrub": "342266'14514",
>>>                  "last_scrub_stamp": "2016-10-28 16:41:06.563820",
>>>                  "last_deep_scrub": "342266'14514",
>>>                  "last_deep_scrub_stamp": "2016-10-28 16:41:06.563820",
>>>                  "last_clean_scrub_stamp": "2016-10-28 16:41:06.563820",
>>>                  "log_size": 3018,
>>>                  "ondisk_log_size": 3018,
>>>                  "stats_invalid": false,
>>>                  "dirty_stats_invalid": false,
>>>                  "omap_stats_invalid": false,
>>>                  "hitset_stats_invalid": false,
>>>                  "hitset_bytes_stats_invalid": false,
>>>                  "pin_stats_invalid": true,
>>>                  "stat_sum": {
>>>                      "num_bytes": 12528581359,
>>>                      "num_objects": 3562,
>>>                      "num_object_clones": 0,
>>>                      "num_object_copies": 10683,
>>>                      "num_objects_missing_on_primary": 0,
>>>                      "num_objects_missing": 0,
>>>                      "num_objects_degraded": 3561,
>>>                      "num_objects_misplaced": 0,
>>>                      "num_objects_unfound": 0,
>>>                      "num_objects_dirty": 3562,
>>>                      "num_whiteouts": 0,
>>>                      "num_read": 3678,
>>>                      "num_read_kb": 10197642,
>>>                      "num_write": 15656,
>>>                      "num_write_kb": 19564203,
>>>                      "num_scrub_errors": 0,
>>>                      "num_shallow_scrub_errors": 0,
>>>                      "num_deep_scrub_errors": 0,
>>>                      "num_objects_recovered": 5806,
>>>                      "num_bytes_recovered": 22687335556,
>>>                      "num_keys_recovered": 0,
>>>                      "num_objects_omap": 0,
>>>                      "num_objects_hit_set_archive": 0,
>>>                      "num_bytes_hit_set_archive": 0,
>>>                      "num_flush": 0,
>>>                      "num_flush_kb": 0,
>>>                      "num_evict": 0,
>>>                      "num_evict_kb": 0,
>>>                      "num_promote": 0,
>>>                      "num_flush_mode_high": 0,
>>>                      "num_flush_mode_low": 0,
>>>                      "num_evict_mode_some": 0,
>>>                      "num_evict_mode_full": 0,
>>>                      "num_objects_pinned": 0
>>>                  },
>>>                  "up": [
>>>                      319,
>>>                      90,
>>>                      51
>>>                  ],
>>>                  "acting": [
>>>                      319,
>>>                      90,
>>>                      51
>>>                  ],
>>>                  "blocked_by": [],
>>>                  "up_primary": 319,
>>>                  "acting_primary": 319
>>>              },
>>>              "empty": 0,
>>>              "dne": 0,
>>>              "incomplete": 0,
>>>              "last_epoch_started": 342598,
>>>              "hit_set_history": {
>>>                  "current_last_update": "0'0",
>>>                  "history": []
>>>              }
>>>          },
>>>          {
>>>              "peer": "90",
>>>              "pgid": "6.92c",
>>>              "last_update": "342982'41304",
>>>              "last_complete": "342982'41304",
>>>              "log_tail": "341563'12014",
>>>              "last_user_version": 15033,
>>>              "last_backfill": "MAX",
>>>              "last_backfill_bitwise": 0,
>>>              "purged_snaps": "[]",
>>>              "history": {
>>>                  "epoch_created": 262553,
>>>                  "last_epoch_started": 342598,
>>>                  "last_epoch_clean": 342613,
>>>                  "last_epoch_split": 0,
>>>                  "last_epoch_marked_full": 0,
>>>                  "same_up_since": 342596,
>>>                  "same_interval_since": 342597,
>>>                  "same_primary_since": 342597,
>>>                  "last_scrub": "342982'41177",
>>>                  "last_scrub_stamp": "2017-01-02 18:19:48.081750",
>>>                  "last_deep_scrub": "342965'37465",
>>>                  "last_deep_scrub_stamp": "2016-12-20 16:31:06.438823",
>>>                  "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816"
>>>              },
>>>              "stats": {
>>>                  "version": "342589'15033",
>>>                  "reported_seq": "21478",
>>>                  "reported_epoch": "342596",
>>>                  "state": "remapped+peering",
>>>                  "last_fresh": "2016-11-01 16:21:20.584113",
>>>                  "last_change": "2016-11-01 16:21:20.295685",
>>>                  "last_active": "2016-11-01 16:14:02.694748",
>>>                  "last_peered": "2016-11-01 16:14:02.694748",
>>>                  "last_clean": "2016-11-01 15:26:23.393984",
>>>                  "last_became_active": "2016-11-01 16:05:44.990630",
>>>                  "last_became_peered": "2016-11-01 16:05:44.990630",
>>>                  "last_unstale": "2016-11-01 16:21:20.584113",
>>>                  "last_undegraded": "2016-11-01 16:21:20.584113",
>>>                  "last_fullsized": "2016-11-01 16:21:20.584113",
>>>                  "mapping_epoch": 342596,
>>>                  "log_start": "341563'12014",
>>>                  "ondisk_log_start": "341563'12014",
>>>                  "created": 262553,
>>>                  "last_epoch_clean": 342587,
>>>                  "parent": "0.0",
>>>                  "parent_split_bits": 0,
>>>                  "last_scrub": "342266'14514",
>>>                  "last_scrub_stamp": "2016-10-28 16:41:06.563820",
>>>                  "last_deep_scrub": "342266'14514",
>>>                  "last_deep_scrub_stamp": "2016-10-28 16:41:06.563820",
>>>                  "last_clean_scrub_stamp": "2016-10-28 16:41:06.563820",
>>>                  "log_size": 3019,
>>>                  "ondisk_log_size": 3019,
>>>                  "stats_invalid": false,
>>>                  "dirty_stats_invalid": false,
>>>                  "omap_stats_invalid": false,
>>>                  "hitset_stats_invalid": false,
>>>                  "hitset_bytes_stats_invalid": false,
>>>                  "pin_stats_invalid": true,
>>>                  "stat_sum": {
>>>                      "num_bytes": 12528581359,
>>>                      "num_objects": 3562,
>>>                      "num_object_clones": 0,
>>>                      "num_object_copies": 10686,
>>>                      "num_objects_missing_on_primary": 0,
>>>                      "num_objects_missing": 0,
>>>                      "num_objects_degraded": 0,
>>>                      "num_objects_misplaced": 0,
>>>                      "num_objects_unfound": 0,
>>>                      "num_objects_dirty": 3562,
>>>                      "num_whiteouts": 0,
>>>                      "num_read": 3678,
>>>                      "num_read_kb": 10197642,
>>>                      "num_write": 15656,
>>>                      "num_write_kb": 19564203,
>>>                      "num_scrub_errors": 0,
>>>                      "num_shallow_scrub_errors": 0,
>>>                      "num_deep_scrub_errors": 0,
>>>                      "num_objects_recovered": 5806,
>>>                      "num_bytes_recovered": 22687335556,
>>>                      "num_keys_recovered": 0,
>>>                      "num_objects_omap": 0,
>>>                      "num_objects_hit_set_archive": 0,
>>>                      "num_bytes_hit_set_archive": 0,
>>>                      "num_flush": 0,
>>>                      "num_flush_kb": 0,
>>>                      "num_evict": 0,
>>>                      "num_evict_kb": 0,
>>>                      "num_promote": 0,
>>>                      "num_flush_mode_high": 0,
>>>                      "num_flush_mode_low": 0,
>>>                      "num_evict_mode_some": 0,
>>>                      "num_evict_mode_full": 0,
>>>                      "num_objects_pinned": 0
>>>                  },
>>>                  "up": [
>>>                      319,
>>>                      90,
>>>                      51
>>>                  ],
>>>                  "acting": [
>>>                      319,
>>>                      90,
>>>                      51
>>>                  ],
>>>                  "blocked_by": [],
>>>                  "up_primary": 319,
>>>                  "acting_primary": 319
>>>              },
>>>              "empty": 0,
>>>              "dne": 0,
>>>              "incomplete": 0,
>>>              "last_epoch_started": 342598,
>>>              "hit_set_history": {
>>>                  "current_last_update": "0'0",
>>>                  "history": []
>>>              }
>>>          }
>>>      ],
>>>      "recovery_state": [
>>>          {
>>>              "name": "Started\/Primary\/Active",
>>>              "enter_time": "2016-11-01 16:21:23.007072",
>>>              "might_have_unfound": [
>>>                  {
>>>                      "osd": "51",
>>>                      "status": "already probed"
>>>                  },
>>>                  {
>>>                      "osd": "90",
>>>                      "status": "already probed"
>>>                  }
>>>              ],
>>>              "recovery_progress": {
>>>                  "backfill_targets": [],
>>>                  "waiting_on_backfill": [],
>>>                  "last_backfill_started": "MIN",
>>>                  "backfill_info": {
>>>                      "begin": "MIN",
>>>                      "end": "MIN",
>>>                      "objects": []
>>>                  },
>>>                  "peer_backfill_info": [],
>>>                  "backfills_in_flight": [],
>>>                  "recovering": [],
>>>                  "pg_backend": {
>>>                      "pull_from_peer": [],
>>>                      "pushing": []
>>>                  }
>>>              },
>>>              "scrub": {
>>>                  "scrubber.epoch_start": "342597",
>>>                  "scrubber.active": 0,
>>>                  "scrubber.state": "INACTIVE",
>>>                  "scrubber.start": "MIN",
>>>                  "scrubber.end": "MIN",
>>>                  "scrubber.subset_last_update": "0'0",
>>>                  "scrubber.deep": false,
>>>                  "scrubber.seed": 0,
>>>                  "scrubber.waiting_on": 0,
>>>                  "scrubber.waiting_on_whom": []
>>>              }
>>>          },
>>>          {
>>>              "name": "Started",
>>>              "enter_time": "2016-11-01 16:21:21.763033"
>>>          }
>>>      ],
>>>      "agent_state": {}
>>> }
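
For reading a query dump like the one above, a small script can pull out the scrub-related fields. This is only a sketch: the field names are taken from the output quoted here, the function name summarize_scrub is made up for illustration, and other Ceph releases may lay the JSON out differently.

```python
import json

def summarize_scrub(pg_query):
    """Collect the scrub-related fields from a parsed `ceph pg <pgid> query` dump."""
    info = pg_query["info"]
    stat_sum = info["stats"]["stat_sum"]
    history = info["history"]
    return {
        "pgid": info["pgid"],
        "state": pg_query["state"],
        "scrub_errors": stat_sum["num_scrub_errors"],
        "shallow_errors": stat_sum["num_shallow_scrub_errors"],
        "deep_errors": stat_sum["num_deep_scrub_errors"],
        "last_scrub": history["last_scrub_stamp"],
        "last_clean_scrub": history["last_clean_scrub_stamp"],
    }

# Trimmed copy of the query output quoted above (only the fields used here).
sample = json.loads("""
{
  "state": "active+clean+inconsistent",
  "info": {
    "pgid": "6.92c",
    "history": {
      "last_scrub_stamp": "2017-01-02 18:19:48.081750",
      "last_clean_scrub_stamp": "2016-12-11 12:51:19.258816"
    },
    "stats": {
      "stat_sum": {
        "num_scrub_errors": 1,
        "num_shallow_scrub_errors": 1,
        "num_deep_scrub_errors": 0
      }
    }
  }
}
""")

for key, value in summarize_scrub(sample).items():
    print(key, value)
```

On a live cluster the same function could be fed the full output of ceph pg 6.92c query through json.load(sys.stdin) instead of the trimmed sample.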
>>>
>>>
>>> Andras
>>>
>>>
>>>
>>> On 12/23/2016 01:27 AM, Shinobu Kinjo wrote:
>>>>
>>>> Would you be able to execute ``ceph pg ${PG ID} query`` against that
>>>> particular PG?
>>>>
>>>> On Wed, Dec 21, 2016 at 11:44 PM, Andras Pataki
>>>> <apataki@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> Yes, size = 3, and I have checked that all three replicas are the same
>>>>> zero-length object on the disk.  I think some metadata info is
>>>>> mismatching what the OSD log refers to as "object info size".  But I'm
>>>>> not sure what to do about it.  pg repair does not fix it.  In fact, the
>>>>> file this object corresponds to in CephFS is shorter, so this chunk
>>>>> shouldn't even exist, I think (details are in the original email).
>>>>> Although I may be misunderstanding the situation ...
>>>>>
>>>>> Andras
>>>>>
>>>>>
>>>>> On 12/21/2016 07:17 AM, Mehmet wrote:
>>>>>
>>>>> Hi Andras,
>>>>>
>>>>> I am not an experienced user, but I guess you could look at this object
>>>>> on each OSD related to the PG, compare the copies, and delete the one
>>>>> that differs. I assume you have size = 3.
>>>>>
>>>>> Then run pg repair again.
>>>>>
>>>>> But be careful: IIRC the replicas will be recovered from the primary.
>>>>>
>>>>> HTH
>>>>>
>>>>> On 20 December 2016 at 22:39:44 CET, Andras Pataki
>>>>> <apataki@xxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi cephers,
>>>>>>
>>>>>> Any ideas on how to proceed on the inconsistencies below?  At the moment
>>>>>> our ceph setup has 5 of these - in all cases it seems like some zero
>>>>>> length objects that match across the three replicas, but do not match
>>>>>> the object info size.  I tried running pg repair on one of them, but it
>>>>>> didn't repair the problem:
>>>>>>
>>>>>> 2016-12-20 16:24:40.870307 7f3e1a4b1700  0 log_channel(cluster) log [INF] : 6.92c repair starts
>>>>>> 2016-12-20 16:27:06.183186 7f3e1a4b1700 -1 log_channel(cluster) log [ERR] : repair 6.92c 6:34932257:::1000187bbb5.00000009:head on disk size (0) does not match object info size (3014656) adjusted for ondisk to (3014656)
>>>>>> 2016-12-20 16:27:35.885496 7f3e17cac700 -1 log_channel(cluster) log [ERR] : 6.92c repair 1 errors, 0 fixed
>>>>>>
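
With several PGs reporting the same kind of error, it may help to pull the PG id, object name and the two sizes out of the OSD log mechanically. The sketch below is an assumption-laden example: the regular expression is inferred only from the log lines quoted in this thread, not from any format specification, and parse_size_mismatch is a made-up helper name.

```python
import re

# Pattern inferred from the quoted log lines; other Ceph versions may word this differently.
SIZE_MISMATCH = re.compile(
    r"(?P<op>repair|deep-scrub|scrub) (?P<pgid>[0-9a-f]+\.[0-9a-f]+) "
    r"(?P<object>\S+) on disk size \((?P<disk>\d+)\) "
    r"does not match object info size \((?P<info>\d+)\)"
)

def parse_size_mismatch(line):
    """Return the fields of a size-mismatch scrub error, or None if the line is something else."""
    # Collapse whitespace first so lines re-wrapped by a mail client still match.
    match = SIZE_MISMATCH.search(" ".join(line.split()))
    if match is None:
        return None
    fields = match.groupdict()
    fields["disk"] = int(fields["disk"])
    fields["info"] = int(fields["info"])
    return fields

sample = (
    "2016-12-20 16:27:06.183186 7f3e1a4b1700 -1 log_channel(cluster) log [ERR] "
    ": repair 6.92c 6:34932257:::1000187bbb5.00000009:head on disk size (0) "
    "does not match object info size (3014656) adjusted for ondisk to (3014656)"
)
print(parse_size_mismatch(sample))
```

Lines that are not size mismatches, such as "6.92c repair starts", simply return None, so the function can be mapped over a whole log file.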
>>>>>>
>>>>>> Any help/hints would be appreciated.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andras
>>>>>>
>>>>>>
>>>>>> On 12/15/2016 10:13 AM, Andras Pataki wrote:
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Yesterday scrubbing turned up an inconsistency in one of our placement
>>>>>> groups.  We are running ceph 10.2.3, using CephFS and RBD for some VM
>>>>>> images.
>>>>>>
>>>>>> [root@hyperv017 ~]# ceph -s
>>>>>>       cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
>>>>>>        health HEALTH_ERR
>>>>>>               1 pgs inconsistent
>>>>>>               1 scrub errors
>>>>>>               noout flag(s) set
>>>>>>        monmap e15: 3 mons at {hyperv029=10.4.36.179:6789/0,hyperv030=10.4.36.180:6789/0,hyperv031=10.4.36.181:6789/0}
>>>>>>               election epoch 27192, quorum 0,1,2 hyperv029,hyperv030,hyperv031
>>>>>>         fsmap e17181: 1/1/1 up {0=hyperv029=up:active}, 2 up:standby
>>>>>>        osdmap e342930: 385 osds: 385 up, 385 in
>>>>>>               flags noout
>>>>>>         pgmap v37580512: 34816 pgs, 5 pools, 673 TB data, 198 Mobjects
>>>>>>               1583 TB used, 840 TB / 2423 TB avail
>>>>>>                  34809 active+clean
>>>>>>                      4 active+clean+scrubbing+deep
>>>>>>                      2 active+clean+scrubbing
>>>>>>                      1 active+clean+inconsistent
>>>>>>     client io 87543 kB/s rd, 671 MB/s wr, 23 op/s rd, 2846 op/s wr
>>>>>>
>>>>>> # ceph pg dump | grep inconsistent
>>>>>> 6.13f1  4692    0       0       0       0 16057314767     3087    3087 active+clean+inconsistent 2016-12-14 16:49:48.391572      342929'41011 342929:43966 [158,215,364]   158     [158,215,364]   158 342928'40540 2016-12-14 16:49:48.391511      342928'40540    2016-12-14 16:49:48.391511
>>>>>>
>>>>>> I tried a couple of other deep scrubs on pg 6.13f1 but got repeated
>>>>>> errors.  In the OSD logs:
>>>>>>
>>>>>> 2016-12-14 16:48:07.733291 7f3b56e3a700 -1 log_channel(cluster) log [ERR] : deep-scrub 6.13f1 6:8fc91b77:::1000187bb70.00000009:head on disk size (0) does not match object info size (1835008) adjusted for ondisk to (1835008)
>>>>>>
>>>>>> I looked at the objects on the 3 OSDs on their respective hosts and they
>>>>>> are the same, zero length files:
>>>>>>
>>>>>> # cd ~ceph/osd/ceph-158/current/6.13f1_head
>>>>>> # find . -name *1000187bb70* -ls
>>>>>> 669738    0 -rw-r--r--   1 ceph     ceph            0 Dec 13 17:00 ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.00000009__head_EED893F1__6
>>>>>>
>>>>>> # cd ~ceph/osd/ceph-215/current/6.13f1_head
>>>>>> # find . -name *1000187bb70* -ls
>>>>>> 539815647 0 -rw-r--r--   1 ceph     ceph            0 Dec 13 17:00 ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.00000009__head_EED893F1__6
>>>>>>
>>>>>> # cd ~ceph/osd/ceph-364/current/6.13f1_head
>>>>>> # find . -name *1000187bb70* -ls
>>>>>> 1881432215    0 -rw-r--r--   1 ceph     ceph            0 Dec 13 17:00 ./DIR_1/DIR_F/DIR_3/DIR_9/DIR_8/1000187bb70.00000009__head_EED893F1__6
>>>>>>
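
The same per-OSD check can be scripted so that the sizes are easy to compare across hosts. This is a minimal sketch assuming a FileStore-style directory tree like the one in the find output above; find_object_files is a hypothetical helper, and the demo runs against a throwaway directory rather than a real OSD.

```python
import os
import tempfile

def find_object_files(pg_dir, needle):
    """Yield (path relative to pg_dir, size in bytes) for files whose name contains needle."""
    for root, _dirs, files in os.walk(pg_dir):
        for name in files:
            if needle in name:
                path = os.path.join(root, name)
                yield os.path.relpath(path, pg_dir), os.path.getsize(path)

# Demo on a temporary directory that imitates the layout shown above.
with tempfile.TemporaryDirectory() as pg_dir:
    leaf = os.path.join(pg_dir, "DIR_1", "DIR_F")
    os.makedirs(leaf)
    # Create an empty file named like the object replica in the listing.
    open(os.path.join(leaf, "1000187bb70.00000009__head_EED893F1__6"), "w").close()
    for rel_path, size in find_object_files(pg_dir, "1000187bb70"):
        print(rel_path, size)
```

On an OSD host the first argument would be the PG directory used in the cd commands above, for example the 6.13f1_head directory of each of the three OSDs.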
>>>>>> At the time of the write, there wasn't anything unusual going on as far
>>>>>> as I can tell (no hardware/network issues, all processes were up, etc).
>>>>>>
>>>>>> This pool is a CephFS data pool, and the corresponding file (inode hex
>>>>>> 1000187bb70, decimal 1099537300336) looks like this:
>>>>>>
>>>>>> # ls -li chr4.tags.tsv
>>>>>> 1099537300336 -rw-r--r-- 1 xichen xichen 14469915 Dec 13 17:01 chr4.tags.tsv
>>>>>>
>>>>>> Reading the file is also ok (no errors, right number of bytes):
>>>>>> # cat chr4.tags.tsv > /dev/null
>>>>>> # wc chr4.tags.tsv
>>>>>>     592251  2961255 14469915 chr4.tags.tsv
>>>>>>
>>>>>> We are using the standard 4MB block size for CephFS, and if I interpret
>>>>>> this right, this is the 9th chunk, so there shouldn't be any data (or
>>>>>> even a 9th chunk), since the file is only 14MB.  Should I run pg repair
>>>>>> on this?  Any ideas on how this could come about?  Any other
>>>>>> recommendations?
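
The arithmetic behind that reasoning can be checked with a few lines. This is a sketch under two assumptions that are not stated in the thread: that CephFS data objects are named <inode in hex>.<object index in hex>, and that the file uses a simple layout in which each 4 MiB object holds one consecutive piece of the file. The helper names are invented for the example.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # the 4MB block size mentioned above, taken as 4 MiB

def parse_object_name(name):
    """Split '<inode hex>.<object index hex>' into two integers (assumed naming scheme)."""
    ino_hex, index_hex = name.split(".")
    return int(ino_hex, 16), int(index_hex, 16)

def expected_object_count(file_size, object_size=OBJECT_SIZE):
    """Number of objects needed to hold file_size bytes (ceiling division)."""
    return (file_size + object_size - 1) // object_size

ino, index = parse_object_name("1000187bb70.00000009")
count = expected_object_count(14469915)  # size reported by ls -li above

print(ino)            # inode number in decimal, to compare with ls -li
print(index)          # zero-based object index
print(count)          # objects the file needs
print(index < count)  # False means the object lies past the end of the file
```

Under those assumptions the file needs only objects 0 through 3, so an object with index 9 falls well past the end of the file, which agrees with the conclusion that it should hold no data.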
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Andras
>>>>>> apataki@xxxxxxxxxxx
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com