Re: MDS: journaler.pq decode error

Thanks for the response.  I was also hoping to be able to debug this
better once we got onto Mimic.  We finished that upgrade yesterday,
and cephfs-journal-tool does find a corruption in the purge queue,
though our MDS continues to start up and the filesystem appears to
function as usual.

How can I modify the purge queue to remove damaged sections?  Is there
some way to scan known FS objects and remove any that might now be
orphaned once the damage is removed/repaired?
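
To make the question concrete, here is roughly what I had in mind,
after taking a backup first.  I'm not sure whether hand-setting
expire_pos just past the reported damage (0x681a55 = 6822485) is
safe, or whether it has to land exactly on the next entry boundary:

# cephfs-journal-tool --journal=purge_queue journal export /root/pq-backup.bin
# cephfs-journal-tool --journal=purge_queue header set expire_pos 6822485

For the orphan question, would enumerating inode prefixes in the data
pool be a sane starting point?  Data objects are named
<inode>.<index>, so something like (the pool name is a placeholder):

# rados -p <cephfs-data-pool> ls | cut -d. -f1 | sort -u > /tmp/fs-inodes.txt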

# cephfs-journal-tool --journal=purge_queue journal inspect

Overall journal integrity: DAMAGED
Corrupt regions:
  0x6819f8-681a55

# cephfs-journal-tool --journal=purge_queue header get

{
    "magic": "ceph fs volume v011",
    "write_pos": 203357732,
    "expire_pos": 6822392,
    "trimmed_pos": 4194304,
    "stream_format": 1,
    "layout": {
        "stripe_unit": 4194304,
        "stripe_count": 1,
        "object_size": 4194304,
        "pool_id": 64,
        "pool_ns": ""
    }
}
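
If I'm reading the layout right, the corrupt region at 0x6819f8
(6822392) falls in the second 4MiB object of the queue, since
6822392 / 4194304 = 1 (remainder 2628088).  Assuming the purge queue
journal lives at inode 0x500, that would make it object 500.00000001
in the metadata pool (pool 64 here), which I can at least stat:

# rados -p <metadata-pool> stat 500.00000001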

thanks,
Ben

On Fri, Jun 15, 2018 at 11:54 AM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Fri, Jun 15, 2018 at 2:55 PM, Benjeman Meekhof <bmeekhof@xxxxxxxxx> wrote:
>> I have seen some posts and issue trackers related to this topic in
>> the past but haven't been able to put the pieces together to resolve
>> the issue I'm having.  All on Luminous 12.2.5 (upgraded over time
>> from past releases).  We are going to upgrade to Mimic in the near
>> future if that would somehow resolve the issue.
>>
>> Summary:
>>
>> 1.  We have a CephFS data pool which has steadily and slowly grown in
>> size without corresponding writes to the directory placed on it - a
>> plot of usage over a few hours shows a very regular upward rate of
>> increase.  The pool is now 300TB versus 16TB of actual space used in
>> the directory.
>>
>> 2.  Reading through some email posts and issue trackers led me to
>> disable 'standby replay', though we have never used snapshots.
>> Disabling that feature on our 3 MDS daemons stopped the steady climb.
>> However, the pool still shows 300TB of unaccounted-for space usage.
>> http://tracker.ceph.com/issues/19593 and
>> http://tracker.ceph.com/issues/21551
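>>
>> (For anyone searching later: the setting in question is the
>> per-daemon standby replay flag in ceph.conf, e.g., with a
>> hypothetical daemon name:
>>
>> [mds.a]
>>     mds standby replay = false
>>
>> followed by restarting the standby daemons.)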
>
> This is pretty strange -- if you were already on 12.2.5 then
> http://tracker.ceph.com/issues/19593 should already have been fixed,
> and switching standby replay on/off shouldn't make a difference
> (unless some similar bug has crept back into luminous).
>
>> 3.  I've never had any issue starting the MDS or with filesystem
>> functionality, but looking through the MDS logs I see a single
>> 'journaler.pq(rw) _decode error from assimilate_prefetch' at every
>> startup.  A log snippet with context is below, with debug_mds and
>> debug_journaler at 20.
>
> This message suggests that the purge queue has been corrupted, but the
> MDS is ignoring this -- something is wrong with the error handling.
> The MDS should be marked damaged when something like this happens, but
> in this case PurgeQueue is apparently dropping the error on the floor
> after it gets logged by Journaler.  I've opened a ticket+PR for the
> error handling here: http://tracker.ceph.com/issues/24533 (however,
> the loading path in PurgeQueue::_recover *does* have error handling so
> I'm not clear why that isn't happening in your case).
>
> I believe cephfs-journal-tool in mimic was enhanced to be able to
> optionally operate on the purge queue as well as the metadata journal
> (they use the same underlying format), so upgrading to mimic would
> give you better tooling for debugging this.
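>
> For example, the purge queue variants of the commands you would
> normally run against the metadata journal should be along the
> lines of:
>
>   cephfs-journal-tool --journal=purge_queue journal inspect
>   cephfs-journal-tool --journal=purge_queue header get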
>
> John
>
>
>> As noted, there is at least one past email thread on this topic, but
>> I'm not having quite the same issue as that person, and I couldn't
>> glean from it what I should do to repair this error and get stale
>> objects purged from this pool (if that is in fact the issue):
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021379.html
>>
>> Any thoughts on troubleshooting steps I could try next?
>>
>> Here is the log snippet:
>>
>> 2018-06-15 09:14:50.746831 7fb47251b700 20 mds.0.journaler.pq(rw)
>> write_buf_throttle get, delta 101
>> 2018-06-15 09:14:50.746835 7fb47251b700 10 mds.0.journaler.pq(rw)
>> append_entry len 81 to 88121773~101
>> 2018-06-15 09:14:50.746838 7fb47251b700 10 mds.0.journaler.pq(rw) _prefetch
>> 2018-06-15 09:14:50.746863 7fb47251b700 20 mds.0.journaler.pq(rw)
>> write_buf_throttle get, delta 101
>> 2018-06-15 09:14:50.746864 7fb47251b700 10 mds.0.journaler.pq(rw)
>> append_entry len 81 to 88121874~101
>> 2018-06-15 09:14:50.746867 7fb47251b700 10 mds.0.journaler.pq(rw) _prefetch
>> 2018-06-15 09:14:50.746901 7fb46fd16700 10 mds.0.journaler.pq(rw)
>> _finish_read got 6822392~1566216
>> 2018-06-15 09:14:50.746909 7fb46fd16700 10 mds.0.journaler.pq(rw)
>> _assimilate_prefetch 6822392~1566216
>> 2018-06-15 09:14:50.746911 7fb46fd16700 10 mds.0.journaler.pq(rw)
>> _assimilate_prefetch gap of 4194304 from received_pos 8388608 to first
>> prefetched buffer 12582912
>> 2018-06-15 09:14:50.746913 7fb46fd16700 10 mds.0.journaler.pq(rw)
>> _assimilate_prefetch read_buf now 6822392~1566216, read pointers
>> 6822392/8388608/50331648
>>
>> === error here ===> 2018-06-15 09:14:50.746965 7fb46fd16700 -1
>> mds.0.journaler.pq(rw) _decode error from assimilate_prefetch
>>
>> 2018-06-15 09:14:50.746994 7fb47251b700 20 mds.0.journaler.pq(rw)
>> write_buf_throttle get, delta 101
>> 2018-06-15 09:14:50.746998 7fb47251b700 10 mds.0.journaler.pq(rw)
>> append_entry len 81 to 88121975~101
>> 2018-06-15 09:14:50.747007 7fb47251b700 10 mds.0.journaler.pq(rw)
>> wait_for_readable at 6822392 onreadable 0x557ee0f58300
>> 2018-06-15 09:14:50.747042 7fb47251b700 20 mds.0.journaler.pq(rw)
>> write_buf_throttle get, delta 101
>> 2018-06-15 09:14:50.747043 7fb47251b700 10 mds.0.journaler.pq(rw)
>> append_entry len 81 to 88122076~101
>> 2018-06-15 09:14:50.747063 7fb47251b700 20 mds.0.journaler.pq(rw)
>> write_buf_throttle get, delta 101
>> 2018-06-15 09:14:50.747064 7fb47251b700 10 mds.0.journaler.pq(rw)
>> append_entry len 81 to 88122177~101
>> 2018-06-15 09:14:50.747113 7fb47251b700 20 mds.0.journaler.pq(rw)
>> write_buf_throttle get, delta 101
>> 2018-06-15 09:14:50.747114 7fb47251b700 10 mds.0.journaler.pq(rw)
>> append_entry len 81 to 88122278~101
>> 2018-06-15 09:14:50.747136 7fb47251b700 20 mds.0.journaler.pq(rw)
>> write_buf_throttle get, delta 101