I've seen some posts and issue trackers related to this topic in the past, but I haven't been able to put it together to resolve the issue I'm having. All on Luminous 12.2.5 (upgraded over time from past releases). We are going to upgrade to Mimic in the near future if that would somehow resolve the issue.

Summary:

1. We have a CephFS data pool which has steadily and slowly grown in size without corresponding writes to the directory placed on it - a plot of usage over a few hours shows a very regular upward rate of increase. The pool is now at 300TB vs 16TB of actual space used in the directory.

2. Reading through some email posts and issue trackers led me to disable 'standby replay', though we have never used snapshots. Disabling that feature on our 3 MDSes stopped the steady climb; however, the pool remains with 300TB of unaccounted-for space usage.
http://tracker.ceph.com/issues/19593
http://tracker.ceph.com/issues/21551

3. I've never had any issue starting the MDSes or with filesystem functionality, but looking through the MDS logs I see a single 'journaler.pq(rw) _decode error from assimilate_prefetch' at every startup. A log snippet with context is below, captured with debug_mds and debug_journaler at 20.

As noted, there is at least one past email thread on the topic, but I'm not quite having the same issue as that person, and I couldn't glean any information as to what I should do to repair this error and get stale objects purged from this pool (if that is in fact the issue):
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021379.html

Any thoughts on troubleshooting steps I could try next?
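For reference, this is roughly how the discrepancy and the purge queue can be examined (not from the original post; the mount point, pool name, and directory below are placeholders, and I'm not certain the 12.2.5 cephfs-journal-tool accepts --journal=purge_queue - that option may only exist in newer releases, so treat those lines as a sketch):

```shell
# Pool usage as RADOS accounts for it (the ~300TB figure)
ceph df detail

# Recursive byte count of the directory placed on that pool
# (the ~16TB of real data); ceph.dir.rbytes is the CephFS
# recursive-stats vxattr, readable on a kernel or FUSE mount
getfattr -n ceph.dir.rbytes /mnt/cephfs/ourdir

# Sample a few object names in the data pool to see whether
# old objects are lingering
rados -p cephfs_data ls | head

# If the tool on this release supports the purge queue journal,
# inspect the journal that throws the decode error (run with the
# MDS stopped, and back up the journal objects first)
cephfs-journal-tool --journal=purge_queue header get
cephfs-journal-tool --journal=purge_queue journal inspect
```

The first three commands are read-only; the cephfs-journal-tool invocations are the part I'm least sure about on this version.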
Here is the log snippet:

2018-06-15 09:14:50.746831 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.746835 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88121773~101
2018-06-15 09:14:50.746838 7fb47251b700 10 mds.0.journaler.pq(rw) _prefetch
2018-06-15 09:14:50.746863 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.746864 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88121874~101
2018-06-15 09:14:50.746867 7fb47251b700 10 mds.0.journaler.pq(rw) _prefetch
2018-06-15 09:14:50.746901 7fb46fd16700 10 mds.0.journaler.pq(rw) _finish_read got 6822392~1566216
2018-06-15 09:14:50.746909 7fb46fd16700 10 mds.0.journaler.pq(rw) _assimilate_prefetch 6822392~1566216
2018-06-15 09:14:50.746911 7fb46fd16700 10 mds.0.journaler.pq(rw) _assimilate_prefetch gap of 4194304 from received_pos 8388608 to first prefetched buffer 12582912
2018-06-15 09:14:50.746913 7fb46fd16700 10 mds.0.journaler.pq(rw) _assimilate_prefetch read_buf now 6822392~1566216, read pointers 6822392/8388608/50331648
=== error here ===> 2018-06-15 09:14:50.746965 7fb46fd16700 -1 mds.0.journaler.pq(rw) _decode error from assimilate_prefetch
2018-06-15 09:14:50.746994 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.746998 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88121975~101
2018-06-15 09:14:50.747007 7fb47251b700 10 mds.0.journaler.pq(rw) wait_for_readable at 6822392 onreadable 0x557ee0f58300
2018-06-15 09:14:50.747042 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.747043 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88122076~101
2018-06-15 09:14:50.747063 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.747064 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88122177~101
2018-06-15 09:14:50.747113 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.747114 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88122278~101
2018-06-15 09:14:50.747136 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com