I've seen some posts and issue trackers related to this topic in the past, but I haven't been able to put it together to resolve the issue I'm having. All on Luminous 12.2.5 (upgraded over time from past releases). We are going to upgrade to Mimic in the near future if that would somehow resolve the issue.

Summary:

1. We have a CephFS data pool which has steadily and slowly grown in size without corresponding writes to the directory placed on it - a plot of usage over a few hours shows a very regular upward rate of increase. The pool is now at 300TB vs 16TB of actual space used in the directory.

2. Reading through some email posts and issue trackers led me to disable 'standby replay', though we have never used snapshots. Disabling that feature on our 3 MDSes stopped the steady climb; however, the pool remains with 300TB of unaccounted-for space usage.
http://tracker.ceph.com/issues/19593
http://tracker.ceph.com/issues/21551

3. I've never had any issue starting the MDSes or with filesystem functionality, but looking through the MDS logs I see a single 'journaler.pq(rw) _decode error from assimilate_prefetch' at every startup. A log snippet with context is below, captured with debug_mds and debug_journaler at 20.

As noted, there is at least one past email thread on the topic, but I'm not quite having the same issue as that person, and I couldn't glean any information as to what I should do to repair this error and get stale objects purged from this pool (if that is in fact the issue):
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021379.html

Any thoughts on troubleshooting steps I could try next?
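For reference, this is roughly how the discrepancy and the purge queue can be examined (not from the original post; the mount point, pool name, and directory below are placeholders, and I'm not certain the 12.2.5 cephfs-journal-tool accepts --journal=purge_queue - that option may only exist in newer releases, so treat those lines as a sketch):

```shell
# Pool usage as RADOS accounts for it (the ~300TB figure)
ceph df detail

# Recursive byte count of the directory placed on that pool
# (the ~16TB of real data); ceph.dir.rbytes is the CephFS
# recursive-stats vxattr, readable on a kernel or FUSE mount
getfattr -n ceph.dir.rbytes /mnt/cephfs/ourdir

# Sample a few object names in the data pool to see whether
# old objects are lingering
rados -p cephfs_data ls | head

# If the tool on this release supports the purge queue journal,
# inspect the journal that throws the decode error (run with the
# MDS stopped, and back up the journal objects first)
cephfs-journal-tool --journal=purge_queue header get
cephfs-journal-tool --journal=purge_queue journal inspect
```

The first three commands are read-only; the cephfs-journal-tool invocations are the part I'm least sure about on this version.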
Here is the log snippet:

2018-06-15 09:14:50.746831 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.746835 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88121773~101
2018-06-15 09:14:50.746838 7fb47251b700 10 mds.0.journaler.pq(rw) _prefetch
2018-06-15 09:14:50.746863 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.746864 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88121874~101
2018-06-15 09:14:50.746867 7fb47251b700 10 mds.0.journaler.pq(rw) _prefetch
2018-06-15 09:14:50.746901 7fb46fd16700 10 mds.0.journaler.pq(rw) _finish_read got 6822392~1566216
2018-06-15 09:14:50.746909 7fb46fd16700 10 mds.0.journaler.pq(rw) _assimilate_prefetch 6822392~1566216
2018-06-15 09:14:50.746911 7fb46fd16700 10 mds.0.journaler.pq(rw) _assimilate_prefetch gap of 4194304 from received_pos 8388608 to first prefetched buffer 12582912
2018-06-15 09:14:50.746913 7fb46fd16700 10 mds.0.journaler.pq(rw) _assimilate_prefetch read_buf now 6822392~1566216, read pointers 6822392/8388608/50331648
=== error here ===> 2018-06-15 09:14:50.746965 7fb46fd16700 -1 mds.0.journaler.pq(rw) _decode error from assimilate_prefetch
2018-06-15 09:14:50.746994 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.746998 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88121975~101
2018-06-15 09:14:50.747007 7fb47251b700 10 mds.0.journaler.pq(rw) wait_for_readable at 6822392 onreadable 0x557ee0f58300
2018-06-15 09:14:50.747042 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.747043 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88122076~101
2018-06-15 09:14:50.747063 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.747064 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88122177~101
2018-06-15 09:14:50.747113 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101
2018-06-15 09:14:50.747114 7fb47251b700 10 mds.0.journaler.pq(rw) append_entry len 81 to 88122278~101
2018-06-15 09:14:50.747136 7fb47251b700 20 mds.0.journaler.pq(rw) write_buf_throttle get, delta 101

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com