On Tue, Jul 9, 2024 at 3:46 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> wrote:
> Hi Dhairya,
>
> I would be more than happy to try and give as many details as possible, but
> the slack channel is private and requires my email to have an account/access to it.
>
You're right that you need an account to access Slack, but the channel itself isn't private at all. It's the upstream Ceph Slack channel and is open to everyone; signing up just requires an email address, and joining is entirely your choice, not mandatory. I'd ask @Venky Shankar <vshankar@xxxxxxxxxx> and @Patrick Donnelly <pdonnell@xxxxxxxxxx> to add their input, since they've been working on similar issues and can provide better insights.

> Wouldn't taking the discussion about this error to a private channel also
> stop other users who experience this error from learning about how and why
> this happened, as well as possibly not being able to view the solution? Would
> it not be possible to discuss this more publicly for the benefit of the
> other users on the mailing list?
>
> Kindest regards,
>
> Ivan
>
> On 09/07/2024 10:44, Dhairya Parmar wrote:
>
> Hey Ivan,
>
> This is a relatively new MDS crash, so it will require some
> investigation, but I was instructed to recommend the disaster-recovery steps [0]
> (except the session reset) to you to get the FS up again.
>
> This crash is being discussed on the upstream CephFS Slack channel [1] with @Venky
> Shankar <vshankar@xxxxxxxxxx> and other CephFS devs. I'd encourage you to
> join the conversation; we can discuss this in detail and maybe go through
> the incident step by step, which should help analyse the crash better.
>
> [0]
> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> [1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519
>
> On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx>
> wrote:
>
>> Hi Dhairya,
>>
>> Thank you ever so much for having another look at this so quickly. I
>> don't think I have any logs similar to the ones you referenced this time, as
>> my MDSs don't seem to enter the replay stage when they crash (or at least
>> don't now after I've thrown the logs away), but those errors do crop up in
>> the prior logs I shared when the system first crashed.
>>
>> Kindest regards,
>>
>> Ivan
>>
>> On 08/07/2024 14:08, Dhairya Parmar wrote:
>>
>> Ugh, something went horribly wrong. I've downloaded the MDS logs that
>> contain the assertion failure and it looks relevant to this [0]. Do you have
>> client logs for this?
>>
>> The other log that you shared is being downloaded right now; once that's
>> done and I'm done going through it, I'll update you.
>> >> [0] https://tracker.ceph.com/issues/54546 >> >> On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >> wrote: >> >>> Hi Dhairya, >>> >>> Sorry to resurrect this thread again, but we still unfortunately have an >>> issue with our filesystem after we attempted to write new backups to it. >>> >>> We finished the scrub of the filesystem on Friday and ran a repair scrub >>> on the 1 directory which had metadata damage. After doing so and rebooting, >>> the cluster reported no issues and data was accessible again. >>> >>> We re-started the backups to run over the weekend and unfortunately the >>> filesystem crashed again where the log of the failure is here: >>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz. >>> We ran the backups on kernel mounts of the filesystem without the nowsync >>> option this time to avoid the out-of-sync write problems.. >>> >>> I've tried resetting the journal again after recovering the dentries but >>> unfortunately the filesystem is still in a failed state despite setting >>> joinable to true. The log of this crash is here: >>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s4.log-20240708 >>> . >>> >>> I'm not sure how to proceed as I can't seem to get any MDS to take over >>> the first rank. I would like to do a scrub of the filesystem and preferably >>> overwrite the troublesome files with the originals on the live filesystem. >>> Do you have any advice on how to make the filesystem leave its failed >>> state? I have a backup of the journal before I reset it so I can roll back >>> if necessary. >>> >>> Here are some details about the filesystem at present: >>> >>> root@pebbles-s2 11:49 [~]: ceph -s; ceph fs status >>> cluster: >>> id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631 >>> health: HEALTH_ERR >>> 1 filesystem is degraded >>> 1 large omap objects >>> 1 filesystem is offline >>> 1 mds daemon damaged >>> >>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim flag(s) set >>> 1750 pgs not deep-scrubbed in time >>> 1612 pgs not scrubbed in time >>> >>> services: >>> mon: 4 daemons, quorum pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4 >>> (age 50m) >>> mgr: pebbles-s2(active, since 77m), standbys: pebbles-s1, >>> pebbles-s3, pebbles-s4 >>> mds: 1/2 daemons up, 3 standby >>> osd: 1380 osds: 1380 up (since 76m), 1379 in (since 10d); 10 >>> remapped pgs >>> flags >>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim >>> >>> data: >>> volumes: 1/2 healthy, 1 recovering; 1 damaged >>> pools: 7 pools, 2177 pgs >>> objects: 3.24G objects, 6.7 PiB >>> usage: 8.6 PiB used, 14 PiB / 23 PiB avail >>> pgs: 11785954/27384310061 objects misplaced (0.043%) >>> 2167 active+clean >>> 6 active+remapped+backfilling >>> 4 active+remapped+backfill_wait >>> >>> ceph_backup - 0 clients >>> =========== >>> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS >>> 0 failed >>> POOL TYPE USED AVAIL >>> mds_backup_fs metadata 1174G 3071G >>> ec82_primary_fs_data data 0 3071G >>> ec82pool data 8085T 4738T >>> ceph_archive - 2 clients >>> ============ >>> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS >>> 0 active pebbles-s4 Reqs: 0 /s 13.4k 7105 118 2 >>> POOL TYPE USED AVAIL >>> mds_archive_fs metadata 5184M 3071G >>> ec83_primary_fs_data data 0 3071G >>> ec83pool data 138T 4307T >>> STANDBY MDS >>> pebbles-s2 >>> pebbles-s3 >>> pebbles-s1 >>> MDS version: ceph version 17.2.7 >>> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) >>> root@pebbles-s2 11:55 [~]: ceph fs dump >>> 
e2643889 >>> enable_multiple, ever_enabled_multiple: 1,1 >>> default compat: compat={},rocompat={},incompat={1=base v0.20,2=client >>> writeable ranges,3=default file layouts on dirs,4=dir inode in separate >>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no >>> anchor table,9=file layout v2,10=snaprealm v2} >>> legacy client fscid: 1 >>> >>> Filesystem 'ceph_backup' (1) >>> fs_name ceph_backup >>> epoch 2643888 >>> flags 12 joinable allow_snaps allow_multimds_snaps >>> created 2023-05-19T12:52:36.302135+0100 >>> modified 2024-07-08T11:17:55.437861+0100 >>> tableserver 0 >>> root 0 >>> session_timeout 60 >>> session_autoclose 300 >>> max_file_size 109934182400000 >>> required_client_features {} >>> last_failure 0 >>> last_failure_osd_epoch 494515 >>> compat compat={},rocompat={},incompat={1=base v0.20,2=client >>> writeable ranges,3=default file layouts on dirs,4=dir inode in separate >>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses >>> inline data,8=no anchor table,9=file layout v2,10=snaprealm v2} >>> max_mds 1 >>> in 0 >>> up {} >>> failed >>> damaged 0 >>> stopped >>> data_pools [6,3] >>> metadata_pool 2 >>> inline_data disabled >>> balancer >>> standby_count_wanted 1 >>> >>> >>> Kindest regards, >>> >>> Ivan >>> On 28/06/2024 15:17, Dhairya Parmar wrote: >>> >>> CAUTION: This email originated from outside of the LMB: >>> *.-dparmar@xxxxxxxxxx-.* >>> Do not click links or open attachments unless you recognize the sender >>> and know the content is safe. >>> If you think this is a phishing email, please forward it to >>> phishing@xxxxxxxxxxxxxxxxx >>> >>> >>> -- >>> >>> >>> On Fri, Jun 28, 2024 at 6:02 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>> wrote: >>> >>>> Hi Dhairya, >>>> >>>> I would be more than happy to share our corrupted journal. Has the host >>>> key changed for drop.ceph.com? The fingerprint I'm being sent is >>>> 7T6dSMcUUa5refV147WEZR99UgW8Y1qYEXZr8ppvog4 which is different to the one >>>> in our /usr/share/ceph/known_hosts_drop.ceph.com. >>>> >>> Ah, strange. Let me get in touch with folks who might know about this, >>> will revert back to you ASAP >>> >>>> Thank you for your advice as well. We've reset our MDS' journal and are >>>> currently in the process of a full filesystem scrub which understandably is >>>> taking quite a bit of time but seems to be progressing through the objects >>>> fine. >>>> >>> YAY! >>> >>>> Thank you ever so much for all your help and please do feel free to >>>> follow up with us if you would like any further details about our crash! >>>> >>> Glad to hear it went well, this bug is being worked on with high >>> priority and once the patch is ready, it will be backported. >>> >>> The root cause of this issue is the `nowsync` (async dirops) being >>> enabled by default with kclient [0]. This feature allows asynchronous >>> creation and deletion of files, optimizing performance by avoiding >>> round-trip latency for these system calls. However, in very rare cases >>> (like yours :D), it can affect the system's consistency and stability hence >>> if this kind of optimization is not a priority for your workload, I >>> recommend turning it off by switching the mount points to `wsync` and also >>> set the MDS config `mds_client_delegate_inos_pct` to `0` so that you don't >>> end up in this situation again (until the bug fix arrives :)). 
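For anyone wanting to act on this advice, here is a minimal sketch of what the change could look like on a kernel client. The monitor host, mount point and client name below are placeholders, and depending on your setup the config change could instead be scoped to a specific daemon (ceph config set mds.<name> ...) or a local ceph.conf entry:

    # Remount the kernel clients with synchronous dirops (wsync) instead of the
    # default nowsync:
    $ mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=<client-name>,wsync

    # Stop the MDS from delegating preallocated inode ranges to clients:
    $ ceph config set mds mds_client_delegate_inos_pct 0

The config option can be returned to its default once a release containing the fix is deployed.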
>>> >>> [0] >>> https://github.com/ceph/ceph-client/commit/f7a67b463fb83a4b9b11ceaa8ec4950b8fb7f902 >>> >>>> Kindest regards, >>>> >>>> Ivan >>>> On 27/06/2024 12:39, Dhairya Parmar wrote: >>>> >>>> CAUTION: This email originated from outside of the LMB: >>>> *.-dparmar@xxxxxxxxxx-.* >>>> Do not click links or open attachments unless you recognize the sender >>>> and know the content is safe. >>>> If you think this is a phishing email, please forward it to >>>> phishing@xxxxxxxxxxxxxxxxx >>>> >>>> >>>> -- >>>> Hi Ivan, >>>> >>>> The solution (which has been successful for us in the past) is to reset >>>> the journal. This would bring the fs back online and return the MDSes to a >>>> stable state, but some data would be lost—the data in the journal that >>>> hasn't been flushed to the backing store would be gone. Therefore, you >>>> should try to flush out as much journal data as possible before resetting >>>> the journal. >>>> >>>> Here are the steps for this entire process: >>>> >>>> 1) Bring the FS offline >>>> $ ceph fs fail <fs_name> >>>> >>>> 2) Recover dentries from journal (run it with every MDS Rank) >>>> $ cephfs-journal-tool --rank=<fs_name>:<mds-rank> event >>>> recover_dentries summary >>>> >>>> 3) Reset the journal (again with every MDS Rank) >>>> $ cephfs-journal-tool --rank=<fs_name>:<mds-rank> journal reset >>>> >>>> 4) Bring the FS online >>>> $ cephfs fs set <fs_name> joinable true >>>> >>>> 5) Restart the MDSes >>>> >>>> 6) Perform scrub to ensure consistency of fs >>>> $ ceph tell mds.<fs_name>:0 scrub start <path> [scrubopts] [tag] >>>> # you could try a recursive scrub maybe `ceph tell mds.<fs_name>:0 >>>> scrub start / recursive` >>>> >>>> Some important notes to keep in mind: >>>> * Recovering dentries will take time (generally, rank 0 is the most >>>> time-consuming, but the rest should be quick). >>>> * cephfs-journal-tool and metadata OSDs are bound to use a significant >>>> CPU percentage. This is because cephfs-journal-tool has to swig the journal >>>> data and flush it out to the backing store, which also makes the metadata >>>> operations go rampant, resulting in OSDs taking a significant percentage of >>>> CPU. >>>> >>>> Do let me know how this goes. >>>> >>>> On Thu, Jun 27, 2024 at 3:44 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>>> wrote: >>>> >>>>> Hi Dhairya, >>>>> >>>>> We can induce the crash by simply restarting the MDS and the crash >>>>> seems to happen when an MDS goes from up:standby to up:replay. The MDS >>>>> works through a few files in the log before eventually crashing where I've >>>>> included the logs for this here (this is after I imported the backed up >>>>> journal which I hope was successful but please let me know if you suspect >>>>> it wasn't!): >>>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s3.mds_restart_crash.log >>>>> >>>>> With respect to the client logs, are you referring to the clients who >>>>> are writing to the filesystem? We don't typically run them in any sort of >>>>> debug mode and we have quite a few machines running our backup system but >>>>> we can look an hour or so before the first MDS crash (though I don't know >>>>> if this is when the de-sync occurred). 
Here are some MDS logs with regards >>>>> to the initial crash on Saturday morning though which may be helpful: >>>>> >>>>> -59> 2024-06-22T05:41:43.090+0100 7f184ce82700 10 monclient: tick >>>>> -58> 2024-06-22T05:41:43.090+0100 7f184ce82700 10 monclient: >>>>> _check_auth_rotating have uptodate secrets (they expire after >>>>> 2024-06-22T05:41:13.091556+0100) >>>>> -57> 2024-06-22T05:41:43.208+0100 7f184de84700 1 mds.pebbles-s2 >>>>> Updating MDS map to version 2529650 from mon.3 >>>>> -56> 2024-06-22T05:41:43.208+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 6 not found in OSDMap >>>>> -55> 2024-06-22T05:41:43.208+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 3 not found in OSDMap >>>>> -54> 2024-06-22T05:41:43.209+0100 7f184de84700 5 >>>>> asok(0x5592e7968000) register_command objecter_requests hook 0x5592e78f8800 >>>>> -53> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient: >>>>> _renew_subs >>>>> -52> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient: >>>>> _send_mon_message to mon.pebbles-s4 at v2:10.1.5.134:3300/0 >>>>> -51> 2024-06-22T05:41:43.209+0100 7f184de84700 10 >>>>> log_channel(cluster) update_config to_monitors: true to_syslog: false >>>>> syslog_facility: prio: info to_graylog: false graylog_host: 127.0.0.1 >>>>> graylog_port: 12201) >>>>> -50> 2024-06-22T05:41:43.209+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 6 not found in OSDMap >>>>> -49> 2024-06-22T05:41:43.209+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 3 not found in OSDMap >>>>> -48> 2024-06-22T05:41:43.209+0100 7f184de84700 4 mds.0.0 >>>>> apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 0) >>>>> -47> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> handle_mds_map i am now mds.0.2529650 >>>>> -46> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> handle_mds_map state change up:standby --> up:replay >>>>> -45> 2024-06-22T05:41:43.209+0100 7f184de84700 5 >>>>> mds.beacon.pebbles-s2 set_want_state: up:standby -> up:replay >>>>> -44> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> replay_start >>>>> -43> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> waiting for osdmap 473739 (which blocklists prior instance) >>>>> -42> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient: >>>>> _send_mon_message to mon.pebbles-s4 at v2:10.1.5.134:3300/0 >>>>> -41> 2024-06-22T05:41:43.209+0100 7f1849e7c700 2 mds.0.cache >>>>> Memory usage: total 299012, rss 37624, heap 182556, baseline 182556, 0 / 0 >>>>> inodes have caps, 0 caps, 0 caps per inode >>>>> -40> 2024-06-22T05:41:43.224+0100 7f184de84700 10 monclient: >>>>> _renew_subs >>>>> -39> 2024-06-22T05:41:43.224+0100 7f184de84700 10 monclient: >>>>> _send_mon_message to mon.pebbles-s4 at v2:10.1.5.134:3300/0 >>>>> -38> 2024-06-22T05:41:43.224+0100 7f184de84700 10 monclient: >>>>> handle_get_version_reply finishing 1 version 473739 >>>>> -37> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening inotable >>>>> -36> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening sessionmap >>>>> -35> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening mds log >>>>> -34> 2024-06-22T05:41:43.224+0100 7f1847e78700 5 mds.0.log open >>>>> discovering log bounds >>>>> -33> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening purge queue (async) >>>>> -32> 2024-06-22T05:41:43.224+0100 7f1847e78700 4 
mds.0.purge_queue >>>>> open: opening >>>>> -31> 2024-06-22T05:41:43.224+0100 7f1847e78700 1 >>>>> mds.0.journaler.pq(ro) recover start >>>>> -30> 2024-06-22T05:41:43.224+0100 7f1847e78700 1 >>>>> mds.0.journaler.pq(ro) read_head >>>>> -29> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: loading open file table (async) >>>>> -28> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening snap table >>>>> -27> 2024-06-22T05:41:43.224+0100 7f1847677700 4 >>>>> mds.0.journalpointer Reading journal pointer '400.00000000' >>>>> -26> 2024-06-22T05:41:43.224+0100 7f1850689700 10 monclient: >>>>> get_auth_request con 0x5592e8987000 auth_method 0 >>>>> -25> 2024-06-22T05:41:43.225+0100 7f1850e8a700 10 monclient: >>>>> get_auth_request con 0x5592e8987c00 auth_method 0 >>>>> -24> 2024-06-22T05:41:43.252+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) _finish_read_head loghead(trim 231160676352, expire >>>>> 231163662875, write 231163662875, stream_format 1). probing for end of log >>>>> (from 231163662875)... >>>>> -23> 2024-06-22T05:41:43.252+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) probing for end of the log >>>>> -22> 2024-06-22T05:41:43.252+0100 7f1847677700 1 >>>>> mds.0.journaler.mdlog(ro) recover start >>>>> -21> 2024-06-22T05:41:43.252+0100 7f1847677700 1 >>>>> mds.0.journaler.mdlog(ro) read_head >>>>> -20> 2024-06-22T05:41:43.252+0100 7f1847677700 4 mds.0.log Waiting >>>>> for journal 0x200 to recover... >>>>> -19> 2024-06-22T05:41:43.252+0100 7f1850689700 10 monclient: >>>>> get_auth_request con 0x5592e8bc6000 auth_method 0 >>>>> -18> 2024-06-22T05:41:43.253+0100 7f185168b700 10 monclient: >>>>> get_auth_request con 0x5592e8bc6800 auth_method 0 >>>>> -17> 2024-06-22T05:41:43.257+0100 7f1847e78700 1 >>>>> mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 90131453181952, >>>>> expire 90131465778558, write 90132009715463, stream_format 1). probing for >>>>> end of log (from 90132009715463)... >>>>> -16> 2024-06-22T05:41:43.257+0100 7f1847e78700 1 >>>>> mds.0.journaler.mdlog(ro) probing for end of the log >>>>> -15> 2024-06-22T05:41:43.257+0100 7f1847e78700 1 >>>>> mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 90132019384791 >>>>> (header had 90132009715463). recovered. >>>>> -14> 2024-06-22T05:41:43.257+0100 7f1847677700 4 mds.0.log Journal >>>>> 0x200 recovered. >>>>> -13> 2024-06-22T05:41:43.257+0100 7f1847677700 4 mds.0.log >>>>> Recovered journal 0x200 in format 1 >>>>> -12> 2024-06-22T05:41:43.273+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) _finish_probe_end write_pos = 231163662875 (header >>>>> had 231163662875). recovered. 
>>>>> -11> 2024-06-22T05:41:43.273+0100 7f1848e7a700 4 mds.0.purge_queue >>>>> operator(): open complete >>>>> -10> 2024-06-22T05:41:43.273+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) set_writeable >>>>> -9> 2024-06-22T05:41:43.441+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 1: loading/discovering base inodes >>>>> -8> 2024-06-22T05:41:43.441+0100 7f1847e78700 0 mds.0.cache >>>>> creating system inode with ino:0x100 >>>>> -7> 2024-06-22T05:41:43.442+0100 7f1847e78700 0 mds.0.cache >>>>> creating system inode with ino:0x1 >>>>> -6> 2024-06-22T05:41:43.442+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 2: replaying mds log >>>>> -5> 2024-06-22T05:41:43.442+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 2: waiting for purge queue recovered >>>>> -4> 2024-06-22T05:41:44.090+0100 7f184ce82700 10 monclient: tick >>>>> -3> 2024-06-22T05:41:44.090+0100 7f184ce82700 10 monclient: >>>>> _check_auth_rotating have uptodate secrets (they expire after >>>>> 2024-06-22T05:41:14.091638+0100) >>>>> -2> 2024-06-22T05:41:44.210+0100 7f1849e7c700 2 mds.0.cache >>>>> Memory usage: total 588368, rss 308304, heap 207132, baseline 182556, 0 / >>>>> 15149 inodes have caps, 0 caps, 0 caps per inode >>>>> -1> 2024-06-22T05:41:44.642+0100 7f1846675700 -1 >>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/include/interval_set.h: >>>>> In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, >>>>> T)>) [with T = inodeno_t; C = std::map]' thread 7f1846675700 time >>>>> 2024-06-22T05:41:44.643146+0100 >>>>> >>>>> ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy >>>>> (stable) >>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>> const*)+0x135) [0x7f18568b64a3] >>>>> 2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f18568b6669] >>>>> 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, >>>>> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x5592e5027885] >>>>> 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, >>>>> MDPeerUpdate*)+0x4377) [0x5592e532c7b7] >>>>> 5: (EUpdate::replay(MDSRank*)+0x61) [0x5592e5330bd1] >>>>> 6: (MDLog::_replay_thread()+0x7bb) [0x5592e52b754b] >>>>> 7: (MDLog::ReplayThread::entry()+0x11) [0x5592e4f6a041] >>>>> 8: /lib64/libpthread.so.0(+0x81ca) [0x7f18558a41ca] >>>>> 9: clone() >>>>> >>>>> 0> 2024-06-22T05:41:44.643+0100 7f1846675700 -1 *** Caught signal >>>>> (Aborted) ** >>>>> in thread 7f1846675700 thread_name:md_log_replay >>>>> >>>>> ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy >>>>> (stable) >>>>> 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f18558aecf0] >>>>> 2: gsignal() >>>>> 3: abort() >>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>> const*)+0x18f) [0x7f18568b64fd] >>>>> 5: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f18568b6669] >>>>> 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, >>>>> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x5592e5027885] >>>>> 7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, >>>>> MDPeerUpdate*)+0x4377) [0x5592e532c7b7] >>>>> 8: (EUpdate::replay(MDSRank*)+0x61) [0x5592e5330bd1] >>>>> 9: (MDLog::_replay_thread()+0x7bb) [0x5592e52b754b] >>>>> 10: (MDLog::ReplayThread::entry()+0x11) [0x5592e4f6a041] >>>>> 11: /lib64/libpthread.so.0(+0x81ca) [0x7f18558a41ca] >>>>> 12: clone() >>>>> >>>>> We have a relatively low debug setting normally so I don't think 
many >>>>> details of the initial crash were captured unfortunately and the MDS logs >>>>> before the above (i.e. "-60" and older) are just beacon messages and >>>>> _check_auth_rotating checks. >>>>> >>>>> I was wondering whether you have any recommendations in terms of what >>>>> actions we could take to bring our filesystem back into a working state >>>>> short of rebuilding the entire metadata pool? We are quite keen to bring >>>>> our backup back into service urgently as we currently do not have any >>>>> accessible backups for our Ceph clusters. >>>>> >>>>> Kindest regards, >>>>> >>>>> Ivan >>>>> On 25/06/2024 19:18, Dhairya Parmar wrote: >>>>> >>>>> CAUTION: This email originated from outside of the LMB: >>>>> *.-dparmar@xxxxxxxxxx-.* >>>>> Do not click links or open attachments unless you recognize the sender >>>>> and know the content is safe. >>>>> If you think this is a phishing email, please forward it to >>>>> phishing@xxxxxxxxxxxxxxxxx >>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> On Tue, Jun 25, 2024 at 6:38 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>>>> wrote: >>>>> >>>>>> Hi Dhairya, >>>>>> >>>>>> Thank you for your rapid reply. I tried recovering the dentries for >>>>>> the file just before the crash I mentioned before and then splicing the >>>>>> transactions from the journal which seemed to remove that issue for that >>>>>> inode but resulted in the MDS crashing on the next inode in the >>>>>> journal when performing replay. >>>>>> >>>>> The MDS delegates a range of preallocated inodes (in form of a set - >>>>> interval_set<inodeno_t> preallocated_inos) to the clients, so it can be one >>>>> inode that is untracked or some inodes from the range or in worst case >>>>> scenario - ALL, and this is something that even the `cephfs-journal-tool` >>>>> would not be able to tell (since we're talking about MDS internals which >>>>> aren't exposed to such tools). That is the reason why you see "MDS crashing >>>>> on the next inode in the journal when performing replay". >>>>> >>>>> An option could be to expose the inode set to some tool or asok cmd to >>>>> identify such inodes ranges, which needs to be discussed. For now, we're >>>>> trying to address this in [0], you can follow the discussion there. >>>>> >>>>> [0] https://tracker.ceph.com/issues/66251 >>>>> >>>>>> Removing all the transactions involving the directory housing the >>>>>> files that seemed to cause these crashes from the journal only caused the >>>>>> MDS to fail to even start replay. >>>>>> >>>>> I've rolled back our journal to our original version when the crash >>>>>> first happened and the entire MDS log for the crash can be found here: >>>>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s3.flush_journal.log-25-06-24 >>>>>> >>>>> Awesome, this would help us a ton. Apart from this, would it be >>>>> possible to send us client logs? >>>>> >>>>>> Please let us know if you would like any other logs file as we can >>>>>> easily induce this crash. >>>>>> >>>>> Since you can easily induce the crash, can you share the reproducer >>>>> please i.e. what all action you take in order to hit this? >>>>> >>>>>> Kindest regards, >>>>>> >>>>>> Ivan >>>>>> On 25/06/2024 09:58, Dhairya Parmar wrote: >>>>>> >>>>>> CAUTION: This email originated from outside of the LMB: >>>>>> *.-dparmar@xxxxxxxxxx-.* >>>>>> Do not click links or open attachments unless you recognize the >>>>>> sender and know the content is safe. 
>>>>>> If you think this is a phishing email, please forward it to >>>>>> phishing@xxxxxxxxxxxxxxxxx >>>>>> >>>>>> >>>>>> -- >>>>>> Hi Ivan, >>>>>> >>>>>> This looks to be similar to the issue [0] that we're already >>>>>> addressing at [1]. So basically there is some out-of-sync event that led >>>>>> the client to make use of the inodes that MDS wasn't aware of/isn't >>>>>> tracking and hence the crash. It'd be really helpful if you can provide us >>>>>> more logs. >>>>>> >>>>>> CC @Rishabh Dave <ridave@xxxxxxxxxx> @Venky Shankar >>>>>> <vshankar@xxxxxxxxxx> @Patrick Donnelly <pdonnell@xxxxxxxxxx> @Xiubo >>>>>> Li <xiubli@xxxxxxxxxx> >>>>>> >>>>>> [0] https://tracker.ceph.com/issues/61009 >>>>>> [1] https://tracker.ceph.com/issues/66251 >>>>>> -- >>>>>> *Dhairya Parmar* >>>>>> >>>>>> Associate Software Engineer, CephFS >>>>>> >>>>>> <https://www.redhat.com/>IBM, Inc. >>>>>> >>>>>> On Mon, Jun 24, 2024 at 8:54 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>>>>> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> We have been experiencing a serious issue with our CephFS backup >>>>>>> cluster >>>>>>> running quincy (version 17.2.7) on a RHEL8-derivative Linux kernel >>>>>>> (Alma8.9, 4.18.0-513.9.1 kernel) where our MDSes for our filesystem >>>>>>> are >>>>>>> constantly in a "replay" or "replay(laggy)" state and keep crashing. >>>>>>> >>>>>>> We have a single MDS filesystem called "ceph_backup" with 2 standby >>>>>>> MDSes along with a 2nd unused filesystem "ceph_archive" (this holds >>>>>>> little to no data) where we are using our "ceph_backup" filesystem >>>>>>> to >>>>>>> backup our data and this is the one which is currently broken. The >>>>>>> Ceph >>>>>>> health outputs currently are: >>>>>>> >>>>>>> root@pebbles-s1 14:05 [~]: ceph -s >>>>>>> cluster: >>>>>>> id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631 >>>>>>> health: HEALTH_WARN >>>>>>> 1 filesystem is degraded >>>>>>> insufficient standby MDS daemons available >>>>>>> 1319 pgs not deep-scrubbed in time >>>>>>> 1054 pgs not scrubbed in time >>>>>>> >>>>>>> services: >>>>>>> mon: 4 daemons, quorum >>>>>>> pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4 (age 36m) >>>>>>> mgr: pebbles-s2(active, since 36m), standbys: pebbles-s4, >>>>>>> pebbles-s3, pebbles-s1 >>>>>>> mds: 2/2 daemons up >>>>>>> osd: 1380 osds: 1380 up (since 29m), 1379 in (since 3d); 37 >>>>>>> remapped pgs >>>>>>> >>>>>>> data: >>>>>>> volumes: 1/2 healthy, 1 recovering >>>>>>> pools: 7 pools, 2177 pgs >>>>>>> objects: 3.55G objects, 7.0 PiB >>>>>>> usage: 8.9 PiB used, 14 PiB / 23 PiB avail >>>>>>> pgs: 83133528/30006841533 objects misplaced (0.277%) >>>>>>> 2090 active+clean >>>>>>> 47 active+clean+scrubbing+deep >>>>>>> 29 active+remapped+backfilling >>>>>>> 8 active+remapped+backfill_wait >>>>>>> 2 active+clean+scrubbing >>>>>>> 1 active+clean+snaptrim >>>>>>> >>>>>>> io: >>>>>>> recovery: 1.9 GiB/s, 719 objects/s >>>>>>> >>>>>>> root@pebbles-s1 14:09 [~]: ceph fs status >>>>>>> ceph_backup - 0 clients >>>>>>> =========== >>>>>>> RANK STATE MDS ACTIVITY DNS INOS DIRS >>>>>>> CAPS >>>>>>> 0 replay(laggy) pebbles-s3 0 0 0 0 >>>>>>> POOL TYPE USED AVAIL >>>>>>> mds_backup_fs metadata 1255G 2780G >>>>>>> ec82_primary_fs_data data 0 2780G >>>>>>> ec82pool data 8442T 3044T >>>>>>> ceph_archive - 2 clients >>>>>>> ============ >>>>>>> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS >>>>>>> 0 active pebbles-s2 Reqs: 0 /s 13.4k 7105 118 2 >>>>>>> POOL TYPE USED AVAIL >>>>>>> mds_archive_fs metadata 5184M 2780G >>>>>>> ec83_primary_fs_data data 0 2780G >>>>>>> ec83pool data 138T 2767T >>>>>>> 
MDS version: ceph version 17.2.7 >>>>>>> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) >>>>>>> root@pebbles-s1 14:09 [~]: ceph health detail | head >>>>>>> HEALTH_WARN 1 filesystem is degraded; insufficient standby MDS >>>>>>> daemons available; 1319 pgs not deep-scrubbed in time; 1054 pgs >>>>>>> not >>>>>>> scrubbed in time >>>>>>> [WRN] FS_DEGRADED: 1 filesystem is degraded >>>>>>> fs ceph_backup is degraded >>>>>>> [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons >>>>>>> available >>>>>>> have 0; want 1 more >>>>>>> >>>>>>> When our cluster first ran after a reboot, Ceph ran through the 2 >>>>>>> standby MDSes, crashing them all, until it reached the final MDS and >>>>>>> is >>>>>>> now stuck in this "replay(laggy)" state. Putting our MDSes into >>>>>>> debugging mode, we can see that this MDS crashed when replaying the >>>>>>> journal for a particular inode (this is the same for all the MDSes >>>>>>> and >>>>>>> they all crash on the same object): >>>>>>> >>>>>>> ... >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay for [521,head] had [inode 0x1005ba89481 >>>>>>> [...539,head] >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/ >>>>>>> auth fragtree_t(*^2 00*^3 00000*^ >>>>>>> 4 00001*^3 00010*^4 00011*^4 00100*^4 00101*^4 00110*^4 00111*^4 >>>>>>> 01*^3 01000*^4 01001*^3 01010*^4 01011*^3 01100*^4 01101*^4 >>>>>>> 01110*^4 >>>>>>> 01111*^4 10*^3 10000*^4 10001*^4 10010*^4 10011*^4 10100*^4 >>>>>>> 10101*^3 >>>>>>> 10110*^4 10111*^4 11*^6) v10880645 f(v0 m2024-06-22 >>>>>>> T05:41:10.213700+0100 1281276=1281276+0) n(v12 >>>>>>> rc2024-06-22T05:41:10.213700+0100 b1348251683896 >>>>>>> 1281277=1281276+1) >>>>>>> old_inodes=8 (iversion lock) | dirfrag=416 dirty=1 >>>>>>> 0x55770a2bdb80] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay dir 0x1005ba89481.011011000* >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay updated dir [dir 0x1005ba89481.011011000* >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/ >>>>>>> [2,head] auth v=436385 cv=0/0 state=107374182 >>>>>>> 4 f(v0 m2024-06-22T05:41:10.213700+0100 2502=2502+0) n(v12 >>>>>>> rc2024-06-22T05:41:10.213700+0100 b2120744220 2502=2502+0) >>>>>>> hs=32+33,ss=0+0 dirty=65 | child=1 0x55770ebcda80] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay added (full) [dentry >>>>>>> >>>>>>> #0x1/cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fracti >>>>>>> ons_ave_Z124.mrc.teberet7.partial [539,head] auth NULL (dversion >>>>>>> lock) v=436384 ino=(nil) state=1610612800|bottomlru | dirty=1 >>>>>>> 0x557710444500] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay added [inode 0x1005cd4fe35 [539,head] >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_ >>>>>>> 005006_fractions_ave_Z124.mrc.teberet7.partial auth v436384 s=0 >>>>>>> n(v0 >>>>>>> 1=1+0) (iversion lock) cr={99995144=0-4194304@538} >>>>>>> 0x557710438680] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 >>>>>>> mds.0.cache.ino(0x1005cd4fe35) mark_dirty_parent >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> 
EMetaBlob.replay noting opened inode [inode 0x1005cd4fe35 >>>>>>> [539,head] >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/FoilHole_27649821_Data_27626128_2762 >>>>>>> 6130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial auth >>>>>>> v436384 DIRTYPARENT s=0 n(v0 1=1+0) (iversion lock) >>>>>>> cr={99995144=0-4194304@538} | dirtyparent=1 dirty=1 >>>>>>> 0x557710438680] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay inotable tablev 3112837 <= table 3112837 >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay sessionmap v 1560540883, table 1560540882 >>>>>>> prealloc >>>>>>> [] used 0x1005cd4fe35 >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 -1 >>>>>>> >>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/include/interval_set.h: >>>>>>> I >>>>>>> n function 'void interval_set<T, C>::erase(T, T, >>>>>>> std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' >>>>>>> thread 7f8811c40700 time 2024-06-24T13:44:55.564315+0100 >>>>>>> >>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/include/interval_set.h: >>>>>>> 568: FAILED ceph_assert(p->first <= start) >>>>>>> >>>>>>> ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) >>>>>>> quincy (stable) >>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, >>>>>>> char >>>>>>> const*)+0x135) [0x7f8821e814a3] >>>>>>> 2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) >>>>>>> [0x7f8821e81669] >>>>>>> 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, >>>>>>> inodeno_t, >>>>>>> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) >>>>>>> [0x5576f9bb2885] >>>>>>> 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, >>>>>>> MDPeerUpdate*)+0x4377) [0x5576f9eb77b7] >>>>>>> 5: (EUpdate::replay(MDSRank*)+0x61) [0x5576f9ebbbd1] >>>>>>> 6: (MDLog::_replay_thread()+0x7bb) [0x5576f9e4254b] >>>>>>> 7: (MDLog::ReplayThread::entry()+0x11) [0x5576f9af5041] >>>>>>> 8: /lib64/libpthread.so.0(+0x81ca) [0x7f8820e6f1ca] >>>>>>> 9: clone() >>>>>>> >>>>>>> I've only included a short section of the crash (this is the first >>>>>>> trace >>>>>>> in the log with regards to the crash with a 10/20 debug_mds option). >>>>>>> We >>>>>>> tried deleting the 0x1005cd4fe35 object from the object store using >>>>>>> the >>>>>>> "rados" command but this did not allow our MDS to successfully >>>>>>> replay. 
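A small aside that may help when cross-referencing these logs with the journal-tool commands below: the MDS log prints inode numbers in hex (0x1005cd4fe35 above), while the --inode filter of cephfs-journal-tool takes the decimal form. One quick way to convert:

    $ printf '%d\n' 0x1005cd4fe35
    1101069090357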
>>>>>>> >>>>>>> From my understanding the journal seems okay as we didn't run out >>>>>>> of >>>>>>> space for example on our metadata pool and "cephfs-journal-tool >>>>>>> journal >>>>>>> inspect" doesn't seem to think there is any damage: >>>>>>> >>>>>>> root@pebbles-s1 13:58 [~]: cephfs-journal-tool >>>>>>> --rank=ceph_backup:0 >>>>>>> journal inspect >>>>>>> Overall journal integrity: OK >>>>>>> root@pebbles-s1 14:04 [~]: cephfs-journal-tool >>>>>>> --rank=ceph_backup:0 >>>>>>> event get --inode 1101069090357 summary >>>>>>> Events by type: >>>>>>> OPEN: 1 >>>>>>> UPDATE: 3 >>>>>>> Errors: 0 >>>>>>> root@pebbles-s1 14:05 [~]: cephfs-journal-tool >>>>>>> --rank=ceph_backup:0 >>>>>>> event get --inode 1101069090357 list >>>>>>> 2024-06-22T05:41:10.214635+0100 0x51f97d4cfe35 UPDATE: (openc) >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> 2024-06-22T05:41:11.203312+0100 0x51f97d59c848 UPDATE: >>>>>>> (check_inode_max_size) >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> 2024-06-22T05:41:15.484871+0100 0x51f97e7344cc OPEN: () >>>>>>> >>>>>>> FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> 2024-06-22T05:41:15.484921+0100 0x51f97e73493b UPDATE: (rename) >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc >>>>>>> >>>>>>> I was wondering whether anyone had any advice for us on how we >>>>>>> should >>>>>>> proceed forward? We were thinking about manually applying these >>>>>>> events >>>>>>> (via "event apply") where failing that we could erase this >>>>>>> problematic >>>>>>> event with "cephfs-journal-tool --rank=ceph_backup:0 event splice >>>>>>> --inode 1101069090357". Is this a good idea? We would rather not >>>>>>> rebuild >>>>>>> the entire metadata pool if we could avoid it (once was enough for >>>>>>> us) >>>>>>> as this cluster has ~9 PB of data on it. 
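Pulling together the advice given elsewhere in this thread, a conservative sequence for rank 0 of ceph_backup might look like the sketch below. This is not an authoritative recipe: the export path is a placeholder, the splice invocation mirrors the event get filters shown above, and the choice between splicing a single event and resetting the whole journal depends on how much unflushed metadata can be lost.

    # Take the filesystem offline and keep a byte-for-byte copy of the journal first,
    # so any change can be rolled back with `journal import`:
    $ ceph fs fail ceph_backup
    $ cephfs-journal-tool --rank=ceph_backup:0 journal export /root/ceph_backup-rank0.journal.bin

    # Flush as many dentries as possible into the backing store:
    $ cephfs-journal-tool --rank=ceph_backup:0 event recover_dentries summary

    # Then either splice just the problematic events ...
    $ cephfs-journal-tool --rank=ceph_backup:0 event splice --inode 1101069090357 summary
    # ... or reset the whole journal (loses any metadata not yet flushed):
    $ cephfs-journal-tool --rank=ceph_backup:0 journal reset

    # Bring the filesystem back and scrub it (optionally with "recursive,repair"):
    $ ceph fs set ceph_backup joinable true
    $ ceph tell mds.ceph_backup:0 scrub start / recursive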
>>>>>>> Kindest regards,
>>>>>>>
>>>>>>> Ivan Clayson
>>>>>>>
>>>>>>> --
>>>>>>> Ivan Clayson
>>>>>>> -----------------
>>>>>>> Scientific Computing Officer
>>>>>>> Room 2N249
>>>>>>> Structural Studies
>>>>>>> MRC Laboratory of Molecular Biology
>>>>>>> Francis Crick Ave, Cambridge
>>>>>>> CB2 0QH
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx