On Tue, Jul 9, 2024 at 3:46 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> wrote:
> Hi Dhairya,
>
> I would be more than happy to try and give as many details as possible, but
> the slack channel is private and requires my email to have an account/access to it.
>
You're right that you need an account to access Slack, but the channel itself isn't private at all. It's the upstream Ceph Slack channel and is open to everyone; signing up just requires an email address, and joining is entirely your choice, not mandatory. I'd ask @Venky Shankar <vshankar@xxxxxxxxxx> and @Patrick Donnelly <pdonnell@xxxxxxxxxx> to add their input, since they've been working on similar issues and can provide better insights.

> Wouldn't taking the discussion about this error to a private channel also
> stop other users who experience this error from learning about how and why
> this happened, as well as possibly not being able to view the solution? Would
> it not be possible to discuss this more publicly for the benefit of the
> other users on the mailing list?
>
> Kindest regards,
>
> Ivan
>
> On 09/07/2024 10:44, Dhairya Parmar wrote:
>
> Hey Ivan,
>
> This is a relatively new MDS crash, so it will require some
> investigation, but I was instructed to recommend the disaster-recovery steps [0]
> (except the session reset) to you to get the FS up again.
>
> This crash is being discussed on the upstream CephFS Slack channel [1] with @Venky
> Shankar <vshankar@xxxxxxxxxx> and other CephFS devs. I'd encourage you to
> join the conversation; we can discuss this in detail and maybe go through
> the incident step by step, which should help analyse the crash better.
>
> [0]
> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/#disaster-recovery-experts
> [1] https://ceph-storage.slack.com/archives/C04LVQMHM9B/p1720443057919519
>
> On Mon, Jul 8, 2024 at 7:37 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx>
> wrote:
>
>> Hi Dhairya,
>>
>> Thank you ever so much for having another look at this so quickly. I
>> don't think I have any logs similar to the ones you referenced this time, as
>> my MDSs don't seem to enter the replay stage when they crash (or at least
>> don't now after I've thrown the logs away), but those errors do crop up in
>> the prior logs I shared when the system first crashed.
>>
>> Kindest regards,
>>
>> Ivan
>>
>> On 08/07/2024 14:08, Dhairya Parmar wrote:
>>
>> Ugh, something went horribly wrong. I've downloaded the MDS logs that
>> contain the assertion failure and it looks relevant to this [0]. Do you have
>> client logs for this?
>>
>> The other log that you shared is being downloaded right now; once that's
>> done and I'm done going through it, I'll update you.
>> >> [0] https://tracker.ceph.com/issues/54546 >> >> On Mon, Jul 8, 2024 at 4:49 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >> wrote: >> >>> Hi Dhairya, >>> >>> Sorry to resurrect this thread again, but we still unfortunately have an >>> issue with our filesystem after we attempted to write new backups to it. >>> >>> We finished the scrub of the filesystem on Friday and ran a repair scrub >>> on the 1 directory which had metadata damage. After doing so and rebooting, >>> the cluster reported no issues and data was accessible again. >>> >>> We re-started the backups to run over the weekend and unfortunately the >>> filesystem crashed again where the log of the failure is here: >>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s2.log-20240708.gz. >>> We ran the backups on kernel mounts of the filesystem without the nowsync >>> option this time to avoid the out-of-sync write problems.. >>> >>> I've tried resetting the journal again after recovering the dentries but >>> unfortunately the filesystem is still in a failed state despite setting >>> joinable to true. The log of this crash is here: >>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s4.log-20240708 >>> . >>> >>> I'm not sure how to proceed as I can't seem to get any MDS to take over >>> the first rank. I would like to do a scrub of the filesystem and preferably >>> overwrite the troublesome files with the originals on the live filesystem. >>> Do you have any advice on how to make the filesystem leave its failed >>> state? I have a backup of the journal before I reset it so I can roll back >>> if necessary. >>> >>> Here are some details about the filesystem at present: >>> >>> root@pebbles-s2 11:49 [~]: ceph -s; ceph fs status >>> cluster: >>> id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631 >>> health: HEALTH_ERR >>> 1 filesystem is degraded >>> 1 large omap objects >>> 1 filesystem is offline >>> 1 mds daemon damaged >>> >>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim flag(s) set >>> 1750 pgs not deep-scrubbed in time >>> 1612 pgs not scrubbed in time >>> >>> services: >>> mon: 4 daemons, quorum pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4 >>> (age 50m) >>> mgr: pebbles-s2(active, since 77m), standbys: pebbles-s1, >>> pebbles-s3, pebbles-s4 >>> mds: 1/2 daemons up, 3 standby >>> osd: 1380 osds: 1380 up (since 76m), 1379 in (since 10d); 10 >>> remapped pgs >>> flags >>> nobackfill,norebalance,norecover,noscrub,nodeep-scrub,nosnaptrim >>> >>> data: >>> volumes: 1/2 healthy, 1 recovering; 1 damaged >>> pools: 7 pools, 2177 pgs >>> objects: 3.24G objects, 6.7 PiB >>> usage: 8.6 PiB used, 14 PiB / 23 PiB avail >>> pgs: 11785954/27384310061 objects misplaced (0.043%) >>> 2167 active+clean >>> 6 active+remapped+backfilling >>> 4 active+remapped+backfill_wait >>> >>> ceph_backup - 0 clients >>> =========== >>> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS >>> 0 failed >>> POOL TYPE USED AVAIL >>> mds_backup_fs metadata 1174G 3071G >>> ec82_primary_fs_data data 0 3071G >>> ec82pool data 8085T 4738T >>> ceph_archive - 2 clients >>> ============ >>> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS >>> 0 active pebbles-s4 Reqs: 0 /s 13.4k 7105 118 2 >>> POOL TYPE USED AVAIL >>> mds_archive_fs metadata 5184M 3071G >>> ec83_primary_fs_data data 0 3071G >>> ec83pool data 138T 4307T >>> STANDBY MDS >>> pebbles-s2 >>> pebbles-s3 >>> pebbles-s1 >>> MDS version: ceph version 17.2.7 >>> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) >>> root@pebbles-s2 11:55 [~]: ceph fs dump >>> 
e2643889 >>> enable_multiple, ever_enabled_multiple: 1,1 >>> default compat: compat={},rocompat={},incompat={1=base v0.20,2=client >>> writeable ranges,3=default file layouts on dirs,4=dir inode in separate >>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no >>> anchor table,9=file layout v2,10=snaprealm v2} >>> legacy client fscid: 1 >>> >>> Filesystem 'ceph_backup' (1) >>> fs_name ceph_backup >>> epoch 2643888 >>> flags 12 joinable allow_snaps allow_multimds_snaps >>> created 2023-05-19T12:52:36.302135+0100 >>> modified 2024-07-08T11:17:55.437861+0100 >>> tableserver 0 >>> root 0 >>> session_timeout 60 >>> session_autoclose 300 >>> max_file_size 109934182400000 >>> required_client_features {} >>> last_failure 0 >>> last_failure_osd_epoch 494515 >>> compat compat={},rocompat={},incompat={1=base v0.20,2=client >>> writeable ranges,3=default file layouts on dirs,4=dir inode in separate >>> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses >>> inline data,8=no anchor table,9=file layout v2,10=snaprealm v2} >>> max_mds 1 >>> in 0 >>> up {} >>> failed >>> damaged 0 >>> stopped >>> data_pools [6,3] >>> metadata_pool 2 >>> inline_data disabled >>> balancer >>> standby_count_wanted 1 >>> >>> >>> Kindest regards, >>> >>> Ivan >>> On 28/06/2024 15:17, Dhairya Parmar wrote: >>> >>> CAUTION: This email originated from outside of the LMB: >>> *.-dparmar@xxxxxxxxxx-.* >>> Do not click links or open attachments unless you recognize the sender >>> and know the content is safe. >>> If you think this is a phishing email, please forward it to >>> phishing@xxxxxxxxxxxxxxxxx >>> >>> >>> -- >>> >>> >>> On Fri, Jun 28, 2024 at 6:02 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>> wrote: >>> >>>> Hi Dhairya, >>>> >>>> I would be more than happy to share our corrupted journal. Has the host >>>> key changed for drop.ceph.com? The fingerprint I'm being sent is >>>> 7T6dSMcUUa5refV147WEZR99UgW8Y1qYEXZr8ppvog4 which is different to the one >>>> in our /usr/share/ceph/known_hosts_drop.ceph.com. >>>> >>> Ah, strange. Let me get in touch with folks who might know about this, >>> will revert back to you ASAP >>> >>>> Thank you for your advice as well. We've reset our MDS' journal and are >>>> currently in the process of a full filesystem scrub which understandably is >>>> taking quite a bit of time but seems to be progressing through the objects >>>> fine. >>>> >>> YAY! >>> >>>> Thank you ever so much for all your help and please do feel free to >>>> follow up with us if you would like any further details about our crash! >>>> >>> Glad to hear it went well, this bug is being worked on with high >>> priority and once the patch is ready, it will be backported. >>> >>> The root cause of this issue is the `nowsync` (async dirops) being >>> enabled by default with kclient [0]. This feature allows asynchronous >>> creation and deletion of files, optimizing performance by avoiding >>> round-trip latency for these system calls. However, in very rare cases >>> (like yours :D), it can affect the system's consistency and stability hence >>> if this kind of optimization is not a priority for your workload, I >>> recommend turning it off by switching the mount points to `wsync` and also >>> set the MDS config `mds_client_delegate_inos_pct` to `0` so that you don't >>> end up in this situation again (until the bug fix arrives :)). 
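For anyone wanting to act on this advice, here is a minimal sketch of what the change could look like on a kernel client. The monitor host, mount point and client name below are placeholders, and depending on your setup the config change could instead be scoped to a specific daemon (ceph config set mds.<name> ...) or a local ceph.conf entry:

    # Remount the kernel clients with synchronous dirops (wsync) instead of the
    # default nowsync:
    $ mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=<client-name>,wsync

    # Stop the MDS from delegating preallocated inode ranges to clients:
    $ ceph config set mds mds_client_delegate_inos_pct 0

The config option can be returned to its default once a release containing the fix is deployed.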
>>> >>> [0] >>> https://github.com/ceph/ceph-client/commit/f7a67b463fb83a4b9b11ceaa8ec4950b8fb7f902 >>> >>>> Kindest regards, >>>> >>>> Ivan >>>> On 27/06/2024 12:39, Dhairya Parmar wrote: >>>> >>>> CAUTION: This email originated from outside of the LMB: >>>> *.-dparmar@xxxxxxxxxx-.* >>>> Do not click links or open attachments unless you recognize the sender >>>> and know the content is safe. >>>> If you think this is a phishing email, please forward it to >>>> phishing@xxxxxxxxxxxxxxxxx >>>> >>>> >>>> -- >>>> Hi Ivan, >>>> >>>> The solution (which has been successful for us in the past) is to reset >>>> the journal. This would bring the fs back online and return the MDSes to a >>>> stable state, but some data would be lost—the data in the journal that >>>> hasn't been flushed to the backing store would be gone. Therefore, you >>>> should try to flush out as much journal data as possible before resetting >>>> the journal. >>>> >>>> Here are the steps for this entire process: >>>> >>>> 1) Bring the FS offline >>>> $ ceph fs fail <fs_name> >>>> >>>> 2) Recover dentries from journal (run it with every MDS Rank) >>>> $ cephfs-journal-tool --rank=<fs_name>:<mds-rank> event >>>> recover_dentries summary >>>> >>>> 3) Reset the journal (again with every MDS Rank) >>>> $ cephfs-journal-tool --rank=<fs_name>:<mds-rank> journal reset >>>> >>>> 4) Bring the FS online >>>> $ cephfs fs set <fs_name> joinable true >>>> >>>> 5) Restart the MDSes >>>> >>>> 6) Perform scrub to ensure consistency of fs >>>> $ ceph tell mds.<fs_name>:0 scrub start <path> [scrubopts] [tag] >>>> # you could try a recursive scrub maybe `ceph tell mds.<fs_name>:0 >>>> scrub start / recursive` >>>> >>>> Some important notes to keep in mind: >>>> * Recovering dentries will take time (generally, rank 0 is the most >>>> time-consuming, but the rest should be quick). >>>> * cephfs-journal-tool and metadata OSDs are bound to use a significant >>>> CPU percentage. This is because cephfs-journal-tool has to swig the journal >>>> data and flush it out to the backing store, which also makes the metadata >>>> operations go rampant, resulting in OSDs taking a significant percentage of >>>> CPU. >>>> >>>> Do let me know how this goes. >>>> >>>> On Thu, Jun 27, 2024 at 3:44 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>>> wrote: >>>> >>>>> Hi Dhairya, >>>>> >>>>> We can induce the crash by simply restarting the MDS and the crash >>>>> seems to happen when an MDS goes from up:standby to up:replay. The MDS >>>>> works through a few files in the log before eventually crashing where I've >>>>> included the logs for this here (this is after I imported the backed up >>>>> journal which I hope was successful but please let me know if you suspect >>>>> it wasn't!): >>>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s3.mds_restart_crash.log >>>>> >>>>> With respect to the client logs, are you referring to the clients who >>>>> are writing to the filesystem? We don't typically run them in any sort of >>>>> debug mode and we have quite a few machines running our backup system but >>>>> we can look an hour or so before the first MDS crash (though I don't know >>>>> if this is when the de-sync occurred). 
Here are some MDS logs with regards >>>>> to the initial crash on Saturday morning though which may be helpful: >>>>> >>>>> -59> 2024-06-22T05:41:43.090+0100 7f184ce82700 10 monclient: tick >>>>> -58> 2024-06-22T05:41:43.090+0100 7f184ce82700 10 monclient: >>>>> _check_auth_rotating have uptodate secrets (they expire after >>>>> 2024-06-22T05:41:13.091556+0100) >>>>> -57> 2024-06-22T05:41:43.208+0100 7f184de84700 1 mds.pebbles-s2 >>>>> Updating MDS map to version 2529650 from mon.3 >>>>> -56> 2024-06-22T05:41:43.208+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 6 not found in OSDMap >>>>> -55> 2024-06-22T05:41:43.208+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 3 not found in OSDMap >>>>> -54> 2024-06-22T05:41:43.209+0100 7f184de84700 5 >>>>> asok(0x5592e7968000) register_command objecter_requests hook 0x5592e78f8800 >>>>> -53> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient: >>>>> _renew_subs >>>>> -52> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient: >>>>> _send_mon_message to mon.pebbles-s4 at v2:10.1.5.134:3300/0 >>>>> -51> 2024-06-22T05:41:43.209+0100 7f184de84700 10 >>>>> log_channel(cluster) update_config to_monitors: true to_syslog: false >>>>> syslog_facility: prio: info to_graylog: false graylog_host: 127.0.0.1 >>>>> graylog_port: 12201) >>>>> -50> 2024-06-22T05:41:43.209+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 6 not found in OSDMap >>>>> -49> 2024-06-22T05:41:43.209+0100 7f184de84700 4 mds.0.purge_queue >>>>> operator(): data pool 3 not found in OSDMap >>>>> -48> 2024-06-22T05:41:43.209+0100 7f184de84700 4 mds.0.0 >>>>> apply_blocklist: killed 0, blocklisted sessions (0 blocklist entries, 0) >>>>> -47> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> handle_mds_map i am now mds.0.2529650 >>>>> -46> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> handle_mds_map state change up:standby --> up:replay >>>>> -45> 2024-06-22T05:41:43.209+0100 7f184de84700 5 >>>>> mds.beacon.pebbles-s2 set_want_state: up:standby -> up:replay >>>>> -44> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> replay_start >>>>> -43> 2024-06-22T05:41:43.209+0100 7f184de84700 1 mds.0.2529650 >>>>> waiting for osdmap 473739 (which blocklists prior instance) >>>>> -42> 2024-06-22T05:41:43.209+0100 7f184de84700 10 monclient: >>>>> _send_mon_message to mon.pebbles-s4 at v2:10.1.5.134:3300/0 >>>>> -41> 2024-06-22T05:41:43.209+0100 7f1849e7c700 2 mds.0.cache >>>>> Memory usage: total 299012, rss 37624, heap 182556, baseline 182556, 0 / 0 >>>>> inodes have caps, 0 caps, 0 caps per inode >>>>> -40> 2024-06-22T05:41:43.224+0100 7f184de84700 10 monclient: >>>>> _renew_subs >>>>> -39> 2024-06-22T05:41:43.224+0100 7f184de84700 10 monclient: >>>>> _send_mon_message to mon.pebbles-s4 at v2:10.1.5.134:3300/0 >>>>> -38> 2024-06-22T05:41:43.224+0100 7f184de84700 10 monclient: >>>>> handle_get_version_reply finishing 1 version 473739 >>>>> -37> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening inotable >>>>> -36> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening sessionmap >>>>> -35> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening mds log >>>>> -34> 2024-06-22T05:41:43.224+0100 7f1847e78700 5 mds.0.log open >>>>> discovering log bounds >>>>> -33> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening purge queue (async) >>>>> -32> 2024-06-22T05:41:43.224+0100 7f1847e78700 4 
mds.0.purge_queue >>>>> open: opening >>>>> -31> 2024-06-22T05:41:43.224+0100 7f1847e78700 1 >>>>> mds.0.journaler.pq(ro) recover start >>>>> -30> 2024-06-22T05:41:43.224+0100 7f1847e78700 1 >>>>> mds.0.journaler.pq(ro) read_head >>>>> -29> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: loading open file table (async) >>>>> -28> 2024-06-22T05:41:43.224+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 0: opening snap table >>>>> -27> 2024-06-22T05:41:43.224+0100 7f1847677700 4 >>>>> mds.0.journalpointer Reading journal pointer '400.00000000' >>>>> -26> 2024-06-22T05:41:43.224+0100 7f1850689700 10 monclient: >>>>> get_auth_request con 0x5592e8987000 auth_method 0 >>>>> -25> 2024-06-22T05:41:43.225+0100 7f1850e8a700 10 monclient: >>>>> get_auth_request con 0x5592e8987c00 auth_method 0 >>>>> -24> 2024-06-22T05:41:43.252+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) _finish_read_head loghead(trim 231160676352, expire >>>>> 231163662875, write 231163662875, stream_format 1). probing for end of log >>>>> (from 231163662875)... >>>>> -23> 2024-06-22T05:41:43.252+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) probing for end of the log >>>>> -22> 2024-06-22T05:41:43.252+0100 7f1847677700 1 >>>>> mds.0.journaler.mdlog(ro) recover start >>>>> -21> 2024-06-22T05:41:43.252+0100 7f1847677700 1 >>>>> mds.0.journaler.mdlog(ro) read_head >>>>> -20> 2024-06-22T05:41:43.252+0100 7f1847677700 4 mds.0.log Waiting >>>>> for journal 0x200 to recover... >>>>> -19> 2024-06-22T05:41:43.252+0100 7f1850689700 10 monclient: >>>>> get_auth_request con 0x5592e8bc6000 auth_method 0 >>>>> -18> 2024-06-22T05:41:43.253+0100 7f185168b700 10 monclient: >>>>> get_auth_request con 0x5592e8bc6800 auth_method 0 >>>>> -17> 2024-06-22T05:41:43.257+0100 7f1847e78700 1 >>>>> mds.0.journaler.mdlog(ro) _finish_read_head loghead(trim 90131453181952, >>>>> expire 90131465778558, write 90132009715463, stream_format 1). probing for >>>>> end of log (from 90132009715463)... >>>>> -16> 2024-06-22T05:41:43.257+0100 7f1847e78700 1 >>>>> mds.0.journaler.mdlog(ro) probing for end of the log >>>>> -15> 2024-06-22T05:41:43.257+0100 7f1847e78700 1 >>>>> mds.0.journaler.mdlog(ro) _finish_probe_end write_pos = 90132019384791 >>>>> (header had 90132009715463). recovered. >>>>> -14> 2024-06-22T05:41:43.257+0100 7f1847677700 4 mds.0.log Journal >>>>> 0x200 recovered. >>>>> -13> 2024-06-22T05:41:43.257+0100 7f1847677700 4 mds.0.log >>>>> Recovered journal 0x200 in format 1 >>>>> -12> 2024-06-22T05:41:43.273+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) _finish_probe_end write_pos = 231163662875 (header >>>>> had 231163662875). recovered. 
>>>>> -11> 2024-06-22T05:41:43.273+0100 7f1848e7a700 4 mds.0.purge_queue >>>>> operator(): open complete >>>>> -10> 2024-06-22T05:41:43.273+0100 7f1848e7a700 1 >>>>> mds.0.journaler.pq(ro) set_writeable >>>>> -9> 2024-06-22T05:41:43.441+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 1: loading/discovering base inodes >>>>> -8> 2024-06-22T05:41:43.441+0100 7f1847e78700 0 mds.0.cache >>>>> creating system inode with ino:0x100 >>>>> -7> 2024-06-22T05:41:43.442+0100 7f1847e78700 0 mds.0.cache >>>>> creating system inode with ino:0x1 >>>>> -6> 2024-06-22T05:41:43.442+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 2: replaying mds log >>>>> -5> 2024-06-22T05:41:43.442+0100 7f1847e78700 2 mds.0.2529650 >>>>> Booting: 2: waiting for purge queue recovered >>>>> -4> 2024-06-22T05:41:44.090+0100 7f184ce82700 10 monclient: tick >>>>> -3> 2024-06-22T05:41:44.090+0100 7f184ce82700 10 monclient: >>>>> _check_auth_rotating have uptodate secrets (they expire after >>>>> 2024-06-22T05:41:14.091638+0100) >>>>> -2> 2024-06-22T05:41:44.210+0100 7f1849e7c700 2 mds.0.cache >>>>> Memory usage: total 588368, rss 308304, heap 207132, baseline 182556, 0 / >>>>> 15149 inodes have caps, 0 caps, 0 caps per inode >>>>> -1> 2024-06-22T05:41:44.642+0100 7f1846675700 -1 >>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/include/interval_set.h: >>>>> In function 'void interval_set<T, C>::erase(T, T, std::function<bool(T, >>>>> T)>) [with T = inodeno_t; C = std::map]' thread 7f1846675700 time >>>>> 2024-06-22T05:41:44.643146+0100 >>>>> >>>>> ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy >>>>> (stable) >>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>> const*)+0x135) [0x7f18568b64a3] >>>>> 2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f18568b6669] >>>>> 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, >>>>> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x5592e5027885] >>>>> 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, >>>>> MDPeerUpdate*)+0x4377) [0x5592e532c7b7] >>>>> 5: (EUpdate::replay(MDSRank*)+0x61) [0x5592e5330bd1] >>>>> 6: (MDLog::_replay_thread()+0x7bb) [0x5592e52b754b] >>>>> 7: (MDLog::ReplayThread::entry()+0x11) [0x5592e4f6a041] >>>>> 8: /lib64/libpthread.so.0(+0x81ca) [0x7f18558a41ca] >>>>> 9: clone() >>>>> >>>>> 0> 2024-06-22T05:41:44.643+0100 7f1846675700 -1 *** Caught signal >>>>> (Aborted) ** >>>>> in thread 7f1846675700 thread_name:md_log_replay >>>>> >>>>> ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy >>>>> (stable) >>>>> 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f18558aecf0] >>>>> 2: gsignal() >>>>> 3: abort() >>>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char >>>>> const*)+0x18f) [0x7f18568b64fd] >>>>> 5: /usr/lib64/ceph/libceph-common.so.2(+0x269669) [0x7f18568b6669] >>>>> 6: (interval_set<inodeno_t, std::map>::erase(inodeno_t, inodeno_t, >>>>> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) [0x5592e5027885] >>>>> 7: (EMetaBlob::replay(MDSRank*, LogSegment*, int, >>>>> MDPeerUpdate*)+0x4377) [0x5592e532c7b7] >>>>> 8: (EUpdate::replay(MDSRank*)+0x61) [0x5592e5330bd1] >>>>> 9: (MDLog::_replay_thread()+0x7bb) [0x5592e52b754b] >>>>> 10: (MDLog::ReplayThread::entry()+0x11) [0x5592e4f6a041] >>>>> 11: /lib64/libpthread.so.0(+0x81ca) [0x7f18558a41ca] >>>>> 12: clone() >>>>> >>>>> We have a relatively low debug setting normally so I don't think 
many >>>>> details of the initial crash were captured unfortunately and the MDS logs >>>>> before the above (i.e. "-60" and older) are just beacon messages and >>>>> _check_auth_rotating checks. >>>>> >>>>> I was wondering whether you have any recommendations in terms of what >>>>> actions we could take to bring our filesystem back into a working state >>>>> short of rebuilding the entire metadata pool? We are quite keen to bring >>>>> our backup back into service urgently as we currently do not have any >>>>> accessible backups for our Ceph clusters. >>>>> >>>>> Kindest regards, >>>>> >>>>> Ivan >>>>> On 25/06/2024 19:18, Dhairya Parmar wrote: >>>>> >>>>> CAUTION: This email originated from outside of the LMB: >>>>> *.-dparmar@xxxxxxxxxx-.* >>>>> Do not click links or open attachments unless you recognize the sender >>>>> and know the content is safe. >>>>> If you think this is a phishing email, please forward it to >>>>> phishing@xxxxxxxxxxxxxxxxx >>>>> >>>>> >>>>> -- >>>>> >>>>> >>>>> On Tue, Jun 25, 2024 at 6:38 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>>>> wrote: >>>>> >>>>>> Hi Dhairya, >>>>>> >>>>>> Thank you for your rapid reply. I tried recovering the dentries for >>>>>> the file just before the crash I mentioned before and then splicing the >>>>>> transactions from the journal which seemed to remove that issue for that >>>>>> inode but resulted in the MDS crashing on the next inode in the >>>>>> journal when performing replay. >>>>>> >>>>> The MDS delegates a range of preallocated inodes (in form of a set - >>>>> interval_set<inodeno_t> preallocated_inos) to the clients, so it can be one >>>>> inode that is untracked or some inodes from the range or in worst case >>>>> scenario - ALL, and this is something that even the `cephfs-journal-tool` >>>>> would not be able to tell (since we're talking about MDS internals which >>>>> aren't exposed to such tools). That is the reason why you see "MDS crashing >>>>> on the next inode in the journal when performing replay". >>>>> >>>>> An option could be to expose the inode set to some tool or asok cmd to >>>>> identify such inodes ranges, which needs to be discussed. For now, we're >>>>> trying to address this in [0], you can follow the discussion there. >>>>> >>>>> [0] https://tracker.ceph.com/issues/66251 >>>>> >>>>>> Removing all the transactions involving the directory housing the >>>>>> files that seemed to cause these crashes from the journal only caused the >>>>>> MDS to fail to even start replay. >>>>>> >>>>> I've rolled back our journal to our original version when the crash >>>>>> first happened and the entire MDS log for the crash can be found here: >>>>>> https://www.mrc-lmb.cam.ac.uk/scicomp/data/uploads/ceph/ceph-mds.pebbles-s3.flush_journal.log-25-06-24 >>>>>> >>>>> Awesome, this would help us a ton. Apart from this, would it be >>>>> possible to send us client logs? >>>>> >>>>>> Please let us know if you would like any other logs file as we can >>>>>> easily induce this crash. >>>>>> >>>>> Since you can easily induce the crash, can you share the reproducer >>>>> please i.e. what all action you take in order to hit this? >>>>> >>>>>> Kindest regards, >>>>>> >>>>>> Ivan >>>>>> On 25/06/2024 09:58, Dhairya Parmar wrote: >>>>>> >>>>>> CAUTION: This email originated from outside of the LMB: >>>>>> *.-dparmar@xxxxxxxxxx-.* >>>>>> Do not click links or open attachments unless you recognize the >>>>>> sender and know the content is safe. 
>>>>>> If you think this is a phishing email, please forward it to >>>>>> phishing@xxxxxxxxxxxxxxxxx >>>>>> >>>>>> >>>>>> -- >>>>>> Hi Ivan, >>>>>> >>>>>> This looks to be similar to the issue [0] that we're already >>>>>> addressing at [1]. So basically there is some out-of-sync event that led >>>>>> the client to make use of the inodes that MDS wasn't aware of/isn't >>>>>> tracking and hence the crash. It'd be really helpful if you can provide us >>>>>> more logs. >>>>>> >>>>>> CC @Rishabh Dave <ridave@xxxxxxxxxx> @Venky Shankar >>>>>> <vshankar@xxxxxxxxxx> @Patrick Donnelly <pdonnell@xxxxxxxxxx> @Xiubo >>>>>> Li <xiubli@xxxxxxxxxx> >>>>>> >>>>>> [0] https://tracker.ceph.com/issues/61009 >>>>>> [1] https://tracker.ceph.com/issues/66251 >>>>>> -- >>>>>> *Dhairya Parmar* >>>>>> >>>>>> Associate Software Engineer, CephFS >>>>>> >>>>>> <https://www.redhat.com/>IBM, Inc. >>>>>> >>>>>> On Mon, Jun 24, 2024 at 8:54 PM Ivan Clayson <ivan@xxxxxxxxxxxxxxxxx> >>>>>> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> We have been experiencing a serious issue with our CephFS backup >>>>>>> cluster >>>>>>> running quincy (version 17.2.7) on a RHEL8-derivative Linux kernel >>>>>>> (Alma8.9, 4.18.0-513.9.1 kernel) where our MDSes for our filesystem >>>>>>> are >>>>>>> constantly in a "replay" or "replay(laggy)" state and keep crashing. >>>>>>> >>>>>>> We have a single MDS filesystem called "ceph_backup" with 2 standby >>>>>>> MDSes along with a 2nd unused filesystem "ceph_archive" (this holds >>>>>>> little to no data) where we are using our "ceph_backup" filesystem >>>>>>> to >>>>>>> backup our data and this is the one which is currently broken. The >>>>>>> Ceph >>>>>>> health outputs currently are: >>>>>>> >>>>>>> root@pebbles-s1 14:05 [~]: ceph -s >>>>>>> cluster: >>>>>>> id: e3f7535e-d35f-4a5d-88f0-a1e97abcd631 >>>>>>> health: HEALTH_WARN >>>>>>> 1 filesystem is degraded >>>>>>> insufficient standby MDS daemons available >>>>>>> 1319 pgs not deep-scrubbed in time >>>>>>> 1054 pgs not scrubbed in time >>>>>>> >>>>>>> services: >>>>>>> mon: 4 daemons, quorum >>>>>>> pebbles-s1,pebbles-s2,pebbles-s3,pebbles-s4 (age 36m) >>>>>>> mgr: pebbles-s2(active, since 36m), standbys: pebbles-s4, >>>>>>> pebbles-s3, pebbles-s1 >>>>>>> mds: 2/2 daemons up >>>>>>> osd: 1380 osds: 1380 up (since 29m), 1379 in (since 3d); 37 >>>>>>> remapped pgs >>>>>>> >>>>>>> data: >>>>>>> volumes: 1/2 healthy, 1 recovering >>>>>>> pools: 7 pools, 2177 pgs >>>>>>> objects: 3.55G objects, 7.0 PiB >>>>>>> usage: 8.9 PiB used, 14 PiB / 23 PiB avail >>>>>>> pgs: 83133528/30006841533 objects misplaced (0.277%) >>>>>>> 2090 active+clean >>>>>>> 47 active+clean+scrubbing+deep >>>>>>> 29 active+remapped+backfilling >>>>>>> 8 active+remapped+backfill_wait >>>>>>> 2 active+clean+scrubbing >>>>>>> 1 active+clean+snaptrim >>>>>>> >>>>>>> io: >>>>>>> recovery: 1.9 GiB/s, 719 objects/s >>>>>>> >>>>>>> root@pebbles-s1 14:09 [~]: ceph fs status >>>>>>> ceph_backup - 0 clients >>>>>>> =========== >>>>>>> RANK STATE MDS ACTIVITY DNS INOS DIRS >>>>>>> CAPS >>>>>>> 0 replay(laggy) pebbles-s3 0 0 0 0 >>>>>>> POOL TYPE USED AVAIL >>>>>>> mds_backup_fs metadata 1255G 2780G >>>>>>> ec82_primary_fs_data data 0 2780G >>>>>>> ec82pool data 8442T 3044T >>>>>>> ceph_archive - 2 clients >>>>>>> ============ >>>>>>> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS >>>>>>> 0 active pebbles-s2 Reqs: 0 /s 13.4k 7105 118 2 >>>>>>> POOL TYPE USED AVAIL >>>>>>> mds_archive_fs metadata 5184M 2780G >>>>>>> ec83_primary_fs_data data 0 2780G >>>>>>> ec83pool data 138T 2767T >>>>>>> 
MDS version: ceph version 17.2.7 >>>>>>> (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) >>>>>>> root@pebbles-s1 14:09 [~]: ceph health detail | head >>>>>>> HEALTH_WARN 1 filesystem is degraded; insufficient standby MDS >>>>>>> daemons available; 1319 pgs not deep-scrubbed in time; 1054 pgs >>>>>>> not >>>>>>> scrubbed in time >>>>>>> [WRN] FS_DEGRADED: 1 filesystem is degraded >>>>>>> fs ceph_backup is degraded >>>>>>> [WRN] MDS_INSUFFICIENT_STANDBY: insufficient standby MDS daemons >>>>>>> available >>>>>>> have 0; want 1 more >>>>>>> >>>>>>> When our cluster first ran after a reboot, Ceph ran through the 2 >>>>>>> standby MDSes, crashing them all, until it reached the final MDS and >>>>>>> is >>>>>>> now stuck in this "replay(laggy)" state. Putting our MDSes into >>>>>>> debugging mode, we can see that this MDS crashed when replaying the >>>>>>> journal for a particular inode (this is the same for all the MDSes >>>>>>> and >>>>>>> they all crash on the same object): >>>>>>> >>>>>>> ... >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay for [521,head] had [inode 0x1005ba89481 >>>>>>> [...539,head] >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/ >>>>>>> auth fragtree_t(*^2 00*^3 00000*^ >>>>>>> 4 00001*^3 00010*^4 00011*^4 00100*^4 00101*^4 00110*^4 00111*^4 >>>>>>> 01*^3 01000*^4 01001*^3 01010*^4 01011*^3 01100*^4 01101*^4 >>>>>>> 01110*^4 >>>>>>> 01111*^4 10*^3 10000*^4 10001*^4 10010*^4 10011*^4 10100*^4 >>>>>>> 10101*^3 >>>>>>> 10110*^4 10111*^4 11*^6) v10880645 f(v0 m2024-06-22 >>>>>>> T05:41:10.213700+0100 1281276=1281276+0) n(v12 >>>>>>> rc2024-06-22T05:41:10.213700+0100 b1348251683896 >>>>>>> 1281277=1281276+1) >>>>>>> old_inodes=8 (iversion lock) | dirfrag=416 dirty=1 >>>>>>> 0x55770a2bdb80] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay dir 0x1005ba89481.011011000* >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay updated dir [dir 0x1005ba89481.011011000* >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/ >>>>>>> [2,head] auth v=436385 cv=0/0 state=107374182 >>>>>>> 4 f(v0 m2024-06-22T05:41:10.213700+0100 2502=2502+0) n(v12 >>>>>>> rc2024-06-22T05:41:10.213700+0100 b2120744220 2502=2502+0) >>>>>>> hs=32+33,ss=0+0 dirty=65 | child=1 0x55770ebcda80] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay added (full) [dentry >>>>>>> >>>>>>> #0x1/cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fracti >>>>>>> ons_ave_Z124.mrc.teberet7.partial [539,head] auth NULL (dversion >>>>>>> lock) v=436384 ino=(nil) state=1610612800|bottomlru | dirty=1 >>>>>>> 0x557710444500] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay added [inode 0x1005cd4fe35 [539,head] >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_ >>>>>>> 005006_fractions_ave_Z124.mrc.teberet7.partial auth v436384 s=0 >>>>>>> n(v0 >>>>>>> 1=1+0) (iversion lock) cr={99995144=0-4194304@538} >>>>>>> 0x557710438680] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 >>>>>>> mds.0.cache.ino(0x1005cd4fe35) mark_dirty_parent >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> 
EMetaBlob.replay noting opened inode [inode 0x1005cd4fe35 >>>>>>> [539,head] >>>>>>> >>>>>>> /cephfs-users/afellows/Ferdos/20210625_real_DDFHFKLMT_KriosIII_K3/cryolo/test_micrographs/FoilHole_27649821_Data_27626128_2762 >>>>>>> 6130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial auth >>>>>>> v436384 DIRTYPARENT s=0 n(v0 1=1+0) (iversion lock) >>>>>>> cr={99995144=0-4194304@538} | dirtyparent=1 dirty=1 >>>>>>> 0x557710438680] >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay inotable tablev 3112837 <= table 3112837 >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 10 mds.0.journal >>>>>>> EMetaBlob.replay sessionmap v 1560540883, table 1560540882 >>>>>>> prealloc >>>>>>> [] used 0x1005cd4fe35 >>>>>>> 2024-06-24T13:44:55.563+0100 7f8811c40700 -1 >>>>>>> >>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/include/interval_set.h: >>>>>>> I >>>>>>> n function 'void interval_set<T, C>::erase(T, T, >>>>>>> std::function<bool(T, T)>) [with T = inodeno_t; C = std::map]' >>>>>>> thread 7f8811c40700 time 2024-06-24T13:44:55.564315+0100 >>>>>>> >>>>>>> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/include/interval_set.h: >>>>>>> 568: FAILED ceph_assert(p->first <= start) >>>>>>> >>>>>>> ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) >>>>>>> quincy (stable) >>>>>>> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, >>>>>>> char >>>>>>> const*)+0x135) [0x7f8821e814a3] >>>>>>> 2: /usr/lib64/ceph/libceph-common.so.2(+0x269669) >>>>>>> [0x7f8821e81669] >>>>>>> 3: (interval_set<inodeno_t, std::map>::erase(inodeno_t, >>>>>>> inodeno_t, >>>>>>> std::function<bool (inodeno_t, inodeno_t)>)+0x2e5) >>>>>>> [0x5576f9bb2885] >>>>>>> 4: (EMetaBlob::replay(MDSRank*, LogSegment*, int, >>>>>>> MDPeerUpdate*)+0x4377) [0x5576f9eb77b7] >>>>>>> 5: (EUpdate::replay(MDSRank*)+0x61) [0x5576f9ebbbd1] >>>>>>> 6: (MDLog::_replay_thread()+0x7bb) [0x5576f9e4254b] >>>>>>> 7: (MDLog::ReplayThread::entry()+0x11) [0x5576f9af5041] >>>>>>> 8: /lib64/libpthread.so.0(+0x81ca) [0x7f8820e6f1ca] >>>>>>> 9: clone() >>>>>>> >>>>>>> I've only included a short section of the crash (this is the first >>>>>>> trace >>>>>>> in the log with regards to the crash with a 10/20 debug_mds option). >>>>>>> We >>>>>>> tried deleting the 0x1005cd4fe35 object from the object store using >>>>>>> the >>>>>>> "rados" command but this did not allow our MDS to successfully >>>>>>> replay. 
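A small aside that may help when cross-referencing these logs with the journal-tool commands below: the MDS log prints inode numbers in hex (0x1005cd4fe35 above), while the --inode filter of cephfs-journal-tool takes the decimal form. One quick way to convert:

    $ printf '%d\n' 0x1005cd4fe35
    1101069090357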
>>>>>>> >>>>>>> From my understanding the journal seems okay as we didn't run out >>>>>>> of >>>>>>> space for example on our metadata pool and "cephfs-journal-tool >>>>>>> journal >>>>>>> inspect" doesn't seem to think there is any damage: >>>>>>> >>>>>>> root@pebbles-s1 13:58 [~]: cephfs-journal-tool >>>>>>> --rank=ceph_backup:0 >>>>>>> journal inspect >>>>>>> Overall journal integrity: OK >>>>>>> root@pebbles-s1 14:04 [~]: cephfs-journal-tool >>>>>>> --rank=ceph_backup:0 >>>>>>> event get --inode 1101069090357 summary >>>>>>> Events by type: >>>>>>> OPEN: 1 >>>>>>> UPDATE: 3 >>>>>>> Errors: 0 >>>>>>> root@pebbles-s1 14:05 [~]: cephfs-journal-tool >>>>>>> --rank=ceph_backup:0 >>>>>>> event get --inode 1101069090357 list >>>>>>> 2024-06-22T05:41:10.214635+0100 0x51f97d4cfe35 UPDATE: (openc) >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> 2024-06-22T05:41:11.203312+0100 0x51f97d59c848 UPDATE: >>>>>>> (check_inode_max_size) >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> 2024-06-22T05:41:15.484871+0100 0x51f97e7344cc OPEN: () >>>>>>> >>>>>>> FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> 2024-06-22T05:41:15.484921+0100 0x51f97e73493b UPDATE: (rename) >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc.teberet7.partial >>>>>>> >>>>>>> test_micrographs/FoilHole_27649821_Data_27626128_27626130_20210628_005006_fractions_ave_Z124.mrc >>>>>>> >>>>>>> I was wondering whether anyone had any advice for us on how we >>>>>>> should >>>>>>> proceed forward? We were thinking about manually applying these >>>>>>> events >>>>>>> (via "event apply") where failing that we could erase this >>>>>>> problematic >>>>>>> event with "cephfs-journal-tool --rank=ceph_backup:0 event splice >>>>>>> --inode 1101069090357". Is this a good idea? We would rather not >>>>>>> rebuild >>>>>>> the entire metadata pool if we could avoid it (once was enough for >>>>>>> us) >>>>>>> as this cluster has ~9 PB of data on it. 
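Pulling together the advice given elsewhere in this thread, a conservative sequence for rank 0 of ceph_backup might look like the sketch below. This is not an authoritative recipe: the export path is a placeholder, the splice invocation mirrors the event get filters shown above, and the choice between splicing a single event and resetting the whole journal depends on how much unflushed metadata can be lost.

    # Take the filesystem offline and keep a byte-for-byte copy of the journal first,
    # so any change can be rolled back with `journal import`:
    $ ceph fs fail ceph_backup
    $ cephfs-journal-tool --rank=ceph_backup:0 journal export /root/ceph_backup-rank0.journal.bin

    # Flush as many dentries as possible into the backing store:
    $ cephfs-journal-tool --rank=ceph_backup:0 event recover_dentries summary

    # Then either splice just the problematic events ...
    $ cephfs-journal-tool --rank=ceph_backup:0 event splice --inode 1101069090357 summary
    # ... or reset the whole journal (loses any metadata not yet flushed):
    $ cephfs-journal-tool --rank=ceph_backup:0 journal reset

    # Bring the filesystem back and scrub it (optionally with "recursive,repair"):
    $ ceph fs set ceph_backup joinable true
    $ ceph tell mds.ceph_backup:0 scrub start / recursive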
>>>>>>> Kindest regards,
>>>>>>>
>>>>>>> Ivan Clayson
>>>>>>>
>>>>>>> --
>>>>>>> Ivan Clayson
>>>>>>> -----------------
>>>>>>> Scientific Computing Officer
>>>>>>> Room 2N249
>>>>>>> Structural Studies
>>>>>>> MRC Laboratory of Molecular Biology
>>>>>>> Francis Crick Ave, Cambridge
>>>>>>> CB2 0QH
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx