Re: Recover files from cephfs data pool

Gotcha. Yeah, I think we are going to continue the scanning to build a new metadata pool. I am making some progress on a script to extract files from the data pool; I just need to pin down the exact format of the xattrs and the object hierarchy for large files. If I take the script across the finish line, I will post it for the community. So I am reading the C source code at the moment to see what CephFS is doing.
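
For anyone following along, here is the rough shape of what the script will do, as far as I can tell from the docs and source (the pool name and example inode are placeholders, and I still need to verify the xattr encoding):

    # Each file is striped across data-pool objects named <inode hex>.<object index, 8 hex digits>.
    # List the chunks belonging to, e.g., inode 0x10000000001:
    rados -p cephfs_data ls | grep '^10000000001\.' | sort

    # The first chunk carries the encoded backtrace (ancestor dentries) in its "parent" xattr:
    rados -p cephfs_data getxattr 10000000001.00000000 parent > backtrace.bin
    ceph-dencoder type inode_backtrace_t import backtrace.bin decode dump_json

    # Reassemble the file by fetching chunks in index order and concatenating them
    # (4 MiB per chunk with the default layout; sparse files and exact sizes need the metadata):
    for obj in $(rados -p cephfs_data ls | grep '^10000000001\.' | sort); do
        rados -p cephfs_data get "$obj" chunk.tmp && cat chunk.tmp >> recovered.bin
    done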


On Mon, Nov 5, 2018 at 8:10 PM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
With cppool you got a bunch of useless zero-sized objects because, unlike "export", cppool does not copy the omap data, which is what actually holds all the inode info.
I suggest truncating the journals only as an effort to reduce downtime, followed by an immediate backup of the available files to a fresh fs. After resetting the journals, the part of your fs covered by unflushed "UPDATE" entries *will* become inconsistent. The MDS may start to segfault occasionally, but that can be avoided by forcing read-only mode (in that mode the MDS journal will not flush, so you will need extra disk space).
If you want to get the original fs recovered and fully functional, you need to somehow replay the journal (I'm unsure whether the cephfs-data-scan tool operates on journal entries).
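
To illustrate the cppool vs export difference (pool names here are just placeholders for your setup, with cephfs_metadata_copy being the cppool copy):

    # Dirfrag objects keep dentries in omap; on the cppool copy they come back empty:
    rados -p cephfs_metadata listomapkeys 1.00000000        # root dirfrag, one key per dentry
    rados -p cephfs_metadata_copy listomapkeys 1.00000000   # empty after "rados cppool"

    # "rados export" serializes omap and xattrs along with the object data:
    rados -p cephfs_metadata export /backup/cephfs_metadata.export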



On 6.11.2018, at 03:43, Rhian Resnick <xantho@xxxxxxxxxxxx> wrote:

Workload is mixed. 

We ran a rados cppool to back up the metadata pool.

So you're thinking that truncating the journal and purge queue (we are on Luminous), plus a reset, could bring us online missing just that day's data (most of it from when the issue started)?

If so, we could continue our scan into our recovery partition and give it a try tomorrow after talking it over with our recovery team.




On Mon, Nov 5, 2018 at 7:40 PM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
What was your recent workload? There is a chance you won't lose much if it was mostly read ops. If so, you should back up your metadata pool via "rados export" in order to preserve the omap data, then try truncating the journals (along with the purge queue, if supported by your ceph version), wiping the session table, and resetting the fs.
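Roughly this sequence (the fs name "cephfs", rank 0, and the backup paths are examples; the --rank and --journal options depend on your version, so check the disaster recovery docs before running any of it):

    # 1. Preserve the metadata pool including omap:
    rados -p cephfs_metadata export /backup/cephfs_metadata.export

    # 2. Back up, then reset the MDS journal (repeat per damaged rank):
    cephfs-journal-tool --rank=cephfs:0 journal export /backup/journal.rank0.bin
    cephfs-journal-tool --rank=cephfs:0 journal reset

    # 3. Reset the purge queue too, if your version keeps it as a separate journal:
    cephfs-journal-tool --rank=cephfs:0 --journal=purge_queue journal reset

    # 4. Wipe the session table and reset the fs:
    cephfs-table-tool all reset session
    ceph fs reset cephfs --yes-i-really-mean-it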


On 6.11.2018, at 03:26, Rhian Resnick <xantho@xxxxxxxxxxxx> wrote:

That was our original plan. We migrated to bigger disks and have the space, but "recover dentries" uses up all our memory (128 GB) and crashes out.

On Mon, Nov 5, 2018 at 7:23 PM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
I had the same problem with multi-MDS. I solved it by freeing up a little space on the OSDs, doing "recover dentries", truncating the journal, and then "fs reset". After that I was able to revert to a single active MDS and kept on running for a year until it failed on the 13.2.2 upgrade :))


On 6.11.2018, at 03:18, Rhian Resnick <xantho@xxxxxxxxxxxx> wrote:

Our metadata pool went from 700 MB to 1 TB in a few hours. It used up all the space on the OSDs, and now 2 ranks report damage. The journal recovery tools fail because they run out of memory, leaving us with the choice of either truncating the journal and losing data, or recovering with the scan tools.

Any ideas on solutions are welcome. I posted all the logs and the cluster design previously but am happy to do so again. We are not desperate, but we are hurting with this long downtime.
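
For reference, the scan-tool path we are on looks roughly like this (pool name, worker count, and the exact set of passes are placeholders based on the disaster recovery docs; adjust for your version):

    # Rebuild metadata from the data pool after initializing the new metadata objects:
    cephfs-data-scan init

    # scan_extents and scan_inodes can be sharded; run one process per worker_n (0..3 here)
    # in parallel, and let each phase finish completely before starting the next:
    cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 cephfs_data
    cephfs-data-scan scan_inodes  --worker_n 0 --worker_m 4 cephfs_data

    # Newer releases add a pass that repairs linkage:
    cephfs-data-scan scan_links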

On Mon, Nov 5, 2018 at 7:05 PM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
What kind of damage have you had? Maybe it is worth trying to get the MDS to start and back up the valuable data instead of doing a long-running recovery?


On 6.11.2018, at 02:59, Rhian Resnick <xantho@xxxxxxxxxxxx> wrote:

Sounds like I get to have some fun tonight. 

On Mon, Nov 5, 2018, 6:39 PM Sergey Malinin <hell@xxxxxxxxxxx> wrote:
Inode linkage (i.e. the folder hierarchy) and file names are stored in the omap data of objects in the metadata pool. You can write a script that traverses the whole metadata pool to work out which file names correspond to which objects in the data pool, and then fetches the required files with the 'rados get' command.
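As a starting point (the object, inode, and pool names here are just examples):

    # The root directory is inode 0x1, so its first dirfrag object is "1.00000000";
    # its omap keys are the dentry names (with a "_head" suffix):
    rados -p cephfs_metadata listomapkeys 1.00000000

    # The omap values are encoded dentries containing the child inode numbers, which
    # in turn name the data-pool objects (inode 0x10000000001 -> 10000000001.00000000, ...):
    rados -p cephfs_metadata listomapvals 1.00000000 | less

    # Once you know the inode number, fetch the file content from the data pool:
    rados -p cephfs_data get 10000000001.00000000 recovered_part0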

> On 6.11.2018, at 02:26, Sergey Malinin <hell@xxxxxxxxxxx> wrote:
>
> Yes, 'rados -h'.
>
>
>> On 6.11.2018, at 02:25, Rhian Resnick <xantho@xxxxxxxxxxxx> wrote:
>>
>> Does a tool exist to recover files from a cephfs data partition? We are rebuilding metadata but have a user who needs data asap.
>





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
