On Wed, 21 Nov 2012, Drunkard Zhang wrote:
> 2012/11/21 Gregory Farnum <greg@xxxxxxxxxxx>:
> > On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang <gongfan193@xxxxxxxxx> wrote:
> >> 2012/11/20 Gregory Farnum <greg@xxxxxxxxxxx>:
> >>> On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang <gongfan193@xxxxxxxxx> wrote:
> >>>> I created a ceph cluster for testing; here's the mistake I made:
> >>>> I added a second mds, mds.ab, ran 'ceph mds set_max_mds 2', then
> >>>> removed the mds I had just added. Then I ran 'ceph mds set_max_mds 1';
> >>>> the first mds, mds.aa, crashed and became laggy. Since I couldn't
> >>>> repair mds.aa, I ran 'ceph mds newfs metadata data
> >>>> --yes-i-really-mean-it'.
> >>>
> >>> So this command is a mkfs sort of thing. It's deleted all the
> >>> "allocation tables" and filesystem metadata in favor of new, empty
> >>> ones. You should not run "--yes-i-really-mean-it" commands if you
> >>> don't know exactly what the command is doing and why you're using it.
> >>>
> >>>> mds.aa came back, but 1TB of data in the cluster was lost, while the
> >>>> disk space still shows as used in 'ceph -s'.
> >>>>
> >>>> Is there any chance I can get my data back? If not, how can I
> >>>> reclaim the disk space?
> >>>
> >>> There's not currently a great way to get that data back. With
> >>> sufficient energy it could be re-constructed by looking through all
> >>> the RADOS objects and putting something together.
> >>> To retrieve the disk space, you'll want to delete the "data" and
> >>> "metadata" RADOS pools. This will of course *eliminate* the data you
> >>> have in your new filesystem, so grab that out first if there's
> >>> anything there you care about. Then create the pools and run the newfs
> >>> command again.
> >>> Also, you've got the syntax wrong on that newfs command. You should be
> >>> using pool IDs:
> >>> "ceph mds newfs 1 0 --yes-i-really-mean-it"
> >>> (Though these IDs may change after re-creating the pools.)
> >>> -Greg
> >>
> >> I followed your instructions, but it didn't succeed: 'ceph mds newfs 1 0
> >> --yes-i-really-mean-it' changed nothing. Do I have to delete all the
> >> pools I created first? Why is that? I'm confused.
> >
> > If you look below at your pools, you no longer have pool IDs 0 and 1.
> > They were the old "data" and "metadata" pools that you just deleted.
> > You will need to create new pools for the filesystem and use their
> > IDs.
> >
> I did that, but it didn't succeed:
> log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
> new fs with metadata pool 1 and data pool 0

Those pool #'s need to refer to pools that currently exist. Run

    ceph osd pool create data
    ceph osd pool create metadata
    ceph osd dump | grep ^pool

to figure out the new pool IDs, and then do the newfs command and
substitute *those* in instead of 1 and 0.

> log3 ~ # ceph osd dump | grep ^pool
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0
> pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1556 owner 0
> pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1558 owner 0
> pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1561 owner 0
>
> Do I have to delete all the pools [345] before recreating the mds?

The other pools are ignored; no need to remove them.
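
Putting it together, the whole sequence should look something like this;
the IDs 6 and 7 below are only placeholders for whatever 'ceph osd dump'
actually reports for the newly created pools:

    # recreate the filesystem pools (the old ones, IDs 0 and 1, are already gone)
    ceph osd pool create data
    ceph osd pool create metadata

    # note the IDs assigned to the new pools, e.g. 6 ('data') and 7 ('metadata')
    ceph osd dump | grep ^pool

    # newfs takes the metadata pool ID first, then the data pool ID
    ceph mds newfs 7 6 --yes-i-really-mean-it

Once that succeeds the mds should come up with an empty filesystem backed
by the new pools.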

Good luck!
sage

> >> While testing, I found that the default pool is the parent of all the
> >> pools I created later, right? So deleting the default 'data' pool also
> >> deleted data belonging to other pools; is this true?
> >
> > No, absolutely not. There is no relationship between different RADOS
> > pools. If you've been using the cephfs tool to place some filesystem
> > data in different pools then your configuration is a little more
> > complicated (have you done that?), but deleting one pool is never
> > going to remove data from the others.
> > -Greg
> >
> I think that may be a bug. Here's what I did:
> I created a directory 'audit' in the running ceph filesystem and put
> some data into it (about 100GB) before running these commands:
>
> ceph osd pool create audit
> ceph mds add_data_pool 4
> cephfs /mnt/temp/audit/ set_layout -p 4
>
> log3 ~ # ceph osd dump | grep audit
> pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1558 owner 0
>
> At that point all the data in audit was still usable. After 'ceph osd pool
> delete data' the disk space was reclaimed (I forgot to test whether the
> data was still usable); only 200MB shows as used in 'ceph -s'. So here's
> what I'm thinking: data stored before the pool was created doesn't follow
> the new pool, it still follows the default pool 'data'. Is this a bug, or
> intended behavior?
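
A note on that last question: as far as I know, a file's layout is chosen
when the file is created, so data written before the set_layout stays in
the pool it was originally written to; only files created afterwards land
in the new 'audit' pool. To check where a particular file's objects
actually live, something along these lines should work
('/mnt/temp/audit/somefile' is just a hypothetical example; a file's RADOS
objects are named after its inode number in hex):

    # show the layout (including the data pool) the file was created with
    cephfs /mnt/temp/audit/somefile show_layout

    # the file's inode number in hex is the prefix of its object names
    printf '%x\n' $(stat -c %i /mnt/temp/audit/somefile)

    # look for those objects in each candidate data pool (here 'data' and 'audit')
    rados -p data ls | grep ^<inode-in-hex>
    rados -p audit ls | grep ^<inode-in-hex>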