Re: Files lost after mds rebuild


2012/11/21 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Tue, Nov 20, 2012 at 1:25 AM, Drunkard Zhang <gongfan193@xxxxxxxxx> wrote:
>> 2012/11/20 Gregory Farnum <greg@xxxxxxxxxxx>:
>>> On Mon, Nov 19, 2012 at 7:55 AM, Drunkard Zhang <gongfan193@xxxxxxxxx> wrote:
>>>> I created a ceph cluster for testing. Here's the mistake I made:
>>>> I added a second mds, mds.ab, ran 'ceph mds set_max_mds 2', and then
>>>> removed the mds I had just added.
>>>> After 'ceph mds set_max_mds 1', the first mds, mds.aa, crashed and became laggy.
>>>> Since I couldn't repair mds.aa, I ran 'ceph mds newfs metadata data
>>>> --yes-i-really-mean-it';
>>>
>>> So this command is a mkfs sort of thing. It's deleted all the
>>> "allocation tables" and filesystem metadata in favor of new, empty
>>> ones. You should not run "--yes-i-really-mean-it" commands if you
>>> don't know exactly what the command is doing and why you're using it.
>>>
>>>> mds.aa came back, but about 1TB of data in the cluster was lost, while
>>>> the disk space still shows as used according to 'ceph -s'.
>>>>
>>>> Is there any chance I can get my data back? If not, how can I
>>>> reclaim the disk space?
>>>
>>> There's not currently a great way to get that data back. With
>>> sufficient energy it could be re-constructed by looking through all
>>> the RADOS objects and putting something together.
>>> To retrieve the disk space, you'll want to delete the "data" and
>>> "metadata" RADOS pools. This will of course *eliminate* the data you
>>> have in your new filesystem, so grab that out first if there's
>>> anything there you care about. Then create the pools and run the newfs
>>> command again.
>>> Also, you've got the syntax wrong on that newfs command. You should be
>>> using pool IDs:
>>> "ceph mds newfs 1 0 --yes-i-really-mean-it"
>>> (Though these IDs may change after re-creating the pools.)
>>> -Greg
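
Spelled out as commands, a rough sketch of that clean-up step (assuming
the default pool names 'data' and 'metadata'; the delete syntax may vary
between releases):

ceph osd pool delete metadata
ceph osd pool delete data

followed by recreating the two pools and re-running newfs against their
new IDs, as sketched further below.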
>>
>> I followed your instructions, but it didn't succeed: 'ceph mds newfs 1 0
>> --yes-i-really-mean-it' changed nothing. Do I have to delete all the pools
>> I created first? Why does it work that way? I'm confused.
>
> If you look below at your pools, you no longer have pool IDs 0 and 1.
> They were the old "data" and "metadata" pools that you just deleted.
> You will need to create new pools for the filesystem and use their
> IDs.
>
I did that, but it didn't succeed:
log3 ~ # ceph mds newfs 1 0 --yes-i-really-mean-it
new fs with metadata pool 1 and data pool 0
log3 ~ # ceph osd dump | grep ^pool
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0
pool 3 'netflow' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1556 owner 0
pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1558 owner 0
pool 5 'dns-trend' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1561 owner 0


Do I have to delete all of pools 3, 4 and 5 before recreating the mds filesystem?
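
For reference, a rough sketch of what Greg describes: create two fresh
pools for the filesystem, read their IDs back from 'ceph osd dump' (the
IDs 6 and 7 below are only placeholders, use whatever the dump actually
reports), and point newfs at them:

ceph osd pool create data
ceph osd pool create metadata
ceph osd dump | grep ^pool   # note the new pool IDs, e.g. 6 (data), 7 (metadata)
ceph mds newfs 7 6 --yes-i-really-mean-it   # metadata pool ID first, then data pool ID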

>> While testing, I found that the default pool seems to be the parent of all
>> the pools I created later, right? So deleting the default 'data' pool also
>> deleted data belonging to other pools. Is this true?
>
> No, absolutely not. There is no relationship between different RADOS
> pools. If you've been using the cephfs tool to place some filesystem
> data in different pools then your configuration is a little more
> complicated (have you done that?), but deleting one pool is never
> going to remove data from the others.
> -Greg
>
I think this might be a bug. Here's what I did:
I created a directory 'audit' in the running ceph filesystem and put
some data into it (about 100GB) before running these commands:
ceph osd pool create audit
ceph mds add_data_pool 4
cephfs /mnt/temp/audit/ set_layout -p 4

log3 ~ # ceph osd dump | grep audit
pool 4 'audit' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1558 owner 0

At that point all the data in 'audit' was still usable. After 'ceph osd pool
delete data' the disk space was reclaimed (I forgot to test whether the data
was still usable); only 200MB remained used according to 'ceph -s'. So here's
what I'm thinking: data stored before the new pool was created doesn't follow
that pool, it stays in the default pool 'data'. Is this a bug or intended
behavior?
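
One way to check where the objects actually live, before deleting any
pool (a sketch, using the stock rados tool and the pool names from above):

rados df            # per-pool object counts and space usage
rados -p audit ls   # objects of files created after the set_layout call
rados -p data ls    # objects of files created before it, under the old default layout

If existing files keep the layout they were created with, their objects
would still sit in 'data' rather than 'audit', which would match the
behavior described above.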

