Re: Cannot mount CephFS after irreversible OSD lost

On Thu, Nov 19, 2015 at 10:07 AM, Mykola Dvornik
<mykola.dvornik@xxxxxxxxx> wrote:
>> I'm guessing in this context that "write data" possibly means creating
>> a file (as opposed to writing to an existing file).
>
> Indeed. Sorry for the confusion.
>
>> You've pretty much hit the limits of what the disaster recovery tools
>> are currently capable of.  What I'd recommend you do at this stage is
>> mount your filesystem read-only, back it up, and then create a new
>> filesystem and restore from backup.
>
> Ok. Is it somehow possible to have multiple FSs on the same ceph cluster?

No, we want to do this but it's not there yet.  Your scenario is one
of the motivations :-)

(for the record, the multi-fs branch is
https://github.com/jcsp/ceph/commits/wip-multi-filesystems, which works,
but we'll probably go back and re-do the mon side of it before
finishing)

John

>
>
> On 19 November 2015 at 10:43, John Spray <jspray@xxxxxxxxxx> wrote:
>>
>> On Wed, Nov 18, 2015 at 9:21 AM, Mykola Dvornik
>> <mykola.dvornik@xxxxxxxxx> wrote:
>> > Hi John,
>> >
>> > It turned out that mds triggers an assertion
>> >
>> > mds/MDCache.cc: 269: FAILED assert(inode_map.count(in->vino()) == 0)
>> >
>> > on any attempt to write data to the filesystem mounted via fuse.
>>
>> I'm guessing in this context that "write data" possibly means creating
>> a file (as opposed to writing to an existing file).
>>
>> Currently, cephfs-data-scan injects inodes well enough that you can
>> read them, but it does not update the inode table to reflect that the
>> recovered inodes are in use.  As a result, newly created files probably
>> try to take inode numbers that are already in use (by the recovered
>> files), which is why you're hitting this assertion.  The ticket for
>> updating the inotable after injection of recovered inodes is
>> http://tracker.ceph.com/issues/12131
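>>
>> For context, the recovery sequence in question looks roughly like this
>> (a sketch only; exact invocations depend on your release, so check the
>> cephfs-data-scan usage output before running anything):
>>
>>   # recover file sizes/mtimes from the objects in the data pool
>>   cephfs-data-scan scan_extents <data pool>
>>   # inject the recovered inodes back into the metadata pool
>>   cephfs-data-scan scan_inodes <data pool>
>>
>> Neither step marks the recovered inode numbers as allocated in the
>> inotable, which is what the ticket above tracks.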
>>
>> > Deleting data is still OK.
>> >
>> > I cannot really follow why duplicated inodes appear.
>> >
>> > Are there any ways to flush/reset the MDS cache?
>>
>> You've pretty much hit the limits of what the disaster recovery tools
>> are currently capable of.  What I'd recommend you do at this stage is
>> mount your filesystem read-only, back it up, and then create a new
>> filesystem and restore from backup.
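>>
>> A minimal sketch of that path, assuming a kernel-client read-only mount
>> and placeholder names throughout (adjust monitor address, auth and pool
>> names to your setup):
>>
>>   # mount the damaged filesystem read-only
>>   mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o ro,name=admin,secretfile=/etc/ceph/admin.secret
>>   # copy everything somewhere safe
>>   rsync -a /mnt/cephfs/ /backup/cephfs/
>>   # once the backup is verified, drop the old filesystem and create a
>>   # new one on fresh pools, then restore from the backup
>>   ceph fs rm <fs name> --yes-i-really-mean-it
>>   ceph fs new <fs name> <new metadata pool> <new data pool>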
>>
>> I'm writing a patch to handle the particular case where someone needs
>> to update their inode table to mark all inodes as used up to some
>> maximum, but the chances are that after that you'll still run into
>> some other issue, until we've finished the tools to make it all the
>> way through this path.
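>>
>> The shape of that operation would be an inode table update along the
>> lines of the sketch below; note that the take_inos subcommand named
>> here is hypothetical and may not exist in the release you're running,
>> so treat it as an illustration of the idea rather than something to
>> run:
>>
>>   # hypothetical: mark every inode number up to <max ino> as in use,
>>   # so that newly created files allocate above the recovered range
>>   cephfs-table-tool all take_inos <max ino>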
>>
>> John
>>
>> >
>> >
>> >
>> > On 17 November 2015 at 13:26, John Spray <jspray@xxxxxxxxxx> wrote:
>> >>
>> >> On Tue, Nov 17, 2015 at 12:17 PM, Mykola Dvornik
>> >> <mykola.dvornik@xxxxxxxxx> wrote:
>> >> > Dear John,
>> >> >
>> >> > Thanks for such a prompt reply!
>> >> >
>> >> > It seems like something is going wrong on the mon side, since there
>> >> > are no mount-specific requests logged on the mds side (see below).
>> >> > FYI, a few hours ago I disabled auth completely, but it didn't help.
>> >> >
>> >> > The serialized metadata pool is 9.7G. I can try to compress it with
>> >> > 7z, then set up an rssh account for you to scp/rsync it.
>> >> >
>> >> > debug mds = 20
>> >> > debug mon = 20
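>> >> >
>> >> > (set in ceph.conf on the mon and mds hosts; the section placement
>> >> > below is just one option, [global] works as well)
>> >> >
>> >> >   [mon]
>> >> >       debug mon = 20
>> >> >   [mds]
>> >> >       debug mds = 20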
>> >>
>> >> Don't worry about the mon logs.  That MDS log snippet appears to be
>> >> from several minutes earlier than the client's attempt to mount.
>> >>
>> >> In these cases it's generally simpler if you truncate all the logs,
>> >> then attempt the mount, then send all the logs in full rather than
>> >> snippets, so that we can be sure nothing is missing.
>> >>
>> >> Please also get the client log (use the fuse client with
>> >> --debug-client=20).
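>> >>
>> >> Something along these lines (default log path and a placeholder mount
>> >> point; adjust for your nodes):
>> >>
>> >>   # empty the existing logs so the next attempt is captured cleanly
>> >>   truncate -s 0 /var/log/ceph/*.log
>> >>   # retry the mount with verbose client-side logging
>> >>   ceph-fuse -m <mon-host>:6789 /mnt/cephfs --debug-client=20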
>> >>
>> >> John
>> >
>> >
>> >
>> >
>> > --
>> >  Mykola
>
>
>
>
> --
>  Mykola


