On Sun, Dec 6, 2015 at 7:01 AM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:
> Thanks for the advice.
>
> I dumped the filesystem contents, then deleted the cephfs, deleted the
> pools, and recreated from scratch.
>
> I did not track the specific issue in fuse, sorry. It gave an endpoint
> disconnected message. I will next time for sure.
>
> After the dump and recreate, all was good. Until... I now have a file
> with a slightly different symptom. I can stat it, but not read it:
>
> don@nubo-2:~$ cat .profile
> cat: .profile: Input/output error
> don@nubo-2:~$ stat .profile
>   File: ‘.profile’
>   Size: 675        Blocks: 2          IO Block: 4194304  regular file
> Device: 0h/0d   Inode: 1099511687525  Links: 1
> Access: (0644/-rw-r--r--)  Uid: ( 1000/     don)   Gid: ( 1000/     don)
> Access: 2015-12-04 05:08:35.247603061 +0000
> Modify: 2015-12-04 05:08:35.247603061 +0000
> Change: 2015-12-04 05:13:29.395252968 +0000
>  Birth: -
> don@nubo-2:~$ sum .profile
> sum: .profile: Input/output error
> don@nubo-2:~$ ls -il .profile
> 1099511687525 -rw-r--r-- 1 don don 675 Dec  4 05:08 .profile
>
> Would this be a similar problem? Should I give up on cephfs? It has
> been working fine for me for some time, but now 2 errors in 4 days
> makes me very nervous.

Which client are you using (fuse or kernel, and which version)? Do you
have inline data enabled? Do you have multiple data pools?

Regards
Yan, Zheng

>
>
> On 4 December 2015 at 08:16, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> On Fri, Dec 4, 2015 at 10:39 AM, Don Waterloo <don.waterloo@xxxxxxxxx>
>> wrote:
>> > I have a file which is untouchable: ls -i gives an error, stat gives
>> > an error. It shows ??? for all fields except name.
>> >
>> > How do I clean this up?
>> >
>>
>> The safest way to clean this up is to create a new directory, move the
>> remaining files into it, move the old directory somewhere you don't
>> touch, and replace the old directory with the new one.
>>
>> If you are still uncomfortable with that, you can use 'rados -p metadata
>> rmomapkey ...' to forcibly remove the corrupted file.
>>
>> First, flush the journal:
>> #ceph daemon mds.nubo-2 flush journal
>>
>> Then find the inode number of the directory which contains the
>> corrupted file, and list that directory's omap keys:
>>
>> #rados -p metadata listomapkeys <dir inode number in hex>.00000000
>>
>> The output should include the name (with suffix _head) of the
>> corrupted file. Remove that key:
>>
>> #rados -p metadata rmomapkey <dir inode number in hex>.00000000 <omapkey for the corrupted file>
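>>
>> As a sketch, suppose the bad file's parent directory is mounted at
>> /mnt/cephfs/home/don and the listed key turns out to be '.profile_head'
>> (the path, inode number, and key here are all hypothetical):
>>
>> #stat -c %i /mnt/cephfs/home/don      (say this prints 1099511627776)
>> #printf '%x\n' 1099511627776          (prints 10000000000)
>> #rados -p metadata listomapkeys 10000000000.00000000
>> #rados -p metadata rmomapkey 10000000000.00000000 .profile_head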
>>
>> Now the file is deleted, but the directory becomes un-deletable. You
>> can fix the directory as follows.
>>
>> Make sure the 'mds verify scatter' config option is disabled:
>> #ceph daemon mds.nubo-2 config set mds_verify_scatter 0
>>
>> Fragment the directory:
>> #ceph mds tell 0 fragment_dir <path of the un-deletable directory in
>> the FS> '0/0' 1
>>
>> Create a file in the directory:
>> #touch <path of the un-deletable directory>/foo
>>
>> The above two steps fix the directory's stats; now you can delete the
>> directory:
>> #rm -rf <path of the un-deletable directory>
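>>
>> (Again hypothetical: if the un-deletable directory is /home/don
>> relative to the cephfs root, and the client mount point is /mnt/cephfs,
>> this might look like the following. Note that fragment_dir takes the
>> path relative to the filesystem root, while touch and rm take the
>> client-mount path.)
>>
>> #ceph mds tell 0 fragment_dir /home/don '0/0' 1
>> #touch /mnt/cephfs/home/don/foo
>> #rm -rf /mnt/cephfs/home/don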
>>
>> > I'm on ubuntu 15.10, running 0.94.5
>> > # ceph -v
>> > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>> >
>> > The node that accessed the file then caused a problem with mds:
>> >
>> > root@nubo-1:/home/git/go/src/github.com/gogits/gogs# ceph status
>> >     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
>> >      health HEALTH_WARN
>> >             mds0: Client nubo-1 failing to respond to capability release
>> >      monmap e1: 3 mons at
>> > {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
>> >             election epoch 906, quorum 0,1,2 nubo-1,nubo-2,nubo-3
>> >      mdsmap e418: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby
>> >      osdmap e2081: 6 osds: 6 up, 6 in
>> >       pgmap v95696: 560 pgs, 6 pools, 131 GB data, 97784 objects
>> >             265 GB used, 5357 GB / 5622 GB avail
>> >                  560 active+clean
>> >
>> > Trying a different node, I see the same problem.
>> >
>> > I'm getting this error dumped to dmesg:
>> >
>> > [670243.421212] Workqueue: ceph-msgr con_work [libceph]
>> > [670243.421213] 0000000000000000 00000000e800e516 ffff8810cd68f9d8 ffffffff817e8c09
>> > [670243.421215] 0000000000000000 0000000000000000 ffff8810cd68fa18 ffffffff8107b3c6
>> > [670243.421217] ffff8810cd68fa28 00000000ffffffea 0000000000000000 0000000000000000
>> > [670243.421218] Call Trace:
>> > [670243.421221] [<ffffffff817e8c09>] dump_stack+0x45/0x57
>> > [670243.421223] [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
>> > [670243.421225] [<ffffffff8107b4fa>] warn_slowpath_null+0x1a/0x20
>> > [670243.421229] [<ffffffffc06ebb1c>] fill_inode.isra.18+0xc5c/0xc90 [ceph]
>> > [670243.421233] [<ffffffff81217427>] ? inode_init_always+0x107/0x1b0
>> > [670243.421236] [<ffffffffc06e95e0>] ? ceph_mount+0x7e0/0x7e0 [ceph]
>> > [670243.421241] [<ffffffffc06ebe82>] ceph_fill_trace+0x332/0x910 [ceph]
>> > [670243.421248] [<ffffffffc0709db5>] handle_reply+0x525/0xb70 [ceph]
>> > [670243.421255] [<ffffffffc070cac8>] dispatch+0x3c8/0xbb0 [ceph]
>> > [670243.421260] [<ffffffffc069daeb>] con_work+0x57b/0x1770 [libceph]
>> > [670243.421262] [<ffffffff810b2d7b>] ? dequeue_task_fair+0x36b/0x700
>> > [670243.421263] [<ffffffff810b2141>] ? put_prev_entity+0x31/0x420
>> > [670243.421265] [<ffffffff81013689>] ? __switch_to+0x1f9/0x5c0
>> > [670243.421267] [<ffffffff8109412a>] process_one_work+0x1aa/0x440
>> > [670243.421269] [<ffffffff8109440b>] worker_thread+0x4b/0x4c0
>> > [670243.421271] [<ffffffff810943c0>] ? process_one_work+0x440/0x440
>> > [670243.421273] [<ffffffff810943c0>] ? process_one_work+0x440/0x440
>> > [670243.421274] [<ffffffff8109a7c8>] kthread+0xd8/0xf0
>> > [670243.421276] [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
>> > [670243.421277] [<ffffffff817efe1f>] ret_from_fork+0x3f/0x70
>> > [670243.421279] [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
>> > [670243.421280] ---[ end trace 5cded7a882dfd5d1 ]---
>> > [670243.421282] ceph: fill_inode badness ffff88179e2d9f28 10000004e91.fffffffffffffffe
>> >
>> > This problem persisted through a reboot, and there is no fsck to
>> > help me.
>> >
>> > I also tried with ceph-fuse, but it crashes when I access the file.
>>
>> How did ceph-fuse crash? Please send the backtrace to us.
>>
>> Regards
>> Yan, Zheng
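
If ceph-fuse crashes again, one way to capture a backtrace (a sketch;
the mount point /mnt/cephfs below is just an example, adjust it to your
setup) is to run the client in the foreground with debug logging:

#ceph-fuse -d --debug-client=20 /mnt/cephfs

On an assertion failure, ceph-fuse prints a backtrace and also writes it
to the client log under /var/log/ceph/. For a plain segfault, enable
core dumps first ('ulimit -c unlimited'), then get the backtrace with
'gdb ceph-fuse core' and the 'bt' command.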