Re: cephfs ceph: fill_inode badness

On Fri, Dec 4, 2015 at 10:39 AM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:
> I have a file which is untouchable: ls -i gives an error, stat gives an
> error. It shows ??? for all fields except the name.
>
> How do I clean this up?
>

The safest way to clean this up is to create a new directory, move the
rest of the files into the new directory, move the old directory
somewhere you don't touch, and replace the old directory with the new one.
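
For example, a minimal sketch (the paths below are hypothetical; the
corrupted file itself stays behind in the old directory):

#mkdir /mnt/cephfs/newdir /mnt/cephfs/quarantine
#mv /mnt/cephfs/olddir/<each healthy file> /mnt/cephfs/newdir/
#mv /mnt/cephfs/olddir /mnt/cephfs/quarantine/
#mv /mnt/cephfs/newdir /mnt/cephfs/olddir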


If you are still uncomfortable with that, you can use 'rados -p metadata
rmomapkey ...' to forcibly remove the corrupted file.

First, flush the MDS journal:
#ceph daemon mds.nubo-2 flush journal

Then find the inode number of the directory that contains the corrupted file:

#rados -p metadata listomapkeys <dir inode number in hex>.00000000

The output should include the name (with the suffix _head) of the corrupted file.
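
If you are not sure how to get the directory's inode number in hex, one
way (assuming the filesystem is mounted at /mnt/cephfs and the directory
is /mnt/cephfs/baddir; both paths are hypothetical) is:

#printf '%x\n' $(stat -c %i /mnt/cephfs/baddir)

Append '.00000000' to the printed value to get the object name used above.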

#rados -p metadata rmomapkey <dir inode number in hex>.00000000 <omapkey for the corrupted file>
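
For example, if the directory's hex inode number is 10000000abc and
listomapkeys shows a key named badfile_head (both values are only
illustrations), the command would look like:

#rados -p metadata rmomapkey 10000000abc.00000000 badfile_head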

Now the file is deleted, but the directory becomes un-deletable. You
can fix the directory as follows:

Make sure the 'mds verify scatter' config is disabled:
#ceph daemon mds.nubo-2 config set mds_verify_scatter 0

Fragment the directory:
#ceph mds tell 0 fragment_dir <path of the un-deletable directory in the FS> '0/0' 1

Create a file in the directory:
#touch <path of the un-deletable directory>/foo

The above two steps will fix the directory's stats; now you can delete the directory:
#rm -rf <path of the un-deletable directory>
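
Putting the last three steps together with a hypothetical directory
(/baddir in the CephFS namespace, mounted at /mnt/cephfs/baddir), the
sequence would look like:

#ceph mds tell 0 fragment_dir /baddir '0/0' 1
#touch /mnt/cephfs/baddir/foo
#rm -rf /mnt/cephfs/baddir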


> I'm on ubuntu 15.10, running 0.94.5
> # ceph -v
> ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>
> the node that accessed the file then caused a problem with mds:
>
> root@nubo-1:/home/git/go/src/github.com/gogits/gogs# ceph status
>     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
>      health HEALTH_WARN
>             mds0: Client nubo-1 failing to respond to capability release
>      monmap e1: 3 mons at
> {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
>             election epoch 906, quorum 0,1,2 nubo-1,nubo-2,nubo-3
>      mdsmap e418: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby
>      osdmap e2081: 6 osds: 6 up, 6 in
>       pgmap v95696: 560 pgs, 6 pools, 131 GB data, 97784 objects
>             265 GB used, 5357 GB / 5622 GB avail
>                  560 active+clean
>
> Trying a different node, I see the same problem.
>
> I'm getting this error dumped to dmesg:
>
> [670243.421212] Workqueue: ceph-msgr con_work [libceph]
> [670243.421213]  0000000000000000 00000000e800e516 ffff8810cd68f9d8
> ffffffff817e8c09
> [670243.421215]  0000000000000000 0000000000000000 ffff8810cd68fa18
> ffffffff8107b3c6
> [670243.421217]  ffff8810cd68fa28 00000000ffffffea 0000000000000000
> 0000000000000000
> [670243.421218] Call Trace:
> [670243.421221]  [<ffffffff817e8c09>] dump_stack+0x45/0x57
> [670243.421223]  [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
> [670243.421225]  [<ffffffff8107b4fa>] warn_slowpath_null+0x1a/0x20
> [670243.421229]  [<ffffffffc06ebb1c>] fill_inode.isra.18+0xc5c/0xc90 [ceph]
> [670243.421233]  [<ffffffff81217427>] ? inode_init_always+0x107/0x1b0
> [670243.421236]  [<ffffffffc06e95e0>] ? ceph_mount+0x7e0/0x7e0 [ceph]
> [670243.421241]  [<ffffffffc06ebe82>] ceph_fill_trace+0x332/0x910 [ceph]
> [670243.421248]  [<ffffffffc0709db5>] handle_reply+0x525/0xb70 [ceph]
> [670243.421255]  [<ffffffffc070cac8>] dispatch+0x3c8/0xbb0 [ceph]
> [670243.421260]  [<ffffffffc069daeb>] con_work+0x57b/0x1770 [libceph]
> [670243.421262]  [<ffffffff810b2d7b>] ? dequeue_task_fair+0x36b/0x700
> [670243.421263]  [<ffffffff810b2141>] ? put_prev_entity+0x31/0x420
> [670243.421265]  [<ffffffff81013689>] ? __switch_to+0x1f9/0x5c0
> [670243.421267]  [<ffffffff8109412a>] process_one_work+0x1aa/0x440
> [670243.421269]  [<ffffffff8109440b>] worker_thread+0x4b/0x4c0
> [670243.421271]  [<ffffffff810943c0>] ? process_one_work+0x440/0x440
> [670243.421273]  [<ffffffff810943c0>] ? process_one_work+0x440/0x440
> [670243.421274]  [<ffffffff8109a7c8>] kthread+0xd8/0xf0
> [670243.421276]  [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
> [670243.421277]  [<ffffffff817efe1f>] ret_from_fork+0x3f/0x70
> [670243.421279]  [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
> [670243.421280] ---[ end trace 5cded7a882dfd5d1 ]---
> [670243.421282] ceph: fill_inode badness ffff88179e2d9f28
> 10000004e91.fffffffffffffffe
>
> this problem persisted through a reboot, and there is no fsck to help me.
>
> I also tried with ceph-fuse, but it crashes when I access the file.

How did ceph-fuse crash? Please send the backtrace to us.

Regards
Yan, Zheng

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


