Re: cephfs ceph: fill_inode badness

Thanks for the advice.

I dumped the filesystem contents, then deleted the cephfs, deleted the pools, and recreated from scratch.

I did not capture the specific ceph-fuse failure, sorry; it gave an 'endpoint disconnected' message. I will capture it next time for sure.

After the dump and recreate, all was good. Until... I now have a file with a slightly different symptom. I can stat it, but not read it:

don@nubo-2:~$ cat .profile
cat: .profile: Input/output error
don@nubo-2:~$ stat .profile
  File: ‘.profile’
  Size: 675             Blocks: 2          IO Block: 4194304 regular file
Device: 0h/0d   Inode: 1099511687525  Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/     don)   Gid: ( 1000/     don)
Access: 2015-12-04 05:08:35.247603061 +0000
Modify: 2015-12-04 05:08:35.247603061 +0000
Change: 2015-12-04 05:13:29.395252968 +0000
 Birth: -
don@nubo-2:~$ sum .profile
sum: .profile: Input/output error
don@nubo-2:~$ ls -il .profile 
1099511687525 -rw-r--r-- 1 don don 675 Dec  4 05:08 .profile

Would this be a similar problem? Should I give up on cephfs? It has been working fine for me for some time, but two errors in four days make me very nervous.


On 4 December 2015 at 08:16, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Fri, Dec 4, 2015 at 10:39 AM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:
> i have a file which is untouchable: ls -i gives an error, stat gives an
> error. it shows ??? for all fields except name.
>
> How do i clean this up?
>

The safest way to clean this up is to create a new directory, move the rest of the files into the new directory, move the old directory somewhere you won't touch, and replace the old directory with the new one.
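The move-aside workaround above can be sketched in a few shell commands. This is a hedged sketch, not from the thread: a scratch directory stands in for the cephfs mount, and the "projects" directory name is a hypothetical stand-in for whatever directory holds the bad file.

```shell
# Hedged sketch of the move-aside workaround. A scratch directory
# stands in for the cephfs mount here; on a real cluster ROOT would
# be your mount point and "projects" the directory with the bad file.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/projects"
touch "$ROOT/projects/readable.txt"    # stand-in for files that still read fine

# 1. Create a new directory and move the healthy files into it; the
#    corrupted entry fails to move and stays behind.
mkdir "$ROOT/projects.new"
mv "$ROOT/projects"/* "$ROOT/projects.new"/ 2>/dev/null || true

# 2. Park the old directory somewhere you won't touch.
mkdir -p "$ROOT/quarantine"
mv "$ROOT/projects" "$ROOT/quarantine/projects"

# 3. Give the new directory the old name.
mv "$ROOT/projects.new" "$ROOT/projects"
```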


If you are still uncomfortable with that, you can use 'rados -p metadata rmomapkey ...' to forcibly remove the corrupted file.

First, flush the journal:
#ceph daemon mds.nubo-2 flush journal

Find the inode number of the directory that contains the corrupted file:

#rados -p metadata listomapkeys <dir inode number in hex>.00000000

The output should include the name (with the suffix _head) of the corrupted file.
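Note that the object name wants the directory inode in hex, while `ls -i` and `stat` print it in decimal. A small conversion sketch; the inode value here is the corrupted file's inode from the stat output earlier in the thread, used only to demonstrate the conversion, and you would plug in the parent directory's inode instead:

```shell
# Convert a decimal inode (as printed by `ls -i` / `stat`) into the
# hex form used in the metadata pool's object name.
# 1099511687525 is the file inode from the stat output above, used
# here only to show the conversion; use the parent directory's inode.
ino=1099511687525
obj="$(printf '%x' "$ino").00000000"
echo "$obj"    # prints 1000000e965.00000000
```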

#rados -p metadata rmomapkey <dir inode number in hex>.00000000 <omapkey for the corrupted file>

Now the file is deleted, but the directory becomes un-deletable. You can fix the directory as follows:

Make sure the 'mds_verify_scatter' config option is disabled:
#ceph daemon mds.nubo-2 config set mds_verify_scatter 0

Fragment the directory:
#ceph mds tell 0 fragment_dir <path of the un-deletable directory in the FS> '0/0' 1

Create a file in the directory:
#touch <path of the un-deletable directory>/foo

The above two steps fix the directory's stats; now you can delete the directory:
#rm -rf <path of the un-deletable directory>
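For reference, the whole sequence above can be collected into one hedged shell function. Nothing here goes beyond the commands already given in this thread; the function name is made up, the hard-coded mds.nubo-2 and the metadata pool name are taken from this thread and may differ on your cluster, and nothing runs until you call the function with real values:

```shell
# Hedged wrapper around the steps above; define only, call by hand.
remove_corrupted_entry() {
    dir_path=$1   # cephfs path of the directory holding the bad file
    dir_ino=$2    # that directory's decimal inode (from ls -id "$dir_path")
    omap_key=$3   # corrupted entry name, with its _head suffix
    obj="$(printf '%x' "$dir_ino").00000000"

    ceph daemon mds.nubo-2 flush journal
    rados -p metadata rmomapkey "$obj" "$omap_key"

    # The directory is now un-deletable; fix its stats, then remove it.
    ceph daemon mds.nubo-2 config set mds_verify_scatter 0
    ceph mds tell 0 fragment_dir "$dir_path" '0/0' 1
    touch "$dir_path/foo"
    rm -rf "$dir_path"
}
```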


> I'm on ubuntu 15.10, running 0.94.5
> # ceph -v
> ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>
> the node that accessed the file then caused a problem with mds:
>
> root@nubo-1:/home/git/go/src/github.com/gogits/gogs# ceph status
>     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
>      health HEALTH_WARN
>             mds0: Client nubo-1 failing to respond to capability release
>      monmap e1: 3 mons at
> {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
>             election epoch 906, quorum 0,1,2 nubo-1,nubo-2,nubo-3
>      mdsmap e418: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby
>      osdmap e2081: 6 osds: 6 up, 6 in
>       pgmap v95696: 560 pgs, 6 pools, 131 GB data, 97784 objects
>             265 GB used, 5357 GB / 5622 GB avail
>                  560 active+clean
>
> Trying a different node, i see the same problem.
>
> I'm getting this error dumped to dmesg:
>
> [670243.421212] Workqueue: ceph-msgr con_work [libceph]
> [670243.421213]  0000000000000000 00000000e800e516 ffff8810cd68f9d8
> ffffffff817e8c09
> [670243.421215]  0000000000000000 0000000000000000 ffff8810cd68fa18
> ffffffff8107b3c6
> [670243.421217]  ffff8810cd68fa28 00000000ffffffea 0000000000000000
> 0000000000000000
> [670243.421218] Call Trace:
> [670243.421221]  [<ffffffff817e8c09>] dump_stack+0x45/0x57
> [670243.421223]  [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
> [670243.421225]  [<ffffffff8107b4fa>] warn_slowpath_null+0x1a/0x20
> [670243.421229]  [<ffffffffc06ebb1c>] fill_inode.isra.18+0xc5c/0xc90 [ceph]
> [670243.421233]  [<ffffffff81217427>] ? inode_init_always+0x107/0x1b0
> [670243.421236]  [<ffffffffc06e95e0>] ? ceph_mount+0x7e0/0x7e0 [ceph]
> [670243.421241]  [<ffffffffc06ebe82>] ceph_fill_trace+0x332/0x910 [ceph]
> [670243.421248]  [<ffffffffc0709db5>] handle_reply+0x525/0xb70 [ceph]
> [670243.421255]  [<ffffffffc070cac8>] dispatch+0x3c8/0xbb0 [ceph]
> [670243.421260]  [<ffffffffc069daeb>] con_work+0x57b/0x1770 [libceph]
> [670243.421262]  [<ffffffff810b2d7b>] ? dequeue_task_fair+0x36b/0x700
> [670243.421263]  [<ffffffff810b2141>] ? put_prev_entity+0x31/0x420
> [670243.421265]  [<ffffffff81013689>] ? __switch_to+0x1f9/0x5c0
> [670243.421267]  [<ffffffff8109412a>] process_one_work+0x1aa/0x440
> [670243.421269]  [<ffffffff8109440b>] worker_thread+0x4b/0x4c0
> [670243.421271]  [<ffffffff810943c0>] ? process_one_work+0x440/0x440
> [670243.421273]  [<ffffffff810943c0>] ? process_one_work+0x440/0x440
> [670243.421274]  [<ffffffff8109a7c8>] kthread+0xd8/0xf0
> [670243.421276]  [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
> [670243.421277]  [<ffffffff817efe1f>] ret_from_fork+0x3f/0x70
> [670243.421279]  [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
> [670243.421280] ---[ end trace 5cded7a882dfd5d1 ]---
> [670243.421282] ceph: fill_inode badness ffff88179e2d9f28
> 10000004e91.fffffffffffffffe
>
> this problem persisted through a reboot, and there is no fsck to help me.
>
> I also tried with ceph-fuse, but it crashes when I access the file.

How did ceph-fuse crash? Please send the backtrace to us.

Regards
Yan, Zheng


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
