On Sun, Dec 6, 2015 at 7:01 AM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:
> Thanks for the advice.
>
> I dumped the filesystem contents, then deleted the cephfs, deleted the
> pools, and recreated from scratch.
>
> I did not track the specific issue in fuse, sorry. It gave an endpoint
> disconnected message. I will next time for sure.
>
> After the dump and recreate, all was good. Until... I now have a file
> with a slightly different symptom. I can stat it, but not read it:
>
> don@nubo-2:~$ cat .profile
> cat: .profile: Input/output error
> don@nubo-2:~$ stat .profile
>   File: ‘.profile’
>   Size: 675        Blocks: 2          IO Block: 4194304  regular file
> Device: 0h/0d   Inode: 1099511687525  Links: 1
> Access: (0644/-rw-r--r--)  Uid: ( 1000/     don)   Gid: ( 1000/     don)
> Access: 2015-12-04 05:08:35.247603061 +0000
> Modify: 2015-12-04 05:08:35.247603061 +0000
> Change: 2015-12-04 05:13:29.395252968 +0000
>  Birth: -
> don@nubo-2:~$ sum .profile
> sum: .profile: Input/output error
> don@nubo-2:~$ ls -il .profile
> 1099511687525 -rw-r--r-- 1 don don 675 Dec  4 05:08 .profile
>
> Would this be a similar problem? Should I give up on cephfs? It has
> been working fine for me for some time, but now 2 errors in 4 days
> makes me very nervous.

Which client are you using (fuse or kernel, and which version)? Do you
have inline data enabled? Do you have multiple data pools?

Regards
Yan, Zheng

>
>
> On 4 December 2015 at 08:16, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> On Fri, Dec 4, 2015 at 10:39 AM, Don Waterloo <don.waterloo@xxxxxxxxx>
>> wrote:
>> > I have a file which is untouchable: ls -i gives an error, stat gives
>> > an error. It shows ??? for all fields except name.
>> >
>> > How do I clean this up?
>> >
>>
>> The safest way to clean this up is to create a new directory, move the
>> remaining files into it, move the old directory somewhere you don't
>> touch, and replace the old directory with the new one.
>>
>> If you are still uncomfortable with that, you can use 'rados -p metadata
>> rmomapkey ...' to forcibly remove the corrupted file.
>>
>> First, flush the journal:
>> #ceph daemon mds.nubo-2 flush journal
>>
>> Then find the inode number of the directory which contains the
>> corrupted file, and list that directory's omap keys:
>>
>> #rados -p metadata listomapkeys <dir inode number in hex>.00000000
>>
>> The output should include the name (with suffix _head) of the
>> corrupted file. Remove that key:
>>
>> #rados -p metadata rmomapkey <dir inode number in hex>.00000000 <omapkey for the corrupted file>
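>>
>> As a sketch, suppose the bad file's parent directory is mounted at
>> /mnt/cephfs/home/don and the listed key turns out to be '.profile_head'
>> (the path, inode number, and key here are all hypothetical):
>>
>> #stat -c %i /mnt/cephfs/home/don      (say this prints 1099511627776)
>> #printf '%x\n' 1099511627776          (prints 10000000000)
>> #rados -p metadata listomapkeys 10000000000.00000000
>> #rados -p metadata rmomapkey 10000000000.00000000 .profile_head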
>>
>> Now the file is deleted, but the directory becomes un-deletable. You
>> can fix the directory as follows.
>>
>> Make sure the 'mds verify scatter' config option is disabled:
>> #ceph daemon mds.nubo-2 config set mds_verify_scatter 0
>>
>> Fragment the directory:
>> #ceph mds tell 0 fragment_dir <path of the un-deletable directory in
>> the FS> '0/0' 1
>>
>> Create a file in the directory:
>> #touch <path of the un-deletable directory>/foo
>>
>> The above two steps fix the directory's stats; now you can delete the
>> directory:
>> #rm -rf <path of the un-deletable directory>
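>>
>> (Again hypothetical: if the un-deletable directory is /home/don
>> relative to the cephfs root, and the client mount point is /mnt/cephfs,
>> this might look like the following. Note that fragment_dir takes the
>> path relative to the filesystem root, while touch and rm take the
>> client-mount path.)
>>
>> #ceph mds tell 0 fragment_dir /home/don '0/0' 1
>> #touch /mnt/cephfs/home/don/foo
>> #rm -rf /mnt/cephfs/home/don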
>>
>> > I'm on ubuntu 15.10, running 0.94.5
>> > # ceph -v
>> > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>> >
>> > The node that accessed the file then caused a problem with mds:
>> >
>> > root@nubo-1:/home/git/go/src/github.com/gogits/gogs# ceph status
>> >     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
>> >      health HEALTH_WARN
>> >             mds0: Client nubo-1 failing to respond to capability release
>> >      monmap e1: 3 mons at
>> > {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
>> >             election epoch 906, quorum 0,1,2 nubo-1,nubo-2,nubo-3
>> >      mdsmap e418: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby
>> >      osdmap e2081: 6 osds: 6 up, 6 in
>> >       pgmap v95696: 560 pgs, 6 pools, 131 GB data, 97784 objects
>> >             265 GB used, 5357 GB / 5622 GB avail
>> >                  560 active+clean
>> >
>> > Trying a different node, I see the same problem.
>> >
>> > I'm getting this error dumped to dmesg:
>> >
>> > [670243.421212] Workqueue: ceph-msgr con_work [libceph]
>> > [670243.421213] 0000000000000000 00000000e800e516 ffff8810cd68f9d8 ffffffff817e8c09
>> > [670243.421215] 0000000000000000 0000000000000000 ffff8810cd68fa18 ffffffff8107b3c6
>> > [670243.421217] ffff8810cd68fa28 00000000ffffffea 0000000000000000 0000000000000000
>> > [670243.421218] Call Trace:
>> > [670243.421221] [<ffffffff817e8c09>] dump_stack+0x45/0x57
>> > [670243.421223] [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
>> > [670243.421225] [<ffffffff8107b4fa>] warn_slowpath_null+0x1a/0x20
>> > [670243.421229] [<ffffffffc06ebb1c>] fill_inode.isra.18+0xc5c/0xc90 [ceph]
>> > [670243.421233] [<ffffffff81217427>] ? inode_init_always+0x107/0x1b0
>> > [670243.421236] [<ffffffffc06e95e0>] ? ceph_mount+0x7e0/0x7e0 [ceph]
>> > [670243.421241] [<ffffffffc06ebe82>] ceph_fill_trace+0x332/0x910 [ceph]
>> > [670243.421248] [<ffffffffc0709db5>] handle_reply+0x525/0xb70 [ceph]
>> > [670243.421255] [<ffffffffc070cac8>] dispatch+0x3c8/0xbb0 [ceph]
>> > [670243.421260] [<ffffffffc069daeb>] con_work+0x57b/0x1770 [libceph]
>> > [670243.421262] [<ffffffff810b2d7b>] ? dequeue_task_fair+0x36b/0x700
>> > [670243.421263] [<ffffffff810b2141>] ? put_prev_entity+0x31/0x420
>> > [670243.421265] [<ffffffff81013689>] ? __switch_to+0x1f9/0x5c0
>> > [670243.421267] [<ffffffff8109412a>] process_one_work+0x1aa/0x440
>> > [670243.421269] [<ffffffff8109440b>] worker_thread+0x4b/0x4c0
>> > [670243.421271] [<ffffffff810943c0>] ? process_one_work+0x440/0x440
>> > [670243.421273] [<ffffffff810943c0>] ? process_one_work+0x440/0x440
>> > [670243.421274] [<ffffffff8109a7c8>] kthread+0xd8/0xf0
>> > [670243.421276] [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
>> > [670243.421277] [<ffffffff817efe1f>] ret_from_fork+0x3f/0x70
>> > [670243.421279] [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
>> > [670243.421280] ---[ end trace 5cded7a882dfd5d1 ]---
>> > [670243.421282] ceph: fill_inode badness ffff88179e2d9f28 10000004e91.fffffffffffffffe
>> >
>> > This problem persisted through a reboot, and there is no fsck to
>> > help me.
>> >
>> > I also tried with ceph-fuse, but it crashes when I access the file.
>>
>> How did ceph-fuse crash? Please send the backtrace to us.
>>
>> Regards
>> Yan, Zheng
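
If ceph-fuse crashes again, one way to capture a backtrace (a sketch;
the mount point /mnt/cephfs below is just an example, adjust it to your
setup) is to run the client in the foreground with debug logging:

#ceph-fuse -d --debug-client=20 /mnt/cephfs

On an assertion failure, ceph-fuse prints a backtrace and also writes it
to the client log under /var/log/ceph/. For a plain segfault, enable
core dumps first ('ulimit -c unlimited'), then get the backtrace with
'gdb ceph-fuse core' and the 'bt' command.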