Re: cephfs ceph: fill_inode badness

Don Waterloo <don.waterloo@xxxxxxxxx> · Sun, 6 Dec 2015 11:52:12 -0500

Kernel driver. One node runs a 4.3 kernel (Ubuntu wily mainline) and one runs a 4.2 kernel (Ubuntu wily stock).
I don't believe inline data is enabled (nothing in ceph.conf, nothing in fstab).

It's mounted like this:

10.100.10.60,10.100.10.61,10.100.10.62:/ /cephfs ceph _netdev,noauto,noatime,x-systemd.requires=network-online.target,x-systemd.automount,x-systemd.device-timeout=10,name=admin,secret=XXX 0 2

I'm not sure what multiple data pools would mean. I have one metadata pool and one data pool for the cephfs, plus other ceph pools: one for openstack cinder, and one I tried with a docker registry that didn't work and that I backed out.

~$ ceph osd lspools
0 rbd,1 mypool,4 cinder-volumes,5 docker,12 cephfs_metadata,13 cephfs_data,

In this latest case, one node was unable to read that file (.profile), but the other node that had it mounted could. Rebooting the affected node restored access to the file. In my previous case, no node was able to read the affected file, and stat failed on it (whereas here stat succeeded but read failed).
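
For the previous case, the last dmesg line in the quoted report further down ends in `10000004e91.fffffffffffffffe`. As far as I understand the kernel client's message format, that is the inode number and the snapshot id, both in hex, with `fffffffffffffffe` meaning the live (head) version rather than a snapshot. A minimal sketch for turning that pair into the decimal inode that `ls -i` and `stat` print; `decode_vino` is a name made up for this example, and the head-marker value is an assumption:

```shell
#!/bin/sh
# decode_vino is a hypothetical helper, not part of any ceph tooling.
# Input: the <ino>.<snap> pair (both hex) from a "fill_inode badness" line.
decode_vino() {
    ino_hex=${1%%.*}     # text before the first dot
    snap_hex=${1#*.}     # text after the first dot
    # Assumption: fffffffffffffffe is the "no snapshot" (head) marker.
    if [ "$snap_hex" = "fffffffffffffffe" ]; then
        snap=head
    else
        snap=0x$snap_hex
    fi
    # printf accepts a 0x-prefixed argument for %d and prints it in decimal.
    printf 'ino=%d snap=%s\n' "0x$ino_hex" "$snap"
}

decode_vino 10000004e91.fffffffffffffffe
```

The decimal value can then be compared against `ls -i` output, or searched for with `find /cephfs -xdev -inum <decimal inode>`, though that may not succeed on a file that no longer stats.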

ceph is 0.94.5-0ubuntu0.15.10.1

~$ ceph status
    cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
     health HEALTH_OK
     monmap e1: 3 mons at {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
            election epoch 970, quorum 0,1,2 nubo-1,nubo-2,nubo-3
     mdsmap e537: 1/1/1 up {0=nubo-1=up:active}, 2 up:standby
     osdmap e2266: 6 osds: 6 up, 6 in
      pgmap v99487: 840 pgs, 6 pools, 131 GB data, 101916 objects
            265 GB used, 5357 GB / 5622 GB avail
                 840 active+clean
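
The cleanup steps quoted below address a directory object as `<dir inode number in hex>.00000000`, while `ls -i` and `stat` print inodes in decimal. A small sketch of the conversion, assuming that naming; `dir_object_name` is a made-up helper, and the sample input is the `.profile` inode from the stat output, used only because it is the one decimal inode in this thread (the real argument would be the inode of the directory that contains the bad file):

```shell
#!/bin/sh
# dir_object_name is a hypothetical helper, not part of any ceph tooling.
# It builds the object name used in the quoted rados commands from the
# decimal inode number printed by `ls -i` or `stat`.
dir_object_name() {
    # %x prints the inode in lowercase hex; ".00000000" is the suffix
    # the quoted commands append to the directory inode.
    printf '%x.00000000\n' "$1"
}

# Sample input only: this is a file inode taken from the stat output.
dir_object_name 1099511687525
```

On this cluster the `-p` argument to rados would presumably be `cephfs_metadata` (per the lspools output above) rather than `metadata`.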

On 6 December 2015 at 08:18, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Sun, Dec 6, 2015 at 7:01 AM, Don Waterloo <don.waterloo@xxxxxxxxx> wrote:

> Thanks for the advice.
>
> I dumped the filesystem contents, then deleted the cephfs, deleted the
> pools, and recreated from scratch.
>
> I did not track the specific issue in fuse, sorry. It gave an endpoint
> disconnected message. I will next time for sure.
>
> After the dump and recreate, all was good. Until... I now have a file with a
> slightly different symptom. I can stat it, but not read it:
>
> don@nubo-2:~$ cat .profile
> cat: .profile: Input/output error
> don@nubo-2:~$ stat .profile
>   File: ‘.profile’
>   Size: 675             Blocks: 2          IO Block: 4194304 regular file
> Device: 0h/0d   Inode: 1099511687525  Links: 1
> Access: (0644/-rw-r--r--)  Uid: ( 1000/     don)   Gid: ( 1000/     don)
> Access: 2015-12-04 05:08:35.247603061 +0000
> Modify: 2015-12-04 05:08:35.247603061 +0000
> Change: 2015-12-04 05:13:29.395252968 +0000
>  Birth: -
> don@nubo-2:~$ sum .profile
> sum: .profile: Input/output error
> don@nubo-2:~$ ls -il .profile
> 1099511687525 -rw-r--r-- 1 don don 675 Dec  4 05:08 .profile
>
> Would this be a similar problem? Should I give up on cephfs? It's been
> working fine for me for some time, but now 2 errors in 4 days makes me very
> nervous.

Which client are you using (fuse or kernel, and which version)? Do you have
inline data enabled? Do you have multiple data pools?

Regards
Yan, Zheng
>
>
> On 4 December 2015 at 08:16, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> On Fri, Dec 4, 2015 at 10:39 AM, Don Waterloo <don.waterloo@xxxxxxxxx>
>> wrote:
>> > i have a file which is untouchable: ls -i gives an error, stat gives an
>> > error. it shows ??? for all fields except name.
>> >
>> > How do i clean this up?
>> >
>>
>> The safest way to clean this up is to create a new directory, move the
>> remaining files into it, move the old directory somewhere you won't
>> touch it, and put the new directory in place of the old one.
>>
>> If you are still uncomfortable with that, you can use 'rados -p metadata
>> rmomapkey ...' to forcibly remove the corrupted file.
>>
>> First, flush the journal:
>> #ceph daemon mds.nubo-2 flush journal
>>
>> Find the inode number of the directory that contains the corrupted file,
>> then list the omap keys of that directory's object:
>> #rados -p metadata listomapkeys <dir inode number in hex>.00000000
>>
>> The output should include the name of the corrupted file (with the
>> suffix _head). Remove that key:
>> #rados -p metadata rmomapkey <dir inode number in hex>.00000000 <omapkey for the corrupted file>
>>
>> Now the file is deleted, but the directory becomes un-deletable. You
>> can fix the directory as follows.
>>
>> Make sure the 'mds verify scatter' config is disabled:
>> #ceph daemon mds.nubo-2 config set mds_verify_scatter 0
>>
>> Fragment the directory:
>> #ceph mds tell 0 fragment_dir <path of the un-deletable directory in the FS> '0/0' 1
>>
>> Create a file in the directory:
>> #touch <path of the un-deletable directory>/foo
>>
>> The two steps above fix the directory's stat; now you can delete the
>> directory:
>> #rm -rf <path of the un-deletable directory>
>>
>>
>> > I'm on ubuntu 15.10, running 0.94.5
>> > # ceph -v
>> > ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
>> >
>> > the node that accessed the file then caused a problem with mds:
>> >
>> > root@nubo-1:/home/git/go/src/github.com/gogits/gogs# ceph status
>> >     cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
>> >      health HEALTH_WARN
>> >             mds0: Client nubo-1 failing to respond to capability release
>> >      monmap e1: 3 mons at {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
>> >             election epoch 906, quorum 0,1,2 nubo-1,nubo-2,nubo-3
>> >      mdsmap e418: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby
>> >      osdmap e2081: 6 osds: 6 up, 6 in
>> >       pgmap v95696: 560 pgs, 6 pools, 131 GB data, 97784 objects
>> >             265 GB used, 5357 GB / 5622 GB avail
>> >                  560 active+clean
>> >
>> > Trying a different node, i see the same problem.
>> >
>> > I'm getting this error dumped to dmesg:
>> >
>> > [670243.421212] Workqueue: ceph-msgr con_work [libceph]
>> > [670243.421213]  0000000000000000 00000000e800e516 ffff8810cd68f9d8 ffffffff817e8c09
>> > [670243.421215]  0000000000000000 0000000000000000 ffff8810cd68fa18 ffffffff8107b3c6
>> > [670243.421217]  ffff8810cd68fa28 00000000ffffffea 0000000000000000 0000000000000000
>> > [670243.421218] Call Trace:
>> > [670243.421221]  [<ffffffff817e8c09>] dump_stack+0x45/0x57
>> > [670243.421223]  [<ffffffff8107b3c6>] warn_slowpath_common+0x86/0xc0
>> > [670243.421225]  [<ffffffff8107b4fa>] warn_slowpath_null+0x1a/0x20
>> > [670243.421229]  [<ffffffffc06ebb1c>] fill_inode.isra.18+0xc5c/0xc90 [ceph]
>> > [670243.421233]  [<ffffffff81217427>] ? inode_init_always+0x107/0x1b0
>> > [670243.421236]  [<ffffffffc06e95e0>] ? ceph_mount+0x7e0/0x7e0 [ceph]
>> > [670243.421241]  [<ffffffffc06ebe82>] ceph_fill_trace+0x332/0x910 [ceph]
>> > [670243.421248]  [<ffffffffc0709db5>] handle_reply+0x525/0xb70 [ceph]
>> > [670243.421255]  [<ffffffffc070cac8>] dispatch+0x3c8/0xbb0 [ceph]
>> > [670243.421260]  [<ffffffffc069daeb>] con_work+0x57b/0x1770 [libceph]
>> > [670243.421262]  [<ffffffff810b2d7b>] ? dequeue_task_fair+0x36b/0x700
>> > [670243.421263]  [<ffffffff810b2141>] ? put_prev_entity+0x31/0x420
>> > [670243.421265]  [<ffffffff81013689>] ? __switch_to+0x1f9/0x5c0
>> > [670243.421267]  [<ffffffff8109412a>] process_one_work+0x1aa/0x440
>> > [670243.421269]  [<ffffffff8109440b>] worker_thread+0x4b/0x4c0
>> > [670243.421271]  [<ffffffff810943c0>] ? process_one_work+0x440/0x440
>> > [670243.421273]  [<ffffffff810943c0>] ? process_one_work+0x440/0x440
>> > [670243.421274]  [<ffffffff8109a7c8>] kthread+0xd8/0xf0
>> > [670243.421276]  [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
>> > [670243.421277]  [<ffffffff817efe1f>] ret_from_fork+0x3f/0x70
>> > [670243.421279]  [<ffffffff8109a6f0>] ? kthread_create_on_node+0x1f0/0x1f0
>> > [670243.421280] ---[ end trace 5cded7a882dfd5d1 ]---
>> > [670243.421282] ceph: fill_inode badness ffff88179e2d9f28 10000004e91.fffffffffffffffe
>> >
>> > this problem persisted through a reboot, and there is no fsck to help
>> > me.
>> >
>> > I also tried with ceph-fuse, but it crashes when I access the file.
>>
>> How did ceph-fuse crash? Please send us the backtrace.
>>
>> Regards
>> Yan, Zheng
>>
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
>
