This sounds an awful lot like a bug I've run into a few times (not often
enough to get a good backtrace out of the kernel or MDS) involving vim on a
symlink to a file in another directory. It will occasionally corrupt the
symlink in such a way that the symlink is unreadable, filling dmesg with:

[ 2368.036667] ceph: fill_inode badness on ffff8800bb5fb610
[ 2368.969657] ------------[ cut here ]------------
[ 2368.969670] WARNING: CPU: 0 PID: 15 at fs/ceph/inode.c:813 fill_inode.isra.19+0x4b1/0xa49()
[ 2368.969672] Modules linked in:
[ 2368.969684] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G W 4.5.0-gentoo #1
[ 2368.969686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 2368.969693] Workqueue: ceph-msgr ceph_con_workfn
[ 2368.969695] 0000000000000286 000000007000a7b9 ffff88017e267af0 ffffffffb142ec39
[ 2368.969698] 0000000000000000 0000000000000009 ffff88017e267b28 ffffffffb1091c83
[ 2368.969700] ffffffffb13be512 ffffc900020da8cd ffff880427a30230 ffffffffffffffff
[ 2368.969704] Call Trace:
[ 2368.969709] [<ffffffffb142ec39>] dump_stack+0x63/0x7f
[ 2368.969714] [<ffffffffb1091c83>] warn_slowpath_common+0x9a/0xb3
[ 2368.969717] [<ffffffffb13be512>] ? fill_inode.isra.19+0x4b1/0xa49
[ 2368.969719] [<ffffffffb1091d86>] warn_slowpath_null+0x15/0x17
[ 2368.969722] [<ffffffffb13be512>] fill_inode.isra.19+0x4b1/0xa49
[ 2368.969724] [<ffffffffb13bca00>] ? ceph_mount+0x729/0x72e
[ 2368.969727] [<ffffffffb13bf705>] ceph_readdir_prepopulate+0x48f/0x70c
[ 2368.969730] [<ffffffffb13daac3>] dispatch+0xebf/0x1428
[ 2368.969752] [<ffffffffb19098f2>] ? ceph_x_check_message_signature+0x42/0xc4
[ 2368.969756] [<ffffffffb18fa16e>] ceph_con_workfn+0xe1a/0x24f3
[ 2368.969759] [<ffffffffb104603a>] ? load_TLS+0xb/0xf
[ 2368.969761] [<ffffffffb10468f9>] ? __switch_to+0x3b0/0x42b
[ 2368.969765] [<ffffffffb10afd8f>] ? finish_task_switch+0xff/0x191
[ 2368.969768] [<ffffffffb10a53b3>] process_one_work+0x175/0x2a0
[ 2368.969770] [<ffffffffb10a59c8>] worker_thread+0x1fc/0x2ae
[ 2368.969772] [<ffffffffb10a57cc>] ? rescuer_thread+0x2c0/0x2c0
[ 2368.969775] [<ffffffffb10a9c4b>] kthread+0xaf/0xb7
[ 2368.969777] [<ffffffffb10a9b9c>] ? kthread_parkme+0x1f/0x1f
[ 2368.969780] [<ffffffffb192620f>] ret_from_fork+0x3f/0x70
[ 2368.969782] [<ffffffffb10a9b9c>] ? kthread_parkme+0x1f/0x1f
[ 2368.969784] ---[ end trace b054c5c6854fd2ab ]---
[ 2368.969786] ceph: fill_inode badness on ffff880428185d70
[ 2370.289733] ------------[ cut here ]------------
[ 2370.289747] WARNING: CPU: 0 PID: 15 at fs/ceph/inode.c:813 fill_inode.isra.19+0x4b1/0xa49()
[ 2370.289750] Modules linked in:
[ 2370.289756] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G W 4.5.0-gentoo #1
[ 2370.289759] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 2370.289767] Workqueue: ceph-msgr ceph_con_workfn
[ 2370.289769] 0000000000000286 000000007000a7b9 ffff88017e267af0 ffffffffb142ec39
[ 2370.289774] 0000000000000000 0000000000000009 ffff88017e267b28 ffffffffb1091c83
[ 2370.289777] ffffffffb13be512 ffffc900020f58cd ffff880427a30230 ffffffffffffffff
[ 2370.289781] Call Trace:
[ 2370.289787] [<ffffffffb142ec39>] dump_stack+0x63/0x7f
[ 2370.289793] [<ffffffffb1091c83>] warn_slowpath_common+0x9a/0xb3
[ 2370.289797] [<ffffffffb13be512>] ? fill_inode.isra.19+0x4b1/0xa49
[ 2370.289801] [<ffffffffb1091d86>] warn_slowpath_null+0x15/0x17
[ 2370.289804] [<ffffffffb13be512>] fill_inode.isra.19+0x4b1/0xa49
[ 2370.289807] [<ffffffffb13bca00>] ? ceph_mount+0x729/0x72e
[ 2370.289811] [<ffffffffb13bf705>] ceph_readdir_prepopulate+0x48f/0x70c
[ 2370.289815] [<ffffffffb13daac3>] dispatch+0xebf/0x1428
[ 2370.289821] [<ffffffffb19098f2>] ? ceph_x_check_message_signature+0x42/0xc4
[ 2370.289824] [<ffffffffb18fa16e>] ceph_con_workfn+0xe1a/0x24f3
[ 2370.289829] [<ffffffffb104603a>] ? load_TLS+0xb/0xf
[ 2370.289832] [<ffffffffb10468f9>] ? __switch_to+0x3b0/0x42b
[ 2370.289837] [<ffffffffb10afd8f>] ? finish_task_switch+0xff/0x191
[ 2370.289841] [<ffffffffb10a53b3>] process_one_work+0x175/0x2a0
[ 2370.289843] [<ffffffffb10a59c8>] worker_thread+0x1fc/0x2ae
[ 2370.289846] [<ffffffffb10a57cc>] ? rescuer_thread+0x2c0/0x2c0
[ 2370.289849] [<ffffffffb10a9c4b>] kthread+0xaf/0xb7
[ 2370.289853] [<ffffffffb10a9b9c>] ? kthread_parkme+0x1f/0x1f
[ 2370.289857] [<ffffffffb192620f>] ret_from_fork+0x3f/0x70
[ 2370.289860] [<ffffffffb10a9b9c>] ? kthread_parkme+0x1f/0x1f
[ 2370.289863] ---[ end trace b054c5c6854fd2ac ]---
[ 2370.289865] ceph: fill_inode badness on ffff880428185d70
[ 2371.525649] ------------[ cut here ]------------
[ 2371.525663] WARNING: CPU: 0 PID: 15 at fs/ceph/inode.c:813 fill_inode.isra.19+0x4b1/0xa49()
[ 2371.525665] Modules linked in:
[ 2371.525670] CPU: 0 PID: 15 Comm: kworker/0:1 Tainted: G W 4.5.0-gentoo #1
[ 2371.525672] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 2371.525679] Workqueue: ceph-msgr ceph_con_workfn
[ 2371.525682] 0000000000000286 000000007000a7b9 ffff88017e267af0 ffffffffb142ec39
[ 2371.525685] 0000000000000000 0000000000000009 ffff88017e267b28 ffffffffb1091c83
[ 2371.525687] ffffffffb13be512 ffffc900021108cd ffff880427a30230 ffffffffffffffff
[ 2371.525690] Call Trace:
[ 2371.525696] [<ffffffffb142ec39>] dump_stack+0x63/0x7f
[ 2371.525701] [<ffffffffb1091c83>] warn_slowpath_common+0x9a/0xb3
[ 2371.525704] [<ffffffffb13be512>] ? fill_inode.isra.19+0x4b1/0xa49
[ 2371.525707] [<ffffffffb1091d86>] warn_slowpath_null+0x15/0x17
[ 2371.525740] [<ffffffffb13be512>] fill_inode.isra.19+0x4b1/0xa49
[ 2371.525744] [<ffffffffb13bca00>] ? ceph_mount+0x729/0x72e
[ 2371.525747] [<ffffffffb13bf705>] ceph_readdir_prepopulate+0x48f/0x70c
[ 2371.525751] [<ffffffffb13daac3>] dispatch+0xebf/0x1428
[ 2371.525755] [<ffffffffb19098f2>] ? ceph_x_check_message_signature+0x42/0xc4
[ 2371.525758] [<ffffffffb18fa16e>] ceph_con_workfn+0xe1a/0x24f3
[ 2371.525762] [<ffffffffb104603a>] ? load_TLS+0xb/0xf
[ 2371.525764] [<ffffffffb10468f9>] ? __switch_to+0x3b0/0x42b
[ 2371.525769] [<ffffffffb10afd8f>] ? finish_task_switch+0xff/0x191
[ 2371.525772] [<ffffffffb10a53b3>] process_one_work+0x175/0x2a0
[ 2371.525774] [<ffffffffb10a59c8>] worker_thread+0x1fc/0x2ae
[ 2371.525776] [<ffffffffb10a57cc>] ? rescuer_thread+0x2c0/0x2c0
[ 2371.525779] [<ffffffffb10a9c4b>] kthread+0xaf/0xb7
[ 2371.525782] [<ffffffffb10a9b9c>] ? kthread_parkme+0x1f/0x1f
[ 2371.525786] [<ffffffffb192620f>] ret_from_fork+0x3f/0x70
[ 2371.525788] [<ffffffffb10a9b9c>] ? kthread_parkme+0x1f/0x1f
[ 2371.525790] ---[ end trace b054c5c6854fd2ad ]---

That shows up whenever a readdir is performed on the directory containing
the symlink; all of the symlink's stats go ??????? and it can no longer be
deleted, moved, or otherwise operated on. I believe it involves the
overwrites that vim performs on save (save to a temporary file and move it
over the top of the existing one, I believe). I've seen it on kernels 4.0
through 4.5 so far, possibly even earlier, and on Hammer through
Infernalis; I've not had a chance to test on Jewel.
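In case anyone wants to try to trigger it, here's a rough sketch of the
write-to-a-temp-file-and-rename pattern I suspect is responsible. This is
only my guess at what vim (with default backup settings) and sed -i do
under the hood, and the mount point, directory, and file names below are
made-up placeholders:

#!/bin/bash
# Hypothetical reproducer sketch, not a confirmed test case.
set -e
cd /mnt/cephfs                          # example CephFS kernel-client mount
mkdir -p dirA dirB
printf 'original contents\n' > dirB/file
ln -s ../dirB/file dirA/link            # symlink to a file in another directory
for i in $(seq 1 50); do
    tmp=$(mktemp -p dirB edit.XXXXXX)   # write the new version to a temp file
    printf 'edit %s\n' "$i" > "$tmp"
    mv -f "$tmp" dirB/file              # replace the original via rename()
    ls -la dirA/ > /dev/null            # readdir on the dir holding the symlink
done
# Meanwhile, on a second client, `ls -la` the same directories in a loop and
# watch for ??????? stats and "fill_inode badness" in dmesg.

(The temp-file-plus-rename part is at least what sed -i does -- you can see
its sedXXXXXX temp file in the rename error further down this thread --
while vim's exact behaviour depends on its 'backup'/'writebackup'/
'backupcopy' settings.)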
I'd dump the symlink data out of the metadata pool, but I'm still recovering
from http://tracker.ceph.com/issues/16177
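(For anyone who does want to poke at the raw dentry for a broken symlink,
this is roughly what I'd try. Treat it as an untested sketch: the mount
point and symlink name are placeholders, the pool name is just taken from
the caps quoted below, and I believe the dentry omap keys are named
'<name>_head', but check with listomapkeys first.)

DIR=/mnt/cephfs/dir-containing-the-symlink   # example path on a client mount
POOL=cephfs_metadata                          # your metadata pool name
# Directory objects in the metadata pool are named after the directory's
# inode number in hex; .00000000 is the first (and, if the directory is
# unfragmented, only) fragment. The symlink's target lives in the inode
# embedded in its dentry value.
INO=$(printf '%x' "$(stat -c %i "$DIR")")
rados -p "$POOL" listomapkeys "${INO}.00000000"
rados -p "$POOL" getomapval "${INO}.00000000" "badlink_head" /tmp/dentry.bin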
Not trying to hijack your thread here, though.

--
Adam

On Thu, Jun 16, 2016 at 4:03 PM, Jason Gress <jgress@xxxxxxxxxxxxx> wrote:
> This is the latest default kernel with CentOS 7. We also tried a newer
> kernel (from elrepo), a 4.4 that has the same problem, so I don't think
> that is it. Thank you for the suggestion though.
>
> We upgraded our cluster to the 10.2.2 release today, and it didn't resolve
> all of the issues. It's possible that a related issue is actually
> permissions. Something may not be right with our config (or a bug) here.
>
> While testing we noticed that there may actually be two issues here. I am
> unsure, as we noticed that the most consistent way to reproduce our issue
> is to use vim or sed -i, which does in-place renames:
>
> [root@ftp01 cron]# ls -la
> total 3
> drwx------ 1 root root 2044 Jun 16 15:50 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw-r--r-- 1 root root 300 Jun 16 15:50 file
> -rw------- 1 root root 2044 Jun 16 13:47 root
> [root@ftp01 cron]# sed -i 's/^/#/' file
> sed: cannot rename ./sedfB2CkO: Permission denied
>
> Strangely, adding or deleting files works fine; it's only renaming that
> fails. And strangely I was able to successfully edit the file on ftp02:
>
> [root@ftp02 cron]# sed -i 's/^/#/' file
> [root@ftp02 cron]# ls -la
> total 3
> drwx------ 1 root root 2044 Jun 16 15:49 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw-r--r-- 1 root root 313 Jun 16 15:49 file
> -rw------- 1 root root 2044 Jun 16 13:47 root
>
> Then it worked on ftp01 this time:
>
> [root@ftp01 cron]# ls -la
> total 3
> drwx------ 1 root root 2357 Jun 16 15:49 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw-r--r-- 1 root root 313 Jun 16 15:49 file
> -rw------- 1 root root 2044 Jun 16 13:47 root
>
> Then I vim'd it successfully on ftp01... then ran the sed again:
>
> [root@ftp01 cron]# sed -i 's/^/#/' file
> sed: cannot rename ./sedfB2CkO: Permission denied
> [root@ftp01 cron]# ls -la
> total 3
> drwx------ 1 root root 2044 Jun 16 15:51 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw-r--r-- 1 root root 300 Jun 16 15:50 file
> -rw------- 1 root root 2044 Jun 16 13:47 root
>
> And now we have the zero-byte file problem again:
>
> [root@ftp02 cron]# ls -la
> total 2
> drwx------ 1 root root 2044 Jun 16 15:51 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw-r--r-- 1 root root 0 Jun 16 15:50 file
> -rw------- 1 root root 2044 Jun 16 13:47 root
>
> Anyway, I wonder how much of this issue is related to that "cannot rename"
> issue above. Here are our security settings:
>
> client.ftp01
>         key: <redacted>
>         caps: [mds] allow r, allow rw path=/ftp
>         caps: [mon] allow r
>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
> client.ftp02
>         key: <redacted>
>         caps: [mds] allow r, allow rw path=/ftp
>         caps: [mon] allow r
>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>
> /ftp is the directory on cephfs under which cron lives; the full path is
> /ftp/cron .
>
> I hope this helps and thank you for your time!
>
> Jason
>
> On 6/15/16, 4:43 PM, "John Spray" <jspray@xxxxxxxxxx> wrote:
>
>> On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress <jgress@xxxxxxxxxxxxx> wrote:
>>> While trying to use CephFS as a clustered filesystem, we stumbled upon a
>>> reproducible bug that is unfortunately pretty serious, as it leads to
>>> data loss. Here is the situation:
>>>
>>> We have two systems, named ftp01 and ftp02. They are both running
>>> CentOS 7.2, with this kernel release and ceph packages:
>>>
>>> kernel-3.10.0-327.18.2.el7.x86_64
>>
>> That is an old-ish kernel to be using with cephfs. It may well be the
>> source of your issues.
>>
>>> [root@ftp01 cron]# rpm -qa | grep ceph
>>> ceph-base-10.2.1-0.el7.x86_64
>>> ceph-deploy-1.5.33-0.noarch
>>> ceph-mon-10.2.1-0.el7.x86_64
>>> libcephfs1-10.2.1-0.el7.x86_64
>>> ceph-selinux-10.2.1-0.el7.x86_64
>>> ceph-mds-10.2.1-0.el7.x86_64
>>> ceph-common-10.2.1-0.el7.x86_64
>>> ceph-10.2.1-0.el7.x86_64
>>> python-cephfs-10.2.1-0.el7.x86_64
>>> ceph-osd-10.2.1-0.el7.x86_64
>>>
>>> Mounted like so:
>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
>>> _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
>>> And:
>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph
>>> _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
>>>
>>> This filesystem has 234GB worth of data on it, and I created another
>>> subdirectory and mounted it, NFS style.
>>>
>>> Here were the steps to reproduce:
>>>
>>> First, I created a file (I was mounting /var/spool/cron on two systems)
>>> on ftp01:
>>> (crond is not running right now on either system to keep the variables
>>> down)
>>>
>>> [root@ftp01 cron]# cp /tmp/root .
>>>
>>> Shows up on both fine:
>>> [root@ftp01 cron]# ls -la
>>> total 2
>>> drwx------ 1 root root 0 Jun 15 15:50 .
>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>> -rw------- 1 root root 2043 Jun 15 15:50 root
>>> [root@ftp01 cron]# md5sum root
>>> 0636c8deaeadfea7b9ddaa29652b43ae root
>>>
>>> [root@ftp02 cron]# ls -la
>>> total 2
>>> drwx------ 1 root root 2043 Jun 15 15:50 .
>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>> -rw------- 1 root root 2043 Jun 15 15:50 root
>>> [root@ftp02 cron]# md5sum root
>>> 0636c8deaeadfea7b9ddaa29652b43ae root
>>>
>>> Now, I vim the file on one of them:
>>> [root@ftp01 cron]# vim root
>>> [root@ftp01 cron]# ls -la
>>> total 2
>>> drwx------ 1 root root 0 Jun 15 15:51 .
>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>> -rw------- 1 root root 2044 Jun 15 15:50 root
>>> [root@ftp01 cron]# md5sum root
>>> 7a0c346bbd2b61c5fe990bb277c00917 root
>>>
>>> [root@ftp02 cron]# md5sum root
>>> 7a0c346bbd2b61c5fe990bb277c00917 root
>>>
>>> So far so good, right? Then, a few seconds later:
>>>
>>> [root@ftp02 cron]# ls -la
>>> total 0
>>> drwx------ 1 root root 0 Jun 15 15:51 .
>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>> -rw------- 1 root root 0 Jun 15 15:50 root
>>> [root@ftp02 cron]# cat root
>>> [root@ftp02 cron]# md5sum root
>>> d41d8cd98f00b204e9800998ecf8427e root
>>>
>>> And on ftp01:
>>>
>>> [root@ftp01 cron]# ls -la
>>> total 2
>>> drwx------ 1 root root 0 Jun 15 15:51 .
>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>> -rw------- 1 root root 2044 Jun 15 15:50 root
>>> [root@ftp01 cron]# md5sum root
>>> 7a0c346bbd2b61c5fe990bb277c00917 root
>>>
>>> I later create a 'root2' on ftp02 and cause a similar issue. The end
>>> results are two non-matching files:
>>>
>>> [root@ftp01 cron]# ls -la
>>> total 2
>>> drwx------ 1 root root 0 Jun 15 15:53 .
>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>> -rw------- 1 root root 2044 Jun 15 15:50 root
>>> -rw-r--r-- 1 root root 0 Jun 15 15:53 root2
>>>
>>> [root@ftp02 cron]# ls -la
>>> total 2
>>> drwx------ 1 root root 0 Jun 15 15:53 .
>>> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
>>> -rw------- 1 root root 0 Jun 15 15:50 root
>>> -rw-r--r-- 1 root root 1503 Jun 15 15:53 root2
>>>
>>> We were able to reproduce this on two other systems with the same cephfs
>>> filesystem. I have also seen cases where the file would just blank out on
>>> both as well.
>>>
>>> We could not reproduce it with our dev/test cluster running the development
>>> ceph version:
>>>
>>> ceph-10.2.2-1.g502540f.el7.x86_64
>>
>> Strange. In that cluster, was the same 3.x kernel in use? There
>> aren't a whole lot of changes on the server side in v10.2.2 that I
>> could imagine affecting this case.
>>
>> The best thing to do right now is to try using ceph-fuse in your
>> production environment, to check that it is not exhibiting the same
>> behaviour as the old kernel client. Once you confirm that, I would
>> recommend upgrading your kernel to the most recent 4.x that you are
>> comfortable with, and confirm that that also does not exhibit the bad
>> behaviour.
>>
>> John
>>
>>> Is this a known bug with the current production Jewel release? If so, will
>>> it be patched in the next release?
>>>
>>> Thank you very much,
>>>
>>> Jason Gress
>
> "This message and any attachments may contain confidential information. If you
> have received this message in error, any use or distribution is prohibited.
> Please notify us by reply e-mail if you have mistakenly received this message,
> and immediately and permanently delete it and any attachments. Thank you."

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com