Hi,

just to verify this: no symlink usage == no problem/bug, right?

-- 
Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Amtsgericht Hanau (district court)
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107


On 17.06.2016 at 06:11, Yan, Zheng wrote:
> On Fri, Jun 17, 2016 at 5:03 AM, Jason Gress <jgress@xxxxxxxxxxxxx> wrote:
>> This is the latest default kernel with CentOS7. We also tried a newer
>> kernel (from elrepo), a 4.4 that has the same problem, so I don't think
>> that is it. Thank you for the suggestion though.
>>
>> We upgraded our cluster to the 10.2.2 release today, and it didn't resolve
>> all of the issues. It's possible that a related issue is actually
>> permissions. Something may not be right with our config (or a bug) here.
>>
>> While testing we noticed that there may actually be two issues here. I am
>> unsure, as we noticed that the most consistent way to reproduce our issue
>> is to use vim or sed -i which does in place renames:
>>
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx------   1 root root 2044 Jun 16 15:50 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>>
>>
>> Strangely, adding or deleting files works fine, it's only renaming that
>> fails. And strangely I was able to successfully edit the file on ftp02:
>>
>> [root@ftp02 cron]# sed -i 's/^/#/' file
>> [root@ftp02 cron]# ls -la
>> total 3
>> drwx------   1 root root 2044 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then it worked on ftp01 this time:
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx------   1 root root 2357 Jun 16 15:49 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Then, I vim'd it successfully on ftp01... Then ran the sed again:
>>
>> [root@ftp01 cron]# sed -i 's/^/#/' file
>> sed: cannot rename ./sedfB2CkO: Permission denied
>> [root@ftp01 cron]# ls -la
>> total 3
>> drwx------   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> And now we have the zero file problem again:
>>
>> [root@ftp02 cron]# ls -la
>> total 2
>> drwx------   1 root root 2044 Jun 16 15:51 .
>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>> -rw-r--r--   1 root root    0 Jun 16 15:50 file
>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>
>>
>> Anyway, I wonder how much of this issue is related to that cannot rename
>> issue above. Here are our security settings:
>>
>> client.ftp01
>>         key: <redacted>
>>         caps: [mds] allow r, allow rw path=/ftp
>>         caps: [mon] allow r
>>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>> client.ftp02
>>         key: <redacted>
>>         caps: [mds] allow r, allow rw path=/ftp
>>         caps: [mon] allow r
>>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>>
>>
>> /ftp is the directory on cephfs under which cron lives; the full path is
>> /ftp/cron .
>>
>> I hope this helps and thank you for your time!
>
> I opened ticket http://tracker.ceph.com/issues/16358. The bug is in
> path restriction code. For now, the workaround is updating client caps
> to not use path restriction.
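
A minimal sketch of that workaround, assuming the caps quoted above for client.ftp01 and client.ftp02 and that dropping the path restriction means replacing the mds cap with a plain 'allow rw'. The helper name caps_cmd is hypothetical. Note that 'ceph auth caps' replaces all caps of an entity, so the mon and osd caps are restated unchanged, and that without the path restriction these clients can read and write the whole filesystem tree, not only /ftp. The script only prints the commands so they can be reviewed first:

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a `ceph auth caps` command that
# keeps the mon/osd caps from the thread but drops `path=/ftp` from the
# mds cap. The unrestricted `allow rw` is an assumption about what
# "not use path restriction" should look like for this cluster.
caps_cmd() {
    # $1 is the client entity, e.g. client.ftp01
    printf "ceph auth caps %s mds 'allow rw' mon 'allow r' osd 'allow rw pool=cephfs_metadata, allow rw pool=cephfs_data'\n" "$1"
}

# Print one command per client named in the thread.
for client in client.ftp01 client.ftp02; do
    caps_cmd "$client"
done
```

To apply a printed command after reviewing it, it can be pasted into a shell on a node with an admin keyring, or piped as in `caps_cmd client.ftp01 | sh`.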
>
> Regards
> Yan, Zheng
>
>>
>> Jason
>>
>> On 6/15/16, 4:43 PM, "John Spray" <jspray@xxxxxxxxxx> wrote:
>>
>>> On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress <jgress@xxxxxxxxxxxxx>
>>> wrote:
>>>> While trying to use CephFS as a clustered filesystem, we stumbled upon a
>>>> reproducible bug that is unfortunately pretty serious, as it leads to data
>>>> loss. Here is the situation:
>>>>
>>>> We have two systems, named ftp01 and ftp02. They are both running CentOS
>>>> 7.2, with this kernel release and ceph packages:
>>>>
>>>> kernel-3.10.0-327.18.2.el7.x86_64
>>>
>>> That is an old-ish kernel to be using with cephfs. It may well be the
>>> source of your issues.
>>>
>>>> [root@ftp01 cron]# rpm -qa | grep ceph
>>>> ceph-base-10.2.1-0.el7.x86_64
>>>> ceph-deploy-1.5.33-0.noarch
>>>> ceph-mon-10.2.1-0.el7.x86_64
>>>> libcephfs1-10.2.1-0.el7.x86_64
>>>> ceph-selinux-10.2.1-0.el7.x86_64
>>>> ceph-mds-10.2.1-0.el7.x86_64
>>>> ceph-common-10.2.1-0.el7.x86_64
>>>> ceph-10.2.1-0.el7.x86_64
>>>> python-cephfs-10.2.1-0.el7.x86_64
>>>> ceph-osd-10.2.1-0.el7.x86_64
>>>>
>>>> Mounted like so:
>>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
>>>> And:
>>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
>>>>
>>>> This filesystem has 234GB worth of data on it, and I created another
>>>> subdirectory and mounted it, NFS style.
>>>>
>>>> Here were the steps to reproduce:
>>>>
>>>> First, I created a file (I was mounting /var/spool/cron on two systems) on
>>>> ftp01:
>>>> (crond is not running right now on either system to keep the variables
>>>> down)
>>>>
>>>> [root@ftp01 cron]# cp /tmp/root .
>>>>
>>>> Shows up on both fine:
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:50 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2043 Jun 15 15:50 root
>>>> [root@ftp01 cron]# md5sum root
>>>> 0636c8deaeadfea7b9ddaa29652b43ae  root
>>>>
>>>> [root@ftp02 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root 2043 Jun 15 15:50 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2043 Jun 15 15:50 root
>>>> [root@ftp02 cron]# md5sum root
>>>> 0636c8deaeadfea7b9ddaa29652b43ae  root
>>>>
>>>> Now, I vim the file on one of them:
>>>> [root@ftp01 cron]# vim root
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>> [root@ftp01 cron]# md5sum root
>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>
>>>> [root@ftp02 cron]# md5sum root
>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>
>>>> So far so good, right? Then, a few seconds later:
>>>>
>>>> [root@ftp02 cron]# ls -la
>>>> total 0
>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root    0 Jun 15 15:50 root
>>>> [root@ftp02 cron]# cat root
>>>> [root@ftp02 cron]# md5sum root
>>>> d41d8cd98f00b204e9800998ecf8427e  root
>>>>
>>>> And on ftp01:
>>>>
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>> [root@ftp01 cron]# md5sum root
>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>
>>>> I later create a 'root2' on ftp02 and cause a similar issue. The end
>>>> results are two non-matching files:
>>>>
>>>> [root@ftp01 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:53 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>> -rw-r--r--   1 root root    0 Jun 15 15:53 root2
>>>>
>>>> [root@ftp02 cron]# ls -la
>>>> total 2
>>>> drwx------   1 root root    0 Jun 15 15:53 .
>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>> -rw-------   1 root root    0 Jun 15 15:50 root
>>>> -rw-r--r--   1 root root 1503 Jun 15 15:53 root2
>>>>
>>>> We were able to reproduce this on two other systems with the same cephfs
>>>> filesystem. I have also seen cases where the file would just blank out on
>>>> both as well.
>>>>
>>>> We could not reproduce it with our dev/test cluster running the
>>>> development ceph version:
>>>>
>>>> ceph-10.2.2-1.g502540f.el7.x86_64
>>>
>>> Strange. In that cluster, was the same 3.x kernel in use? There
>>> aren't a whole lot of changes on the server side in v10.2.2 that I
>>> could imagine affecting this case.
>>>
>>> The best thing to do right now is to try using ceph-fuse in your
>>> production environment, to check that it is not exhibiting the same
>>> behaviour as the old kernel client. Once you confirm that, I would
>>> recommend upgrading your kernel to the most recent 4.x that you are
>>> comfortable with, and confirm that that also does not exhibit the bad
>>> behaviour.
>>>
>>> John
>>>
>>>> Is this a known bug with the current production Jewel release? If so,
>>>> will it be patched in the next release?
>>>>
>>>> Thank you very much,
>>>>
>>>> Jason Gress
>>>>
>>>> "This message and any attachments may contain confidential information.
>>>> If you have received this message in error, any use or distribution is
>>>> prohibited. Please notify us by reply e-mail if you have mistakenly
>>>> received this message, and immediately and permanently delete it and any
>>>> attachments. Thank you."
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
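
The "cannot rename" error in the thread comes from the way sed -i (and, in many configurations, vim) saves a file: the new content is written to a temporary file in the same directory, which is then renamed over the original. The following is a rough sketch for exercising just that pattern in a directory of your choice. The function name inplace_comment and the TEST_DIR variable are hypothetical, and by default the demo runs in a local scratch directory rather than on CephFS, so it only shows the expected behaviour on a healthy filesystem:

```shell
#!/bin/sh
# Hypothetical helper: roughly mimic `sed -i 's/^/#/' FILE` by writing a
# temp file next to the target and renaming it over the target, so the
# rename step can be observed on its own.
inplace_comment() {
    target=$1
    dir=$(dirname "$target")
    tmp=$(mktemp "$dir/sedXXXXXX") || return 1
    # Prefix every line with '#', as in the thread's sed command.
    if ! sed 's/^/#/' "$target" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # This rename is the step that failed with "Permission denied".
    if ! mv -f "$tmp" "$target"; then
        echo "rename failed for $target" >&2
        rm -f "$tmp"
        return 1
    fi
}

# Demo: defaults to a scratch directory; set TEST_DIR to a directory on
# the mount under test to try it there.
TEST_DIR=${TEST_DIR:-$(mktemp -d)}
printf 'a\nb\n' > "$TEST_DIR/file"
inplace_comment "$TEST_DIR/file" && wc -c < "$TEST_DIR/file"
```

On a second client mounting the same directory, comparing the file size or md5sum a few seconds later, as done in the thread, would show whether the zero-length symptom appears.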