I don't think it's the symlink that's the problem, but the path
permissions being something other than open. That may be why you didn't
see this. I am hoping symlinks still work, as I know we will need them
for our application.

Jason

On 6/17/16, 1:50 AM, "ceph-users on behalf of Oliver Dzombic"
<ceph-users-bounces@xxxxxxxxxxxxxx on behalf of info@xxxxxxxxxxxxxxxxx>
wrote:

>Hi,
>
>just to verify this:
>
>no symlink usage == no problem/bug
>
>right ?
>
>--
>Best regards
>
>Oliver Dzombic
>IP-Interactive
>
>mailto:info@xxxxxxxxxxxxxxxxx
>
>Address:
>
>IP Interactive UG ( haftungsbeschraenkt )
>Zum Sonnenberg 1-3
>63571 Gelnhausen
>
>HRB 93402, Amtsgericht Hanau
>Managing director: Oliver Dzombic
>
>Tax no.: 35 236 3622 1
>VAT ID: DE274086107
>
>
>On 17.06.2016 at 06:11, Yan, Zheng wrote:
>> On Fri, Jun 17, 2016 at 5:03 AM, Jason Gress <jgress@xxxxxxxxxxxxx> wrote:
>>> This is the latest default kernel with CentOS7. We also tried a newer
>>> kernel (from elrepo), a 4.4 that has the same problem, so I don't think
>>> that is it. Thank you for the suggestion though.
>>>
>>> We upgraded our cluster to the 10.2.2 release today, and it didn't
>>> resolve all of the issues. It's possible that a related issue is
>>> actually permissions. Something may not be right with our config (or a
>>> bug) here.
>>>
>>> While testing we noticed that there may actually be two issues here. I
>>> am unsure, as we noticed that the most consistent way to reproduce our
>>> issue is to use vim or sed -i which does in place renames:
>>>
>>> [root@ftp01 cron]# ls -la
>>> total 3
>>> drwx------   1 root root 2044 Jun 16 15:50 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>> [root@ftp01 cron]# sed -i 's/^/#/' file
>>> sed: cannot rename ./sedfB2CkO: Permission denied
>>>
>>>
>>> Strangely, adding or deleting files works fine, it's only renaming that
>>> fails. And strangely I was able to successfully edit the file on ftp02:
>>>
>>> [root@ftp02 cron]# sed -i 's/^/#/' file
>>> [root@ftp02 cron]# ls -la
>>> total 3
>>> drwx------   1 root root 2044 Jun 16 15:49 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> Then it worked on ftp01 this time:
>>> [root@ftp01 cron]# ls -la
>>> total 3
>>> drwx------   1 root root 2357 Jun 16 15:49 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  313 Jun 16 15:49 file
>>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> Then, I vim'd it successfully on ftp01... Then ran the sed again:
>>>
>>> [root@ftp01 cron]# sed -i 's/^/#/' file
>>> sed: cannot rename ./sedfB2CkO: Permission denied
>>> [root@ftp01 cron]# ls -la
>>> total 3
>>> drwx------   1 root root 2044 Jun 16 15:51 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root  300 Jun 16 15:50 file
>>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> And now we have the zero file problem again:
>>>
>>> [root@ftp02 cron]# ls -la
>>> total 2
>>> drwx------   1 root root 2044 Jun 16 15:51 .
>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>> -rw-r--r--   1 root root    0 Jun 16 15:50 file
>>> -rw-------   1 root root 2044 Jun 16 13:47 root
>>>
>>>
>>> Anyway, I wonder how much of this issue is related to that cannot rename
>>> issue above.
>>> Here are our security settings:
>>>
>>> client.ftp01
>>>         key: <redacted>
>>>         caps: [mds] allow r, allow rw path=/ftp
>>>         caps: [mon] allow r
>>>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>>> client.ftp02
>>>         key: <redacted>
>>>         caps: [mds] allow r, allow rw path=/ftp
>>>         caps: [mon] allow r
>>>         caps: [osd] allow rw pool=cephfs_metadata, allow rw pool=cephfs_data
>>>
>>>
>>> /ftp is the directory on cephfs under which cron lives; the full path is
>>> /ftp/cron .
>>>
>>> I hope this helps and thank you for your time!
>>
>> I opened ticket http://tracker.ceph.com/issues/16358. The bug is in
>> path restriction code. For now, the workaround is updating client caps
>> to not use path restriction.
>>
>> Regards
>> Yan, Zheng
>>
>>>
>>> Jason
>>>
>>> On 6/15/16, 4:43 PM, "John Spray" <jspray@xxxxxxxxxx> wrote:
>>>
>>>> On Wed, Jun 15, 2016 at 10:21 PM, Jason Gress <jgress@xxxxxxxxxxxxx>
>>>> wrote:
>>>>> While trying to use CephFS as a clustered filesystem, we stumbled upon
>>>>> a reproducible bug that is unfortunately pretty serious, as it leads
>>>>> to data loss. Here is the situation:
>>>>>
>>>>> We have two systems, named ftp01 and ftp02. They are both running
>>>>> CentOS 7.2, with this kernel release and ceph packages:
>>>>>
>>>>> kernel-3.10.0-327.18.2.el7.x86_64
>>>>
>>>> That is an old-ish kernel to be using with cephfs. It may well be the
>>>> source of your issues.
>>>>
>>>>> [root@ftp01 cron]# rpm -qa | grep ceph
>>>>> ceph-base-10.2.1-0.el7.x86_64
>>>>> ceph-deploy-1.5.33-0.noarch
>>>>> ceph-mon-10.2.1-0.el7.x86_64
>>>>> libcephfs1-10.2.1-0.el7.x86_64
>>>>> ceph-selinux-10.2.1-0.el7.x86_64
>>>>> ceph-mds-10.2.1-0.el7.x86_64
>>>>> ceph-common-10.2.1-0.el7.x86_64
>>>>> ceph-10.2.1-0.el7.x86_64
>>>>> python-cephfs-10.2.1-0.el7.x86_64
>>>>> ceph-osd-10.2.1-0.el7.x86_64
>>>>>
>>>>> Mounted like so:
>>>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
>>>>> And:
>>>>> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
>>>>>
>>>>> This filesystem has 234GB worth of data on it, and I created another
>>>>> subdirectory and mounted it, NFS style.
>>>>>
>>>>> Here were the steps to reproduce:
>>>>>
>>>>> First, I created a file (I was mounting /var/spool/cron on two
>>>>> systems) on ftp01:
>>>>> (crond is not running right now on either system to keep the variables
>>>>> down)
>>>>>
>>>>> [root@ftp01 cron]# cp /tmp/root .
>>>>>
>>>>> Shows up on both fine:
>>>>> [root@ftp01 cron]# ls -la
>>>>> total 2
>>>>> drwx------   1 root root    0 Jun 15 15:50 .
>>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>>> -rw-------   1 root root 2043 Jun 15 15:50 root
>>>>> [root@ftp01 cron]# md5sum root
>>>>> 0636c8deaeadfea7b9ddaa29652b43ae  root
>>>>>
>>>>> [root@ftp02 cron]# ls -la
>>>>> total 2
>>>>> drwx------   1 root root 2043 Jun 15 15:50 .
>>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>>> -rw-------   1 root root 2043 Jun 15 15:50 root
>>>>> [root@ftp02 cron]# md5sum root
>>>>> 0636c8deaeadfea7b9ddaa29652b43ae  root
>>>>>
>>>>> Now, I vim the file on one of them:
>>>>> [root@ftp01 cron]# vim root
>>>>> [root@ftp01 cron]# ls -la
>>>>> total 2
>>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>>> [root@ftp01 cron]# md5sum root
>>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>>
>>>>> [root@ftp02 cron]# md5sum root
>>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>>
>>>>> So far so good, right? Then, a few seconds later:
>>>>>
>>>>> [root@ftp02 cron]# ls -la
>>>>> total 0
>>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>>> -rw-------   1 root root    0 Jun 15 15:50 root
>>>>> [root@ftp02 cron]# cat root
>>>>> [root@ftp02 cron]# md5sum root
>>>>> d41d8cd98f00b204e9800998ecf8427e  root
>>>>>
>>>>> And on ftp01:
>>>>>
>>>>> [root@ftp01 cron]# ls -la
>>>>> total 2
>>>>> drwx------   1 root root    0 Jun 15 15:51 .
>>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>>> [root@ftp01 cron]# md5sum root
>>>>> 7a0c346bbd2b61c5fe990bb277c00917  root
>>>>>
>>>>> I later create a 'root2' on ftp02 and cause a similar issue. The end
>>>>> results are two non-matching files:
>>>>>
>>>>> [root@ftp01 cron]# ls -la
>>>>> total 2
>>>>> drwx------   1 root root    0 Jun 15 15:53 .
>>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>>> -rw-------   1 root root 2044 Jun 15 15:50 root
>>>>> -rw-r--r--   1 root root    0 Jun 15 15:53 root2
>>>>>
>>>>> [root@ftp02 cron]# ls -la
>>>>> total 2
>>>>> drwx------   1 root root    0 Jun 15 15:53 .
>>>>> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
>>>>> -rw-------   1 root root    0 Jun 15 15:50 root
>>>>> -rw-r--r--   1 root root 1503 Jun 15 15:53 root2
>>>>>
>>>>> We were able to reproduce this on two other systems with the same
>>>>> cephfs filesystem. I have also seen cases where the file would just
>>>>> blank out on both as well.
>>>>>
>>>>> We could not reproduce it with our dev/test cluster running the
>>>>> development ceph version:
>>>>>
>>>>> ceph-10.2.2-1.g502540f.el7.x86_64
>>>>
>>>> Strange. In that cluster, was the same 3.x kernel in use?
>>>> There aren't a whole lot of changes on the server side in v10.2.2 that
>>>> I could imagine affecting this case.
>>>>
>>>> The best thing to do right now is to try using ceph-fuse in your
>>>> production environment, to check that it is not exhibiting the same
>>>> behaviour as the old kernel client. Once you confirm that, I would
>>>> recommend upgrading your kernel to the most recent 4.x that you are
>>>> comfortable with, and confirm that that also does not exhibit the bad
>>>> behaviour.
>>>>
>>>> John
>>>>
>>>>> Is this a known bug with the current production Jewel release? If so,
>>>>> will it be patched in the next release?
>>>>>
>>>>> Thank you very much,
>>>>>
>>>>> Jason Gress
>>>>>
>>>>> "This message and any attachments may contain confidential
>>>>> information. If you have received this message in error, any use or
>>>>> distribution is prohibited. Please notify us by reply e-mail if you
>>>>> have mistakenly received this message, and immediately and permanently
>>>>> delete it and any attachments. Thank you."
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> ceph-users mailing list
>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>
>>>
>>>
>>>
>>> "This message and any attachments may contain confidential information.
>>> If you have received this message in error, any use or distribution is
>>> prohibited. Please notify us by reply e-mail if you have mistakenly
>>> received this message, and immediately and permanently delete it and any
>>> attachments. Thank you."
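
For anyone who wants to check a mount for the "cannot rename" failure
reported in this thread, here is a minimal sketch, assuming a POSIX shell
with mktemp, md5sum and mv available. The function name
check_inplace_rename and the test file names are made up for illustration
and are not part of Ceph. It only mimics the write-a-temp-file-then-rename
pattern that sed -i uses; it does not talk to Ceph itself, so point it at
a directory on the CephFS mount you want to test.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread, not part of Ceph): mimic the
# write-temp-then-rename pattern used by sed -i inside the directory $1.
check_inplace_rename() {
    dir=$1
    target="$dir/rename-test.$$"

    # Create the temp file in the same directory, so mv is a plain rename.
    tmp=$(mktemp "$dir/rename-tmp.XXXXXX") || { echo "mktemp failed"; return 1; }

    printf 'original\n' > "$target" || { rm -f "$tmp"; echo "create failed"; return 1; }
    printf 'edited\n' > "$tmp"
    want=$(md5sum < "$tmp")

    # This is the step that failed with "Permission denied" in the report.
    if ! mv -f "$tmp" "$target"; then
        rm -f "$tmp" "$target"
        echo "rename failed"
        return 1
    fi

    # Verify that the renamed file still carries the edited content.
    got=$(md5sum < "$target")
    rm -f "$target"
    if [ "$got" = "$want" ]; then
        echo "rename ok"
    else
        echo "content mismatch after rename"
        return 1
    fi
}
```

Usage would be something like "check_inplace_rename /var/spool/cron" on
each client in turn. Since the original report shows the second client
seeing a zero-length file only a few seconds later, it is probably worth
also comparing md5sum output for an existing file from both nodes after a
short wait; this sketch does not do that.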