Re: CephFS Bug found with CentOS 7.2

Hi,

I have an identical setup, except that I am running 10.2.2 now.

I cannot reproduce that.
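
For anyone who wants to check their own clients, the manual ls/md5sum comparison in the report quoted below can be sketched as a small polling script. This is only a rough sketch and not something taken from the original report: the function names `snapshot` and `watch_file` are made up for illustration, and it assumes GNU coreutils (`stat -c`, `md5sum`) as shipped with CentOS 7. It only observes the file, it does not fix anything.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ceph or of the original report.

# Print "<size-in-bytes> <md5>" for one file.
# Assumes GNU stat and md5sum (coreutils on CentOS 7).
snapshot() {
    printf '%s %s\n' "$(stat -c %s "$1")" "$(md5sum < "$1" | cut -d' ' -f1)"
}

# Sample FILE COUNT times, INTERVAL seconds apart, and print a line
# (hostname, time, size, md5) whenever the sample differs from the last one.
watch_file() {
    file=$1
    count=${2:-10}
    interval=${3:-1}
    prev=""
    i=0
    while [ "$i" -lt "$count" ]; do
        cur=$(snapshot "$file")
        if [ "$cur" != "$prev" ]; then
            printf '%s %s %s\n' "$(uname -n)" "$(date +%T)" "$cur"
            prev=$cur
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Source the file on both clients, edit the file on one of them, and then run e.g. `watch_file /var/spool/cron/root 30 1` on each. If the two hosts end up printing different sizes or checksums for the same path, they are seeing the divergence described in the report.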

-- 
Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 at the Hanau District Court (Amtsgericht Hanau)
Managing Director: Oliver Dzombic

Tax No.: 35 236 3622 1
VAT ID: DE274086107


On 15.06.2016 at 23:21, Jason Gress wrote:
> While trying to use CephFS as a clustered filesystem, we stumbled upon a
> reproducible bug that is unfortunately pretty serious, as it leads to
> data loss.  Here is the situation:
> 
> We have two systems, named ftp01 and ftp02.  They are both running
> CentOS 7.2, with this kernel release and ceph packages:
> 
> kernel-3.10.0-327.18.2.el7.x86_64
> 
> [root@ftp01 cron]# rpm -qa | grep ceph
> ceph-base-10.2.1-0.el7.x86_64
> ceph-deploy-1.5.33-0.noarch
> ceph-mon-10.2.1-0.el7.x86_64
> libcephfs1-10.2.1-0.el7.x86_64
> ceph-selinux-10.2.1-0.el7.x86_64
> ceph-mds-10.2.1-0.el7.x86_64
> ceph-common-10.2.1-0.el7.x86_64
> ceph-10.2.1-0.el7.x86_64
> python-cephfs-10.2.1-0.el7.x86_64
> ceph-osd-10.2.1-0.el7.x86_64
> 
> Mounted like so:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph _netdev,relatime,name=ftp01,secretfile=/etc/ceph/ftp01.secret 0 0
> And:
> XX.XX.XX.XX:/ftp/cron /var/spool/cron ceph _netdev,relatime,name=ftp02,secretfile=/etc/ceph/ftp02.secret 0 0
> 
> This filesystem has 234GB worth of data on it, and I created another
> subdirectory and mounted it, NFS style.
> 
> Here were the steps to reproduce:
> 
> First, I created a file (I was mounting /var/spool/cron on two systems)
> on ftp01:
> (crond is not running right now on either system to keep the variables down)
> 
> [root@ftp01 cron]# cp /tmp/root .
> 
> Shows up on both fine:
> [root@ftp01 cron]# ls -la
> total 2
> drwx------   1 root root    0 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw-------   1 root root 2043 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
> 
> [root@ftp02 cron]# ls -la
> total 2
> drwx------   1 root root 2043 Jun 15 15:50 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw-------   1 root root 2043 Jun 15 15:50 root
> [root@ftp02 cron]# md5sum root
> 0636c8deaeadfea7b9ddaa29652b43ae  root
> 
> Now, I edit the file with vim on one of them:
> [root@ftp01 cron]# vim root
> [root@ftp01 cron]# ls -la
> total 2
> drwx------   1 root root    0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw-------   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
> 
> [root@ftp02 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
> 
> So far so good, right?  Then, a few seconds later:
> 
> [root@ftp02 cron]# ls -la
> total 0
> drwx------   1 root root   0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root 104 May 19 09:34 ..
> -rw-------   1 root root   0 Jun 15 15:50 root
> [root@ftp02 cron]# cat root
> [root@ftp02 cron]# md5sum root
> d41d8cd98f00b204e9800998ecf8427e  root
> 
> And on ftp01:
> 
> [root@ftp01 cron]# ls -la
> total 2
> drwx------   1 root root    0 Jun 15 15:51 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw-------   1 root root 2044 Jun 15 15:50 root
> [root@ftp01 cron]# md5sum root
> 7a0c346bbd2b61c5fe990bb277c00917  root
> 
> I later created a 'root2' on ftp02 and caused a similar issue.  The end
> result is two files that do not match between the two systems:
> 
> [root@ftp01 cron]# ls -la
> total 2
> drwx------   1 root root    0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw-------   1 root root 2044 Jun 15 15:50 root
> -rw-r--r--   1 root root    0 Jun 15 15:53 root2
> 
> [root@ftp02 cron]# ls -la
> total 2
> drwx------   1 root root    0 Jun 15 15:53 .
> drwxr-xr-x. 10 root root  104 May 19 09:34 ..
> -rw-------   1 root root    0 Jun 15 15:50 root
> -rw-r--r--   1 root root 1503 Jun 15 15:53 root2
> 
> We were able to reproduce this on two other systems with the same cephfs
> filesystem.  I have also seen cases where the file would just blank out
> on both as well.
> 
> We could not reproduce it with our dev/test cluster running the
> development ceph version:
> 
> ceph-10.2.2-1.g502540f.el7.x86_64
> 
> Is this a known bug with the current production Jewel release?  If so,
> will it be patched in the next release?
> 
> Thank you very much,
> 
> Jason Gress
> 
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 