Re: More data corruption issues with RBD (Ceph 0.61.2)

On Thursday, 13 June 2013, 01:58:08, Josh Durgin wrote:
> On 06/11/2013 11:59 AM, Guido Winkelmann wrote:
> 
> > - Write the data with a very large number of concurrent threads (1000+)
> 
> Are you using rbd caching? If so, turning it off may help reproduce
> faster if it's related to the number of individual requests (since the
> cache may merge adjacent or overlapping requests).

There shouldn't be any RBD caching involved. I'm using libvirt to start my 
VMs, and when specifying the rbd volumes in the domain definition, I use the 
cache="none" attribute.

> > - In the middle of writing, take down one OSD. It seems to matter which
> > OSD that is; so far I could only reproduce the bug taking down the
> > third of three OSDs
> 
> You're killing the OSD process, and not rebooting the host?

I shut down the userspace Ceph processes with /etc/init.d/ceph stop osd.
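
Concretely, to take down just the third OSD (assuming it is osd.2), that is:

  /etc/init.d/ceph stop osd.2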

> Which filesystem are the OSDs using?

BTRFS

> > My setup is Ceph 0.61.2 on three machines, each running one OSD and one
> > MON. The last one is also running an MDS. The ceph.conf file is attached.
> > 
> > I have just updated to 0.61.3 and plan on rerunning the test on that.
> > The platform is Fedora 18 in all cases with kernel 3.9.4-200.fc18.x86_64.
> 
> If it's reproducible it'd be great to get logs from all osds with
> debug osd = 20, debug ms = 1, and debug filestore = 20.

I've put those settings into the config file now, but even though I have 
been trying repeatedly for the last few days, I cannot reproduce the bug 
anymore :(
Maybe it was a problem with my test setup, maybe it was caused by some minor 
thing that was fixed in 0.61.3. Worst case, it was one of those bugs that 
disappear as soon as you enable debugging.
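
For the record, I added them like this (I put them in the [osd] section; 
[global] should work just as well):

  [osd]
      debug osd = 20
      debug ms = 1
      debug filestore = 20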

For now, I am going to stop trying to reproduce this, working under the 
assumption that it was either caused by something in my test setup (there was 
a bug in there as well, specifically a failure to check whether the writes 
succeeded...) or fixed in 0.61.3. I will also disable those debug settings, 
because they fill up my /var/log partitions extremely fast, faster than 
logrotate can do anything about it.
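
To illustrate the kind of check that was missing, even a trivial writer 
should at least fail loudly on a bad exit status, along these lines (a 
simplified single-threaded sketch, not my actual tester; /dev/vdb stands in 
for the attached RBD volume):

  # write 100 MB directly to the volume and abort if the write errors out
  dd if=/dev/urandom of=/dev/vdb bs=1M count=100 oflag=direct \
      || { echo "write failed" >&2; exit 1; }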

I will keep running those tests in the background though, to see if any 
problems decide to pop up again.

	Guido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



