On Thursday, 13 June 2013, 01:58:08, Josh Durgin wrote:
> On 06/11/2013 11:59 AM, Guido Winkelmann wrote:
> > - Write the data with a very large number of concurrent threads (1000+)
>
> Are you using rbd caching? If so, turning it off may help reproduce
> faster if it's related to the number of individual requests (since the
> cache may merge adjacent or overlapping requests).

There shouldn't be any RBD caching involved. I'm using libvirt to start my VMs, and when specifying the rbd volumes in the domain definition, I use the cache="none" attribute.

> > - In the middle of writing, take down one OSD. It seems to matter which
> > OSD that is; so far I could only reproduce the bug by taking down the
> > third of three OSDs
>
> You're killing the OSD process, and not rebooting the host?

I shut down the user-space Ceph processes with /etc/init.d/ceph stop osd.

> Which filesystem are the OSDs using?

Btrfs.

> > My setup is Ceph 0.61.2 on three machines, each running one OSD and one
> > MON. The last one is also running an MDS. The ceph.conf file is attached.
> >
> > I have just updated to 0.61.3 and plan on rerunning the test on that.
> > The platform is Fedora 18 in all cases with kernel 3.9.4-200.fc18.x86_64.
>
> If it's reproducible it'd be great to get logs from all osds with
> debug osd = 20, debug ms = 1, and debug filestore = 20.

I've put those settings into the config file now, and, even though I have been trying repeatedly for the last few days, I cannot reproduce the bug anymore :( Maybe it was a problem with my test setup, or maybe it was caused by some minor issue that was fixed in 0.61.3. Worst case, it was one of those bugs that disappear as soon as you enable debugging.

For now, I am going to stop trying to reproduce this, working under the assumption that it was either caused by something in my test setup (there was a bug in there as well, specifically a failure to check whether the writes succeeded...) or fixed in 0.61.3.
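For anyone following along, here is roughly what those debug settings look like in ceph.conf. The option names and values are the ones Josh listed; placing them in the [osd] section (so they apply to all OSD daemons) is my assumption based on the standard ceph.conf layout:

```ini
; sketch of the requested OSD debug logging, placed in the [osd] section
; so it applies to every OSD daemon on the host
[osd]
    debug osd = 20
    debug ms = 1
    debug filestore = 20
```

At these levels the OSDs log very verbosely, which is why the /var/log partitions fill up so quickly below.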
I will also disable those debug settings, because they make my /var/log partitions fill up extremely fast, before logrotate can do anything about it. I will keep running those tests in the background, though, to see if any problems decide to pop up again.

Guido
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com