Right - I see from the 0.80.8 notes that we merged a fix for #9073.
However (unfortunately) we experimented with a number of patches for
this issue, and this looks like one of the earlier ones (i.e. not what
we merged into master at the time), which is a bit confusing (maybe it
was chosen to avoid a more invasive patch...). Maybe Somnath or Jianpeng
know why?

Cheers

Mark

On 08/06/15 20:08, Christian Balzer wrote:
>
> Mark,
>
> one would hope you can't reproduce it with 0.80.9 as per the release
> notes, while 0.80.7 definitely was susceptible.
>
> Christian
>
> On Mon, 08 Jun 2015 20:05:20 +1200 Mark Kirkwood wrote:
>
>> Trying out some tests on my pet VMs with 0.80.9 does not elicit any
>> journal failures... However, ISTR that running on the bare metal was
>> the most reliable way to reproduce it (proceeding with that, although
>> I currently cannot get ceph-deploy to install this configuration...
>> I'll investigate further tomorrow)!
>>
>> Cheers
>>
>> Mark
>>
>> On 06/06/15 18:04, Mark Kirkwood wrote:
>>> Righty - I'll see if I can replicate what you see if I set up a 0.80.9
>>> cluster using the same workstation hardware (WD Raptors and Intel 520s)
>>> that showed up the issue previously at 0.83 (I wonder if I never tried
>>> a fresh install using the 0.80.* tree)...
>>>
>>> It may be a few days...
>>>
>>> On 05/06/15 16:49, Christian Balzer wrote:
>>>>
>>>> Hello,
>>>>
>>>> On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:
>>>>
>>>> Well, whatever it is, I appear not to be the only one after all:
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361
>>>>
>>>>> Looking quickly at the relevant code:
>>>>>
>>>>> FileJournal::stop_writer() in src/os/FileJournal.cc
>>>>>
>>>>> I see that we didn't start seeing the (original) issue until changes
>>>>> in 0.83, which suggests that the 0.80 tree might not be doing the
>>>>> same thing. *However*, I'm not happy with the placement of the
>>>>> two thread join operations in there - it *looks* to me like 0.80
>>>>> could in fact be vulnerable to the same journal-corrupting problem,
>>>>> so if it occurs again it might be interesting to apply the gist of
>>>>> https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of
>>>>> course it would be best if this were on a test system)!
>>>>>
>>>> Alas, this is neither a test cluster, nor do I have things set up to
>>>> compile from source here ATM.
>>>>
>>>> Christian
>>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>