Re: OSD trashed by simple reboot (Debian Jessie, systemd?)


Right - I see from the 0.80.8 notes that we merged a fix for #9073.
However (unfortunately) there were a number of patches that we
experimented with on this issue - and this looks like one of the earlier
ones (i.e. not what we merged into master at the time), which is a bit
confusing (maybe it was to avoid a more invasive patch...). Maybe
Somnath or Jianpeng know why?

Cheers

Mark


On 08/06/15 20:08, Christian Balzer wrote:
> 
> Mark,
> 
> one would hope you can't reproduce it with 0.80.9, as per the release
> notes, while 0.80.7 definitely was susceptible.
> 
> Christian
> 
> On Mon, 08 Jun 2015 20:05:20 +1200 Mark Kirkwood wrote:
> 
>> Running some tests on my pet VMs with 0.80.9 does not elicit any
>> journal failures... However, ISTR that running on bare metal was the
>> most reliable way to reproduce it... (proceeding - I currently cannot
>> get ceph-deploy to install this configuration... I'll investigate
>> further tomorrow)!
>>
>> Cheers
>>
>> Mark
>>
>> On 06/06/15 18:04, Mark Kirkwood wrote:
>>> Righty - I'll see if I can replicate what you see if I set up a 0.80.9
>>> cluster using the same workstation hardware (WD Raptors and Intel 520s)
>>> that showed up the issue previously at 0.83 (I wonder whether I ever
>>> tried a fresh install using the 0.80.* tree)...
>>>
>>> It may be a few days...
>>>
>>> On 05/06/15 16:49, Christian Balzer wrote:
>>>>
>>>> Hello,
>>>>
>>>> On Fri, 05 Jun 2015 16:33:46 +1200 Mark Kirkwood wrote:
>>>>
>>>> Well, whatever it is, I appear to not be the only one after all:
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=773361
>>>>
>>>>> Looking quickly at the relevant code:
>>>>>
>>>>> FileJournal::stop_writer() in src/os/FileJournal.cc
>>>>>
>>>>> I see that we didn't start seeing the (original) issue until changes
>>>>> in 0.83, which suggests that the 0.80 tree might not be doing the same
>>>>> thing. *However*, I note that I'm not happy with the placement of the
>>>>> two thread join operations in there - it *looks* to me like 0.80
>>>>> could in fact be vulnerable to the same journal-corrupting problem,
>>>>> so if it occurs again, it might be interesting to apply the gist of
>>>>> https://github.com/ceph/ceph/pull/2764 and see if it helps (ahem - of
>>>>> course it would be best if this were on a test system)!
>>>>>
>>>> Alas, this is neither a test cluster, nor do I have things set up to
>>>> compile from source here ATM.
>>>>
>>>> Christian
>>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
> 
> 
