Re: domino-style OSD crash

Gregory Farnum <greg@xxxxxxxxxxx> · Fri, 6 Jul 2012 10:01:28 -0700

On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont <Yann.Dupont@xxxxxxxxxxxxxx> wrote:
> On 05/07/2012 23:32, Gregory Farnum wrote:
>
> [...]
>
>>> OK, so as all nodes were identical, I have probably hit a btrfs bug (like
>>> an erroneous out-of-space condition) at more or less the same time. And
>>> when 1 OSD was out,
>
>
> Oh, I didn't finish the sentence... When 1 OSD was out, the missing data was
> copied to other nodes, probably accelerating the btrfs problem on those nodes
> (I suspect erroneous out-of-space conditions).

Ah. How full are/were the disks?

>
> I've reformatted the OSDs with XFS. Performance is slightly worse for the
> moment (well, it depends on the workload, and maybe the lack of syncfs is to
> blame), but at least I hope to have a rock-solid storage layer. BTW, I've
> managed to keep the faulty btrfs volumes.
>
> [...]
>
>
>>>> I wonder if maybe there's a confounding factor here — are all your nodes
>>>> similar to each other,
>>>
>>> Yes. I designed the cluster that way. All nodes are identical hardware
>>> (PowerEdge M610, 10G Intel Ethernet + Emulex Fibre Channel attached to
>>> storage (1 Array for 2 OSD nodes, 1 controller dedicated for each OSD))
>>
>> Oh, interesting. Are the broken nodes all on the same set of arrays?
>
>
> No. There are 4 completely independent RAID arrays, in 4 different
> locations. They are similar (same brand & model, but slightly different
> disks, and 1 different firmware), and all arrays are multipathed. I don't
> think the RAID array is the problem. We have used these particular models
> for 2 or 3 years, and in the logs I don't see any problem that could be
> caused by the storage itself (like SCSI or multipath errors).

I must have misunderstood then. What did you mean by "1 Array for 2 OSD nodes"?