Re: domino-style OSD crash

On 06/07/2012 19:01, Gregory Farnum wrote:
On Fri, Jul 6, 2012 at 12:19 AM, Yann Dupont <Yann.Dupont@xxxxxxxxxxxxxx> wrote:
On 05/07/2012 23:32, Gregory Farnum wrote:

[...]

OK, so as all nodes were identical, I probably hit a btrfs bug (like an
erroneous out-of-space condition) at more or less the same time. And when
1 osd was out,

Oh, I didn't finish the sentence... When 1 osd was out, the missing data was
copied to other nodes, probably accelerating the btrfs problem on those nodes
(I suspect erroneous out-of-space conditions).
Ah. How full are/were the disks?

Most of the OSD nodes were below 50% (all are 5 TB volumes):

osd.0 : 31%
osd.1 : 31%
osd.2 : 39%
osd.3 : 65%
no osd.4 :)
osd.5 : 35%
osd.6 : 60%
osd.7 : 42%
osd.8 : 34%

All the volumes were using btrfs with lzo compression.

[...]

Oh, interesting. Are the broken nodes all on the same set of arrays?

No. There are 4 completely independent RAID arrays, in 4 different
locations. They are similar (same brand & model, but slightly different
disks, and 1 different firmware), and all arrays are multipathed. I don't
think the RAID array is the problem. We have been using those particular
models for 2-3 years, and in the logs I don't see any problem that could be
caused by the storage itself (like SCSI or multipath errors).
I must have misunderstood then. What did you mean by "1 Array for 2 OSD nodes"?

I have 8 osd nodes, in 4 different locations (several km apart). In each location I have 2 nodes and 1 RAID array. In each location, the RAID array has 16 2 TB disks and 2 controllers with 4x 8 Gb FC channels each. The 16 disks are organized in RAID 5 (8 disks for one set, 7 disks for the other). Each RAID set is primarily attached to 1 controller, and each osd node at the location has access to the controller via 2 distinct paths.

There was no correlation between failed nodes and RAID arrays.
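
For reference, a rough sketch of the usable capacity per array implied by the layout above. It only assumes the standard RAID 5 rule (one disk's worth of space per set goes to parity); the helper name raid5_usable_tb is hypothetical, and the role of the 16th disk and the way the 5 TB OSD volumes are carved out of the sets are not stated here, so they are left out.

```python
# Hypothetical helper: usable capacity of one RAID 5 set.
# RAID 5 spends one disk's worth of space on parity, so usable = (disks - 1) * size.
def raid5_usable_tb(disks: int, disk_tb: float) -> float:
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_tb

DISK_TB = 2.0        # 2 TB disks, as described above
RAID_SETS = (8, 7)   # disks per RAID 5 set on each array, as described above

per_set = [raid5_usable_tb(n, DISK_TB) for n in RAID_SETS]
per_array = sum(per_set)

print(per_set)    # [14.0, 12.0]
print(per_array)  # 26.0
```

So each array offers roughly 26 TB usable across its two sets, before any filesystem overhead.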

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx


