Re: Questions about a possible Ceph setup

Sage Weil <sage@xxxxxxxxxxxx> · Thu, 20 May 2010 11:24:50 -0700 (PDT)

On Thu, 20 May 2010, Wido den Hollander wrote:

> Hi,
> 
> On Thu, 2010-05-20 at 17:09 +0000, Sage Weil wrote:
> > > > I'd prefer the situation where I'd stripe over all 4 disks, giving me 
> > > > an extra pro. In this situation I could configure my node to panic 
> > > > whenever a disk is starting to give errors, so my cluster can take 
> > > > over immediately.
> > > > 
> > > > Am i right? Is this "the way to go"?
> > > 
> > > I don't know the way to go. But I think that in the 1st case (1 OSD per 
> > > hard disk) when a hard disk fails, it gets replicated elsewhere. During 
> > > that time the other 3 OSDs on the same machine are still working fine 
> > > and serving requests. And then some time later, you've got a brand new 
> > > disk, you shut down the machine, that's 3 more OSDs down. In the 2nd 
> > > case, as soon as 1 disk starts failing, your OSD (which is 4 disks) gets 
> > > taken down, that's approximately equivalent to 4 OSDs going down at the 
> > > same time if we compare to your 1st case.
> > 
> > The other 3 osds don't have to rereplicate if you swap the failed disk 
> > quickly, or otherwise inform the system that the failure is temporary.  By 
> > default there is a 5 minute timeout.  That can be adjusted, or we can add 
> > other administrative hooks to 'suspend' any declarations of permanent 
> > failure for this sort of case.
> 
> Ok, so upping this timeout to something like 10 minutes would be
> sufficient for swapping an OSD.
> 
> This is done via the mon_osd_down_out_interval parameter, I assume (found
> in config.cc)

Yes.  And you can modify this value on a running system (without modifying 
the .conf and restarting the monitor) with

$ ceph mon injectargs \* '--mon_osd_down_out_interval 600'

on the latest unstable.
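
For a change that should survive a monitor restart, the same value could 
presumably go in the .conf instead.  A minimal sketch, assuming the usual 
mapping of option names to .conf keys and that [mon] is the right section 
for it (neither is confirmed in this thread):

[mon]
        ; assumption: .conf spelling of --mon_osd_down_out_interval
        ; value is in seconds: 600 s = 10 min (the default above is 5 min)
        mon osd down out interval = 600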

> About more than one OSD on one machine, is there a way to bind
> an OSD to a specific IP? Can't seem to find any configuration for this.
> 
> I assume you will need one IP per OSD on that machine?

Not currently via the .conf, only via the --bind 1.2.3.4:123 command line 
argument.  Adding a bug for this.

> And my journaling question, any views on that topic?

I don't think that turning the write cache off will affect btrfs too much, 
but I haven't tested it.  It does need to be off if you use a separate 
partition.  The other alternative is to put the journal file in btrfs, but 
that is slower.

sage

> 
> Thanks!
> 
> > 
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html