On Thu, 20 May 2010, Wido den Hollander wrote:
> Hi,
>
> On Thu, 2010-05-20 at 17:09 +0000, Sage Weil wrote:
> > > > I'd prefer the situation where I'd stripe over all 4 disks, giving me
> > > > an extra pro. In this situation I could configure my node to panic
> > > > whenever a disk is starting to give errors, so my cluster can take
> > > > over immediately.
> > > >
> > > > Am I right? Is this "the way to go"?
> > >
> > > I don't know the way to go. But I think that in the 1st case (1 OSD per
> > > hard disk) when a hard disk fails, it gets replicated elsewhere. During
> > > that time the other 3 OSDs on the same machine are still working fine
> > > and serving requests. And then some time later, you've got a brand new
> > > disk, you shut down the machine, that's 3 more OSDs down. In the 2nd
> > > case, as soon as 1 disk starts failing, your OSD (which is 4 disks) gets
> > > taken down, that's approximately equivalent to 4 OSDs going down at the
> > > same time if we compare to your 1st case.
> >
> > The other 3 osds don't have to rereplicate if you swap the failed disk
> > quickly, or otherwise inform the system that the failure is temporary. By
> > default there is a 5 minute timeout. That can be adjusted, or we can add
> > other administrative hooks to 'suspend' any declarations of permanent
> > failure for this sort of case.
>
> Ok, so upping this timeout to something like 10 minutes would be
> sufficient for swapping an OSD.
>
> This is done via the mon_osd_down_out_interval parameter I assume (found
> in config.cc)

Yes. And you can modify this value on a running system (without modifying
the .conf and restarting the monitor) with

 $ ceph mon injectargs \* '--mon_osd_down_out_interval 600'

on the latest unstable. The value is in seconds, so 600 gives you the 10
minutes you mention.

> About more than one OSD on one machine, is there a way to bind
> an OSD to a specific IP? Can't seem to find any configuration for this.
>
> I assume you will need one IP per OSD on that machine?

Not currently via the .conf, only via the --bind 1.2.3.4:123 command line
argument. Adding a bug for this.

> And my journaling question, any views on that topic?

I don't think that turning the write cache off will affect btrfs too much,
but I haven't tested it. It does need to be off if you use a separate
partition. The other alternative is to put the journal file in btrfs, but
that is slower.

sage

>
> Thanks!
>
> >
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
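
For what it's worth, the down/out timeout reasoning from earlier in this
thread can be sketched in a few lines of Python. This is only a simplified
model, not Ceph's monitor code: the function name will_be_marked_out and its
arguments are made up for illustration, and the exact boundary comparison is
an assumption. Only the 5 minute default and the 600 second value come from
the discussion above.

```python
# Simplified model of the monitor's down -> out timeout (NOT Ceph source code).
# Names here are hypothetical and exist only for this illustration.

DEFAULT_DOWN_OUT_INTERVAL = 300  # seconds; the 5 minute default mentioned above


def will_be_marked_out(downtime_seconds, interval=DEFAULT_DOWN_OUT_INTERVAL):
    """Return True if an OSD that stays down this long would be declared out.

    Once an OSD is declared out, its data gets re-replicated elsewhere.
    Treating "downtime >= interval" as the cutoff is an assumption.
    """
    return downtime_seconds >= interval


if __name__ == "__main__":
    swap_time = 8 * 60  # hypothetical: a disk swap that takes 8 minutes
    # With the default interval the swap takes too long.
    print(will_be_marked_out(swap_time))
    # With mon_osd_down_out_interval raised to 600 the swap fits.
    print(will_be_marked_out(swap_time, interval=600))
```

So a swap that takes longer than the configured interval triggers
re-replication, while raising the interval above the expected swap time
avoids it.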