This is my config:

    ;
    ; Sample ceph ceph.conf file.
    ;
    ; This file defines cluster membership, the various locations
    ; that Ceph stores data, and any other runtime options.

    ; If a 'host' is defined for a daemon, the start/stop script will
    ; verify that it matches the hostname (or else ignore it). If it is
    ; not defined, it is assumed that the daemon is intended to start on
    ; the current host (e.g., in a setup with a startup.conf on each
    ; node).

    ; global
    [global]
        ; enable secure authentication
        auth supported = cephx
        keyring = /etc/ceph/keyring.bin

        ; allow ourselves to open a lot of files
        max open files = 131072

        pid file = /var/run/ceph/$name.pid

        debug ms = 1

    ; monitors
    ;  You need at least one. You need at least three if you want to
    ;  tolerate any node failures. Always create an odd number.
    [mon]
        mon data = /data/mon$id

        ; logging, for debugging monitor crashes, in order of
        ; their likelihood of being helpful :)
        ;debug ms = 1
        ;debug mon = 20
        ;debug paxos = 20
        ;debug auth = 20

    [mon0]
        host = ceph1
        mon addr = 10.0.6.10:6789

    [mon1]
        host = ceph2
        mon addr = 10.0.6.11:6789

    [mon2]
        host = ceph3
        mon addr = 10.0.6.12:6789

    ; mds
    ;  You need at least one. Define two to get a standby.
    [mds]
        ; where the mds keeps its secret encryption keys
        keyring = /etc/ceph/keyring.$name

        ; mds logging to debug issues.
        ;debug ms = 1
        ;debug mds = 20

    [mds0]
        host = ceph1

    [mds1]
        host = ceph2

    [mds2]
        host = ceph3

    ; osd
    ;  You need at least one. Two if you want data to be replicated.
    ;  Define as many as you like.
    [osd]
        sudo = true

        ; This is where the btrfs volume will be mounted.
        osd data = /data/osd$id

        ; where the osd keeps its secret encryption keys
        keyring = /etc/ceph/keyring.$name

        ; Ideally, make this a separate disk or partition. A few
        ; hundred MB should be enough; more if you have fast or many
        ; disks. You can use a file under the osd data dir if need be
        ; (e.g. /data/osd$id/journal), but it will be slower than a
        ; separate disk or partition.

        ; This is an example of a file-based journal.
        ;osd journal = /data/osd$id/journal
        ;osd journal size = 1000 ; journal size, in megabytes

        ; osd logging to debug osd issues, in order of likelihood of
        ; being helpful
        ;debug ms = 1
        ;debug osd = 25
        ;debug monc = 20
        ;debug journal = 20
        ;debug filestore = 10

        ;osd use stale snap = true

    [osd0]
        host = ceph1

        ; if 'btrfs devs' is not specified, you're responsible for
        ; setting up the 'osd data' dir. If it is not btrfs, things
        ; will behave up until you try to recover from a crash (which
        ; is usually fine for basic testing).
        btrfs devs = /dev/sdc
        osd journal = /dev/sda1

    [osd1]
        host = ceph1
        btrfs devs = /dev/sdd
        osd journal = /dev/sda2

    [osd2]
        host = ceph2
        btrfs devs = /dev/sdc
        osd journal = /dev/sda1

    [osd3]
        host = ceph2
        btrfs devs = /dev/sdd
        osd journal = /dev/sda2

    [osd4]
        host = ceph3
        btrfs devs = /dev/sdc
        osd journal = /dev/sda1

    [osd5]
        host = ceph3
        btrfs devs = /dev/sdd
        osd journal = /dev/sda2
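(Not part of the original mail: a minimal sketch of how the per-OSD filesystem numbers below can be collected, assuming the /data/osd* mount layout and the ceph1-3 hostnames from the config above. Plain df over the mount points is enough; the loop and ssh usage are just one convenient way to run it from a single node.)

    # Gather local filesystem usage for every OSD data mount on each host.
    # Hostnames and paths are taken from the ceph.conf above.
    for h in ceph1 ceph2 ceph3; do
        echo "== $h =="
        ssh "$h" 'df /data/osd*'   # quoted so the glob expands remotely
    done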
The statistics of the disks (this is after the crash of osd2 and osd4):

    Filesystem      1K-blocks      Used Available Use% Mounted on
    /dev/sdc        143373312 124954676  18418636  88% /data/osd0
    /dev/sdd        143373312 137639524   5733788  97% /data/osd1
    /dev/sdc        143373312 120350584  23022728  84% /data/osd2
    /dev/sdd        143373312 141986188   1387124 100% /data/osd3
    /dev/sdc        143373312 112025716  31347596  79% /data/osd4
    /dev/sdd        143373312 115163124  28210188  81% /data/osd5

I will send some statistics of the ext3 setup as well.

----- Original message -----
From: "Gregory Farnum" <gregory.farnum@xxxxxxxxxxxxx>
To: "Martin Wilderoth" <martin.wilderoth@xxxxxxxxxx>
Cc: ceph-devel@xxxxxxxxxxxxxxx
Sent: Tuesday, 12 Apr 2011 14:24:14
Subject: Re: osd stops

On Tuesday, April 12, 2011 at 11:05 AM, Martin Wilderoth wrote:
> Thanks for the answer, now I know the reason. Some of my OSDs were at
> 90% usage, and dmesg also shows errors with btrfs on the hosts. I will
> run the test with another file system, ext3 :-) or is any other
> filesystem better? It's a BackupPC filesystem with a lot of hardlinks
> and data that I would like to test running in Ceph.

ext3 or really any other FS will handle it better, although Ceph itself
is also not super-resilient to such situations. Eventually we will have
automatic rebalancing of data, but it's not in there right now.

Could you maybe send along your config file and the local filesystem
statistics on each of your OSDs? CRUSH is pseudo-random and so it's not
going to have perfectly even utilization, but if the variance is too
high we'll want to look into it sooner rather than later.
-Greg
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html