On Wed, 2010-12-01 at 09:35 +0800, Jeff Wu wrote:

> On Wed, 2010-12-01 at 01:07 +0800, Gregory Farnum wrote:
> > On Mon, Nov 29, 2010 at 10:19 PM, Jeff Wu <cpwu@xxxxxxxxxxxxx> wrote:
> > > Is "40-50MB/s" the speed of a bench run against the local btrfs disk,
> > > not the speed of a bench run from a client to the osd server?
> > > At that speed, would a bench run from a client to the osd server get
> > > about 20~25MB/s (40~50MB/s / 2)?
> > Data on Ceph is replicated across 2 OSDs (by default; this is
> > configurable). So while figuring out potential performance involves a
> > lot of variables, in a simple case like this where you aren't bounded
> > by network bandwidth you'll find that your read/write performance
> > simply tracks the slower disk. I'd expect your Ceph tests (at least
> > the streaming ones) to run at 40-50MB/s.
>
> Hi Greg, thank you very much for your quick reply.
>
> > Given that everything else is okay, I cannot stress enough that
> > running without a journal is going to cause significant performance
> > degradations. I have a hard time believing that it's responsible for
> > 13-second latencies, but it's possible. So how about you set up a
> > journal (it can just be a file or new partition on the drives you're
> > already using) and report back your results after you do that. :)
>
> I will add a journal to ceph.conf and try it.

Hi Greg,

Following your suggestion, I added the journal config:

"
osd data = /opt/ceph/data/osd$id
osd journal = /home/transoft/data/osd$id/journal
filestore journal writeahead = true
osd journal size = 10000
"

to ceph.conf (the full ceph.conf is attached below). Then I ran the
command "$ sudo ceph osd tell 0/1 bench" six times (three runs against
each OSD) and got these results:

$ sudo ceph -w
osd0 172.16.10.42:6800/17347 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 29.818194 sec at 28201 KB/sec
osd0 172.16.10.42:6800/17347 2 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 30.013058 sec at 34801 KB/sec
osd0 172.16.10.42:6800/17347 3 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 30.463511 sec at 30274 KB/sec
osd1 172.16.10.65:6800/4845 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 165.067603 sec at 6329 KB/sec
osd1 172.16.10.65:6800/4845 2 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 181.034333 sec at 5782 KB/sec
osd1 172.16.10.65:6800/4845 3 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 196.055812 sec at 5334 KB/sec

I also used "dd" to test the raw drives and got these logs:

1. OSD0, /opt formatted with mkfs.btrfs:

$ sudo dd if=/dev/zero of=/opt/dd.img bs=2M count=1024
1024+0 records in
1024+0 records out
2147483648 bytes transferred in 21.4497 secs (100 MB/sec)

2. OSD1, /opt formatted with mkfs.btrfs:

$ sudo dd if=/dev/zero of=/opt/dd.img bs=2M count=1024
1024+0 records in
1024+0 records out
2147483648 bytes transferred in 48.2037 secs (44.6 MB/sec)

Judging from these logs, OSD1's disk speed might be what is limiting the
test performance.
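(A note on these dd numbers: without a sync option, dd can return before
all the data has actually reached the disk, so the reported rate partly
reflects the page cache rather than the drive. A minimal variant that
flushes the file data to disk before dd reports its rate, using the same
path and sizes as above:

$ sudo dd if=/dev/zero of=/opt/dd.img bs=2M count=1024 conv=fdatasync

Alternatively, oflag=direct bypasses the page cache entirely. The flushed
figure is the safer one to compare against the OSD bench results.)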
I also ran into an issue. Take the following steps:

$ mkcephfs -c ceph.conf -v --mkbtrfs -a
$ init-ceph -c ceph.conf --btrfs -v -a start

then execute:

$ init-ceph -c ceph.conf --btrfs -v -a stop

This command can't stop the OSD0 and OSD1 cosd processes:

OSD0: /usr/local/bin/cosd -i 0 -c ceph.conf
OSD1: /usr/local/bin/cosd -i 1 -c ceph.conf

Then I created the folder "/var/run/ceph" on the OSD0 and OSD1 hosts
manually and executed

$ init-ceph -c ceph.conf --btrfs -v -a stop

again; this time the command stopped the OSD0 and OSD1 cosd processes:

/usr/local/bin/cosd -i 0 -c ceph.conf
/usr/local/bin/cosd -i 1 -c ceph.conf

Thanks,
Jeff.Wu

> > Adding a journal to the OSDs lets them turn all their random writes
> > into streaming ones.
> > -Greg

=========================================================
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 0 bench
2010-12-01 10:45:13.670910 mon <- [osd,tell,0,bench]
2010-12-01 10:45:13.671180 mon1 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 0 bench
2010-12-01 10:45:29.350198 mon <- [osd,tell,0,bench]
2010-12-01 10:45:29.350457 mon1 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 0 bench
2010-12-01 10:45:31.000281 mon <- [osd,tell,0,bench]
2010-12-01 10:45:31.000560 mon0 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 1 bench
2010-12-01 10:45:34.860782 mon <- [osd,tell,1,bench]
2010-12-01 10:45:34.861020 mon1 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 1 bench
2010-12-01 10:45:36.760811 mon <- [osd,tell,1,bench]
2010-12-01 10:45:36.761161 mon2 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph osd tell 1 bench
2010-12-01 10:45:37.530714 mon <- [osd,tell,1,bench]
2010-12-01 10:45:37.530968 mon2 -> 'ok' (0)
transoft@ubuntu-mon0:/usr/local/etc/ceph$ sudo ceph -w
2010-12-01 10:44:59.450653    pg v13: 528 pgs: 528 active+clean; 12 KB data, 5304 KB used, 219 GB / 219 GB avail
2010-12-01 10:44:59.451365   mds e5: 1/1/1 up {0=up:active}, 1 up:standby
2010-12-01 10:44:59.451387   osd e6: 2 osds: 2 up, 2 in
2010-12-01 10:44:59.451412   log 2010-12-01 10:43:43.044865 mon0 172.16.10.171:6789/0 7 : [INF] mds0 172.16.10.171:6801/2482 up:active
2010-12-01 10:44:59.451440   mon e1: 3 mons at {0=172.16.10.171:6789/0,1=172.16.10.171:6790/0,2=172.16.10.171:6791/0}
2010-12-01 10:46:45.000262   log 2010-12-01 10:45:15.599526 osd0 172.16.10.42:6800/17347 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 29.818194 sec at 28201 KB/sec
2010-12-01 10:46:45.000262   log 2010-12-01 10:45:46.062142 osd0 172.16.10.42:6800/17347 2 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 30.013058 sec at 34801 KB/sec
2010-12-01 10:46:45.000262   log 2010-12-01 10:46:16.836607 osd0 172.16.10.42:6800/17347 3 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 30.463511 sec at 30274 KB/sec
2010-12-01 10:48:20.042152    pg v14: 528 pgs: 528 active+clean; 32780 KB data, 888 MB used, 218 GB / 219 GB avail
2010-12-01 10:50:50.038298    pg v15: 528 pgs: 528 active+clean; 73740 KB data, 54928 KB used, 219 GB / 219 GB avail
2010-12-01 10:52:15.074470    pg v16: 528 pgs: 528 active+clean; 73740 KB data, 79440 KB used, 219 GB / 219 GB avail
2010-12-01 10:54:55.546098   log 2010-12-01 11:52:34.244851 osd1 172.16.10.65:6800/4845 1 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 165.067603 sec at 6329 KB/sec
2010-12-01 10:54:55.546098   log 2010-12-01 11:55:52.010739 osd1 172.16.10.65:6800/4845 2 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 181.034333 sec at 5782 KB/sec
2010-12-01 10:54:55.546098   log 2010-12-01 11:59:09.560115 osd1 172.16.10.65:6800/4845 3 : [INF] bench: wrote 1024 MB in blocks of 4096 KB in 196.055812 sec at 5334 KB/sec
2010-12-01 10:55:01.001357    pg v17: 528 pgs: 528 active+clean; 73741 KB data, 1106 MB used, 218 GB / 219 GB avail
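(A rough expectation from these numbers, following Greg's point about
replication: with 2x replication a streaming client write has to land on
both osd0 and osd1 before it completes, so client throughput tracks the
slower disk. osd0 benches at roughly 28-35 MB/s and osd1 at roughly
5-6 MB/s, so streaming writes from a client should top out around osd1's
5-6 MB/s until the slow disk is fixed or replaced.)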
============ceph.conf====================

;
; Sample ceph ceph.conf file.
;
; This file defines cluster membership, the various locations
; that Ceph stores data, and any other runtime options.

; If a 'host' is defined for a daemon, the start/stop script will
; verify that it matches the hostname (or else ignore it). If it is
; not defined, it is assumed that the daemon is intended to start on
; the current host (e.g., in a setup with a startup.conf on each
; node).

; global
[global]
    ; enable secure authentication
    ; auth supported = cephx
    keyring = /etc/ceph/keyring.bin

; monitors
;  You need at least one. You need at least three if you want to
;  tolerate any node failures. Always create an odd number.
[mon]
    mon data = /opt/ceph/data/mon$id
    ;mon data = /home/transoft/data/mon$id

    ; logging, for debugging monitor crashes, in order of
    ; their likelihood of being helpful :)
    ;debug ms = 20
    ;debug mon = 20
    ;debug paxos = 20
    ;debug auth = 20

[mon0]
    host = ubuntu-mon0
    mon addr = 172.16.10.171:6789

[mon1]
    host = ubuntu-mon0
    mon addr = 172.16.10.171:6790

[mon2]
    host = ubuntu-mon0
    mon addr = 172.16.10.171:6791

; mds
;  You need at least one. Define two to get a standby.
[mds]
    ; where the mds keeps its secret encryption keys
    keyring = /etc/ceph/keyring.$name

    ; mds logging to debug issues.
    ;debug ms = 20
    ;debug mds = 20

[mds.0]
    host = ubuntu-mon0

[mds.1]
    host = ubuntu-mon0

; osd
;  You need at least one. Two if you want data to be replicated.
;  Define as many as you like.
[osd]
    ; This is where the btrfs volume will be mounted.
    ;osd data = /opt/ceph/data/osd$id
    osd class tmp = /var/lib/ceph/tmp

    ; Ideally, make this a separate disk or partition. A few
    ; hundred MB should be enough; more if you have fast or many
    ; disks. You can use a file under the osd data dir if need be
    ; (e.g. /data/osd$id/journal), but it will be slower than a
    ; separate disk or partition.
    ; This is an example of a file-based journal.
    ;osd journal = /home/transoft/data/osd$id/journal
    ;filestore journal writeahead = true
    ; journal size, in megabytes
    ;osd journal size = 1000

    keyring = /etc/ceph/keyring.$name

    ; osd logging to debug osd issues, in order of likelihood of being
    ; helpful
    ;debug ms = 20
    ;debug osd = 20
    ;debug filestore = 20
    ;debug journal = 20

[osd0]
    host = ubuntu-osd0
    osd data = /opt/ceph/data/osd$id
    osd journal = /home/transoft/data/osd$id/journal
    filestore journal writeahead = true
    osd journal size = 10000

    ; if 'btrfs devs' is not specified, you're responsible for
    ; setting up the 'osd data' dir. if it is not btrfs, things
    ; will behave up until you try to recover from a crash (which
    ; is usually fine for basic testing).
    ; btrfs devs = /dev/sdx

[osd1]
    host = ubuntu-osd1
    osd data = /opt/ceph/data/osd$id
    osd journal = /home/transoft/data/osd$id/journal
    filestore journal writeahead = true
    osd journal size = 10000
    ;btrfs devs = /dev/sdy

;[osd2]
;    host = zeta
;    btrfs devs = /dev/sdx

;[osd3]
;    host = eta
;    btrfs devs = /dev/sdy