Hi all,

I have a few questions after running an availability test.

Setup:
  Linux kernel: 3.2.0
  OS: Ubuntu 12.04
  Storage servers: 2, each with 11 HDDs (one OSD per HDD, 7200 rpm, 1 TB)
      and a 10GbE NIC
  RAID card: LSI MegaRAID SAS 9260-4i; each HDD is a single-drive RAID0
      (Write Policy: Write Back with BBU, Read Policy: ReadAhead,
       IO Policy: Direct)
  Ceph version: 0.48.2
  Replicas: 2
  Monitors: 3

The two storage servers form one cluster. From a Ceph client (also with a
10GbE NIC, Linux kernel 3.2.0, Ubuntu 12.04) we created a 1 TB RBD image
for testing (roughly as sketched at the end of this mail) and used fio to
generate the workload:

[Sequential Read]
fio --name=seq-read --filename=<mapped RBD device> --iodepth=32 \
    --numjobs=1 --runtime=120 --bs=65536 --rw=read --ioengine=libaio \
    --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10

[Sequential Write]
fio --name=seq-write --filename=<mapped RBD device> --iodepth=32 \
    --numjobs=1 --runtime=120 --bs=65536 --rw=write --ioengine=libaio \
    --group_reporting --direct=1 --eta=always --ramp_time=10 --thinktime=10

To observe the cluster state when one storage server crashes, I took down
the networking on one storage server while the workload was running (see
the commands at the end of this mail). We expected read and write I/O to
resume quickly, or even continue uninterrupted, while Ceph recovers.
Instead, both reads and writes paused for about 20-30 seconds during
recovery.

My questions are:
1. Is an I/O pause like this normal during Ceph recovery?
2. Is the pause unavoidable while Ceph is recovering?
3. How can the I/O pause time be reduced? (The settings I have been
   looking at are listed at the end of this mail.)

Thanks!!
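For reference, the RBD image was created and mapped roughly like this
(the image name is a placeholder, the image lives in the default 'rbd'
pool, we used the kernel RBD client, and --size is in megabytes):

  rbd create test-img --size 1048576     # 1 TB image in the default pool
  rbd map test-img                       # appears as a /dev/rbd* device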
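The failure was injected roughly like this (the interface name is a
placeholder for the 10GbE port; ceph -w just watches the recovery from
another node):

  # on the storage server being "crashed":
  ifconfig eth0 down

  # on the client or a monitor node, watch the cluster react:
  ceph -w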
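For question 3, these are the ceph.conf settings I suspect control the
length of the pause; the values shown are what I believe the defaults to
be, so please correct me if these are the wrong knobs:

  [osd]
      osd heartbeat interval = 6       ; seconds between peer heartbeats
      osd heartbeat grace = 20         ; peer reported down after this long
  [mon]
      mon osd down out interval = 300  ; seconds before a down OSD is
                                       ; marked out and re-replication starts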