Hi,

while testing a setup with Ceph and RadosGW for storing document files, we encountered some problems. We upload files to RadosGW (tested both with S3 and Swift) with a little Perl script to see how the cluster behaves, putting 100k+ files of different sizes (1 KB - 1 MB) into the cluster, in parallel with 100-1000 threads (a rough sketch of the equivalent upload pattern is included at the end of this mail). After some time, "ceph status" repeatedly shows the following error messages:

2013-06-12 10:02:18.583317 7f0fa57ee700 0 -- 192.168.3.170:0/46368 >> 192.168.3.170:6789/0 pipe(0x7f0fa000ff30 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
2013-06-12 10:02:24.584061 7f0fabf40700 0 -- 192.168.3.170:0/46368 >> 192.168.3.170:6789/0 pipe(0x7f0fa0006a50 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
...

This sometimes leaves the cluster in a state where we cannot write to it or delete objects anymore. After restarting the ceph service, we get this:

HEALTH_WARN 475 pgs degraded; 475 pgs stuck unclean; recovery 77899/316778 degraded (24.591%)

After some minutes up to one hour, the health is back to "ok" and cluster operations can be performed without problems.

This problem occurred on a cluster we already run virtual machines on; the VMs were never affected. On a test cluster we see the same issues.

Test cluster setup: two servers running Debian Squeeze with kernel 3.6.7-amd64 and Ceph Bobtail, each having two OSDs and one mon. Two Debian servers (VMs) running RadosGW.

Any idea what could be the cause of these problems?

BR
Paul
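P.S. For reference, the upload load is roughly equivalent to the following Python/boto sketch. Our actual test script is written in Perl; the endpoint, credentials, bucket name and size pattern below are placeholders, not the real values.

#!/usr/bin/env python
# Sketch of the kind of load we generate: many small objects uploaded
# in parallel through the RadosGW S3 API. All names here are placeholders.
import os
import threading
import boto
import boto.s3.connection

HOST = 'rgw.example.com'        # placeholder RadosGW endpoint
ACCESS_KEY = 'ACCESS_KEY'       # placeholder credentials
SECRET_KEY = 'SECRET_KEY'
BUCKET = 'loadtest'
THREADS = 100                   # we tested with 100-1000
OBJECTS_PER_THREAD = 1000

def connect():
    # Connect to RadosGW's S3 endpoint instead of Amazon S3.
    return boto.connect_s3(
        aws_access_key_id=ACCESS_KEY,
        aws_secret_access_key=SECRET_KEY,
        host=HOST,
        is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

def worker(thread_id):
    conn = connect()
    bucket = conn.get_bucket(BUCKET)
    for i in range(OBJECTS_PER_THREAD):
        size = 1024 * (1 + (i % 1024))          # 1 KB .. 1 MB
        data = os.urandom(size)                 # random payload
        key = bucket.new_key('t%d/obj%06d' % (thread_id, i))
        key.set_contents_from_string(data)

# Create the bucket once, then hammer it from many threads.
connect().create_bucket(BUCKET)
threads = [threading.Thread(target=worker, args=(t,)) for t in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()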