Pankaj,

Could be related to the new throttle parameters introduced in Jewel. By default these throttles are off; you need to tune them for your setup. What are your journal size and fio block size?
If it is the default 5 GB, then at the rate you mentioned (assuming 4K random writes) and with 3x replication, it can fill up your journal and stall I/O within ~30 seconds or so. If you think this is what is happening in your system, you need to turn this throttle on (see https://github.com/ceph/ceph/blob/jewel/src/doc/dynamic-throttle.txt).
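As a rough back-of-envelope (the ~15K IOPS peak, 4 KiB block size, and 3x replication are assumptions; in practice the traffic is spread across the journals and partly offset by the flush rate, so the real stall point will vary):

    15,000 IOPS x 4 KiB x 3 replicas ~= 176 MiB/s of journal traffic
    5 GiB / 176 MiB/s                ~= 29 s to fill a 5 GB journal

A larger fio block size fills the journal proportionally faster, which is why the block size matters here.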
You also need to lower filestore_max_sync_interval to ~1 (or even lower). Since you are testing on SSDs, I would also recommend turning on the following parameter for stable performance: filestore_odsync_write = true. Putting it together, something like the sketch below.
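A minimal ceph.conf sketch of the settings discussed here (the journal throttle option names follow the linked dynamic-throttle.txt, and the multiple values are only starting points; verify both against your build before applying):

    [osd]
    # Jewel dynamic journal throttle; the multiples default to 0 (throttle off)
    journal_throttle_high_multiple = 2
    journal_throttle_max_multiple = 10
    # flush the journal to the backing filestore more aggressively
    filestore_max_sync_interval = 1
    # write with O_DSYNC to smooth out bursty flush behavior on SSDs
    filestore_odsync_write = true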
Thanks & Regards,
Somnath

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Garg, Pankaj
Hi,

I just installed Jewel on a small cluster of 3 machines with 4 SSDs each. I created 8 RBD images and use a single client with 8 threads to do random writes on the images (using FIO with the RBD engine, 1 thread per image). The cluster has 3x replication and 10G cluster and client networks. FIO prints the aggregate IOPS every second for the cluster.

Before Jewel, I got roughly 10K IOPS. It was up and down, but still kept going. Now I see IOPS that go to 13-15K, but then they drop, eventually dropping to ZERO for several seconds, and then starting back up again. What am I missing?

Thanks,
Pankaj
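A fio job file for this kind of workload might look like the following (the pool, image, and client names, the 4K block size, and the queue depth are assumptions, not taken from the post above; adjust them to match your setup):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rw=randwrite
    bs=4k
    iodepth=32
    time_based=1
    runtime=300

    [img1]
    rbdname=testimg1

    [img2]
    rbdname=testimg2

    # ...one job section per image, through testimg8, so each image gets its own thread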
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com