Disregard the last message. Still getting long 0 IOPS periods.

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Garg, Pankaj
Something in this [osd] section is causing the 0 IOPS issue; I have not been able to nail it down yet. (I did comment out the filestore_max_inline_xattr_size entries, and the problem still exists.) If I take out the whole [osd] section, the IOPS no longer stay at 0 for long periods of time. Performance is still not where I would expect, though.

[osd]
osd_enable_op_tracker = false
osd_op_num_shards = 2
filestore_wbthrottle_enable = false
filestore_max_sync_interval = 1
filestore_odsync_write = true
#filestore_max_inline_xattr_size = 254
#filestore_max_inline_xattrs = 6
filestore_queue_committing_max_bytes = 1048576000
filestore_queue_committing_max_ops = 5000
filestore_queue_max_bytes = 1048576000
filestore_queue_max_ops = 500
journal_max_write_bytes = 1048576000
journal_max_write_entries = 1000
journal_queue_max_bytes = 1048576000
journal_queue_max_ops = 3000
filestore_fd_cache_shards = 32
filestore_fd_cache_size = 64

From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
I am not sure whether you need to set the following. What is the point of reducing the inline xattr values? I forget the exact calculation, but lower values could redirect your xattrs to omap. Better to comment these out:

filestore_max_inline_xattr_size = 254
filestore_max_inline_xattrs = 6

We could improve some of the other params, but nothing there seems responsible for the behavior you are seeing. Could you run iotop and see whether any process (such as xfsaild) is doing IO on the drives during that time?

Thanks & Regards
Somnath

From: Garg, Pankaj [mailto:Pankaj.Garg@xxxxxxxxxx]
I agree, but I'm dealing with something else here with this setup. I just ran a test, and within 3 seconds my IOPS went to 0 and stayed there for 90 seconds... then started, and within seconds went to 0 again. This doesn't seem normal at all. Here is my ceph.conf:

[global]
fsid = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
public_network = xxxxxxxxxxxxxxxxxxxxxxxx
cluster_network = xxxxxxxxxxxxxxxxxxxxxxxxxxxx
mon_initial_members = ceph1
mon_host = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_mkfs_options = -f -i size=2048 -n size=64k
osd_mount_options_xfs = inode64,noatime,logbsize=256k
filestore_merge_threshold = 40
filestore_split_multiple = 8
osd_op_threads = 12
osd_pool_default_size = 2
mon_pg_warn_max_object_skew = 100000
mon_pg_warn_min_per_osd = 0
mon_pg_warn_max_per_osd = 32768
filestore_op_threads = 6

[osd]
osd_enable_op_tracker = false
osd_op_num_shards = 2
filestore_wbthrottle_enable = false
filestore_max_sync_interval = 1
filestore_odsync_write = true
filestore_max_inline_xattr_size = 254
filestore_max_inline_xattrs = 6
filestore_queue_committing_max_bytes = 1048576000
filestore_queue_committing_max_ops = 5000
filestore_queue_max_bytes = 1048576000
filestore_queue_max_ops = 500
journal_max_write_bytes = 1048576000
journal_max_write_entries = 1000
journal_queue_max_bytes = 1048576000
journal_queue_max_ops = 3000
filestore_fd_cache_shards = 32
filestore_fd_cache_size = 64

From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
You should do that first to get stable performance out of filestore. A 1M sequential write over the entire image should be sufficient to precondition it.

From: Garg, Pankaj [mailto:Pankaj.Garg@xxxxxxxxxx]
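Somnath's preconditioning suggestion (a 1M sequential write over each entire image) could be expressed as a fio job file along these lines; the pool, client, and image names are placeholders, and the iodepth is an assumption:

```ini
; precondition.fio -- hypothetical example; repeat the job stanza per image
[global]
ioengine=rbd
clientname=admin
pool=rbd
rw=write
bs=1M
iodepth=16
direct=1

[precondition-image1]
rbdname=image1
```

Run once per image (e.g. `fio precondition.fio`) before taking any random-write measurements.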
No, I have not.

From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
In fact, I was wrong; I missed that you are running with 12 OSDs (considering one OSD per SSD). In that case, it will take ~250 seconds to fill up the journal. Have you preconditioned the entire image with a bigger block, say 1M, before doing any real testing?

From: Garg, Pankaj [mailto:Pankaj.Garg@xxxxxxxxxx]
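Somnath's fill-time estimate is back-of-the-envelope arithmetic that can be sketched in a few lines of Python. Every input below (client IOPS, block size, journal size) is an assumption pulled from elsewhere in the thread, so treat the output as an order-of-magnitude figure rather than his exact ~250 s:

```python
# Rough estimate of how long the filestore journals take to fill if
# nothing drains to the backing store. All inputs are assumptions.

def journal_fill_seconds(client_iops, block_bytes, replication, num_osds, journal_bytes):
    """Seconds until each per-OSD journal fills, assuming writes are
    replicated and spread evenly across all OSDs."""
    per_osd_rate = client_iops * block_bytes * replication / num_osds  # bytes/s per OSD
    return journal_bytes / per_osd_rate

# Example: ~10K client IOPS of 4K random writes, 3x replication,
# 12 OSDs, default 5 GB journal per OSD
t = journal_fill_seconds(10_000, 4096, 3, 12, 5 * 1024**3)
print(round(t))  # prints 524 -- a few hundred seconds at this rate
```

Raising the client rate or shrinking the journal pushes the stall earlier, which matches the "0 IOPS within seconds" symptom described above.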
Thanks Somnath. I will try all these, but I think there is something else going on too.
Firstly, my test sometimes reaches 0 IOPS within 10 seconds. Secondly, when I'm at 0 IOPS, I see NO disk activity in iostat and no CPU activity either. This part is strange.

Thanks
Pankaj

From: Somnath Roy [mailto:Somnath.Roy@xxxxxxxxxxx]
Also increase the following:

filestore_op_threads

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
On Behalf Of Somnath Roy
Pankaj,
This could be related to the new throttle parameters introduced in jewel. By default these throttles are off; you need to tweak them according to your setup. What is your journal size and fio block size?
If it is the default 5 GB, then at the rate you mentioned (assuming 4K random writes) and considering 3X replication, it can fill up your journal and stall IO within ~30 seconds or so. If you think this is what is happening in your system, you need to turn this throttle on (see https://github.com/ceph/ceph/blob/jewel/src/doc/dynamic-throttle.txt) and also lower filestore_max_sync_interval to ~1 (or even lower). Since you are trying this on SSDs, I would also recommend turning on the following parameter for stable performance:

filestore_odsync_write = true

Thanks & Regards
Somnath

From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx]
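For reference, the jewel throttle Somnath points to is driven by a handful of filestore queue options and is off by default because the delay multiples are 0. The option names below reflect my reading of the dynamic-throttle doc (note Ceph's historical "threshhold" spelling), and the values are purely illustrative, not tuned recommendations:

```ini
[osd]
; Jewel dynamic filestore throttle -- enabled by non-zero delay multiples.
; Values are illustrative; tune per the dynamic-throttle doc linked above.
filestore_queue_low_threshhold = 0.3
filestore_queue_high_threshhold = 0.9
filestore_expected_throughput_ops = 5000
filestore_expected_throughput_bytes = 536870912
filestore_queue_high_delay_multiple = 2
filestore_queue_max_delay_multiple = 10
```

The idea is to inject small, growing delays as the journal-ahead queue fills, instead of running full speed and then stalling hard at the sync interval.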
On Behalf Of Garg, Pankaj
Hi,
I just installed jewel on a small cluster of 3 machines with 4 SSDs each. I created 8 RBD images and used a single client with 8 threads to do random writes on the images (fio with the RBD engine, 1 thread per image). The cluster has 3X replication and 10G cluster and client networks. fio prints the aggregate IOPS for the cluster every second. Before jewel, I got roughly 10K IOPS; it was up and down, but it kept going. Now I see IOPS go up to 13-15K, but then they drop, eventually reaching ZERO for several seconds, and then start back up again. What am I missing?

Thanks
Pankaj
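The test Pankaj describes (8 fio threads, one per RBD image, random writes) might look like the following as a fio job file. The pool, client, and image names are placeholders, and the 4K block size, iodepth, and runtime are assumptions since the original post does not state them:

```ini
; randwrite-rbd.fio -- hypothetical reconstruction of the test above
[global]
ioengine=rbd
clientname=admin
pool=rbd
rw=randwrite
bs=4k
iodepth=32
direct=1
time_based=1
runtime=300

; one job (thread) per image; repeat the stanza for image3..image8
[img1]
rbdname=image1

[img2]
rbdname=image2
```

With per-second reporting (`fio --status-interval=1 randwrite-rbd.fio`) the stalls described in the thread would show up as whole seconds of zero aggregate IOPS.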
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com