Re: Is there a setting on Ceph that we can use to fix the minimum read size?

Thomas Bennett <thomas@xxxxxxxxx> · Fri, 2 Dec 2016 15:17:57 +0200

Hi Steve and Kate,
Again - thanks again for the great suggestions.

Increasing the allocsize did not help us in the situation relating to my current testing (poor read performance). However, allocsize is a great for parameter for overall performance tuning and I intend to use it. :)

After discussion with colleagues and reading this article - ubuntu drive io scheduler, I decided to try out the cfq io schedular - ubuntu now defaults to deadline.

This made a significant difference - it actually double the overall read performance.

I suggest anyone using ubuntu 14.04 or higher and high density osd nodes (we have 48 osds per osd node) might like to test out cfq. It's also a pretty easy test to perform :) and can be done on the fly.

Cheers,
Tom

On Wed, Nov 30, 2016 at 5:50 PM, Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> wrote:

We’re using Ubuntu 14.04 on x86_64. We just added ‘osd mount options xfs = rw,noatime,inode64,allocsize=1m’ to the [osd] section of our ceph.conf so XFS allocates
 1M blocks for new files. That only affected new files, so manual defragmentation was still necessary to clean up older data, but once that was done everything got better and stayed better.

You can use the xfs_db command to check fragmentation on an XFS volume and xfs_fsr to perform a defragmentation. The defragmentation can run on a mounted filesystem
 too, so you don’t even have to rely on Ceph to avoid downtime. I probably wouldn’t run it everywhere at once though for performance reasons. A single OSD at a time would be ideal, but that’s a matter of preference.

From: ceph-users [mailto:ceph-users-bounces@lists.ceph.com]
On Behalf Of Thomas Bennett

Sent: Wednesday, November 30, 2016 5:58 AM

Cc: ceph-users@xxxxxxxxxxxxxx

Subject: Re:  Is there a setting on Ceph that we can use to fix the minimum read size?

Hi Kate and Steve,

Thanks for the replies. Always good to hear back from a community :)

I'm using Linux on x86_64 architecture and the block size is limited to the page size which is 4k. So it looks like I'm hitting hard limits in any changes. to increase the block size.

I found this out by running the following command:

$ mkfs.xfs -f -b size=8192 /dev/sda1

$ mount -v /dev/sda1 /tmp/disk/

mount: Function not implemented
#huh???

Checking out the man page:

$ man mkfs.xfs

 -b block_size_options

      ... XFS  on  Linux  currently  only  supports pagesize or smaller blocks.

I'm hesitant to implement btrfs as its still experimental and ext4 seems to have the same current limitation.

Our current approach is to exclude the hard drive that we're getting the poor read rates from our procurement process, but it would still be nice to find out how much control we have over how ceph-osd  daemons read from the drives. I may
 attempts a strace on an osd daemon as we read to see what the actual read request size is being asked to the kernel.

Cheers,

Tom

On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> wrote:

We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with 1M blocks) due to massive
 fragmentation in our filestores a while back. We were having to defrag all the time and cluster performance was noticeably degraded. We also create and delete lots of RBD snapshots on a daily basis, so that likely contributed to the fragmentation as well.
 It’s been MUCH better since we switched XFS to use 1M allocations. Virtually no fragmentation and performance is consistently good.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com