I believe this may be the same issue I reported some time ago, which is
as yet unsolved:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg19770.html

I used strace to figure out that the OSDs were doing an incredible
number of getxattr, setxattr and removexattr calls, for no apparent
reason (something along the lines of the invocation at the bottom of
this mail). Do you see the same write pattern? My OSDs are also
btrfs-backed.

Kind regards,
Erik.

On 18-06-15 23:36, Lionel Bouton wrote:
> I just realized I forgot to add proper context:
>
> this is with Firefly 0.80.9, and the btrfs OSDs are running on kernel
> 4.0.5 (this was also happening with previous kernel versions according
> to our monitoring history); the XFS OSDs run on 4.0.5 or 3.18.9. There
> are 23 OSDs in total and 2 of them use btrfs.
>
> On 06/18/15 23:28, Lionel Bouton wrote:
>> Hi,
>>
>> I've just noticed an odd behaviour of the btrfs OSDs. We monitor the
>> amount of disk writes on each device with a granularity of 10s: every
>> 10s the monitoring system collects the total number of sectors written
>> and write IOs performed since boot, and computes both the B/s and IO/s.
>>
>> With only residual write activity on our storage network (~450kB/s
>> total for the whole Ceph cluster, which amounts to a theoretical
>> ~120kB/s on each OSD once replication, the double writes due to the
>> journal and the number of OSDs are factored in):
>> - Disks with a btrfs OSD have a spike of activity every 30s (2
>>   intervals of 10s with nearly no activity, then one interval with a
>>   total of ~120MB written). The averages are 4MB/s and 100 IO/s.
>> - Disks with an XFS OSD (journal on a separate partition but on the
>>   same disk) don't have these spikes, and the averages are far lower:
>>   160kB/s and 5 IO/s. This is not far off what is expected from the
>>   whole cluster's write activity.
>>
>> Our platform sets "filestore max sync interval" to 30s.
>>
>> I changed it to 60s with
>>   ceph tell osd.* injectargs "'--filestore-max-sync-interval 60'"
>> and the amount of writes dropped to ~2.5MB/s.
>>
>> I changed it to 5s (the default) with
>>   ceph tell osd.* injectargs "'--filestore-max-sync-interval 5'"
>> and the amount of writes to the device rose to an average of 10MB/s
>> (which, given our 10s sampling interval, appeared constant).
>>
>> During these tests the activity on the disks hosting XFS OSDs didn't
>> change much.
>>
>> So it seems filestore syncs generate far more activity on btrfs OSDs
>> than on XFS OSDs (journal activity included for both).
>>
>> Note that autodefrag is disabled on our btrfs OSDs. We use our own
>> scheduler which, on our OSDs, limits the amount of defragmented data
>> to ~10MB per minute in the worst case, and usually (during low write
>> activity, which was the case here) triggers a single-file
>> defragmentation every 2 minutes (which amounts to a 4MB write, as we
>> only host RBDs with the default order value). So defragmentation
>> shouldn't be an issue here.
>>
>> This doesn't seem to generate too much stress when "filestore max sync
>> interval" is 30s (going by apply latencies, our btrfs OSDs are faster
>> than the XFS OSDs holding the same amount of data), but at 5s the
>> btrfs OSDs are far slower than our XFS OSDs, with 10x the average
>> apply latency (we didn't let this run for more than 10 minutes, as
>> some VMs started waiting too long on their IOs).
>>
>> Does anyone know if this is normal and why it is happening?
>>
>> Best regards,
>>
>> Lionel
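PS: the strace invocation I mentioned above was essentially the
following; this is a from-memory sketch, and the ceph-osd PID is of
course system-specific:

  # attach to a running ceph-osd, follow all of its threads, and print
  # a per-syscall summary of the xattr activity on exit (Ctrl-C to stop)
  strace -f -c -e trace=getxattr,setxattr,removexattr -p <ceph-osd PID>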
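Also, your ~120kB/s estimate checks out if the pools use size=3
replication (an assumption on my part):
450 kB/s x 3 replicas x 2 (journal write + data write) / 23 OSDs
comes to ~117 kB/s per disk.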
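And in case someone wants to reproduce the measurement methodology: the
since-boot counters live in /proc/diskstats (field 8 is writes
completed, field 10 is sectors written, in 512-byte units), so a crude
10s sampler looks something like this (untested sketch; replace sdb
with the actual data device):

  # print writes completed and sectors written for sdb every 10s;
  # IO/s = delta(writes) / 10, B/s = delta(sectors) * 512 / 10
  while sleep 10; do
      awk '$3 == "sdb" { print $8, $10 }' /proc/diskstats
  done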