Re: Btrfs slowdown

Chris Mason <chris.mason@xxxxxxxxxx> · Mon, 25 Jul 2011 15:52:50 -0400

Excerpts from Christian Brunner's message of 2011-07-25 03:54:47 -0400:
> Hi,
> 
> we are running a ceph cluster with btrfs as its base filesystem
> (kernel 3.0). At the beginning everything worked very well, but after
> a few days (2-3) things get very slow.
> 
> When I look at the object store servers I see heavy disk I/O on the
> btrfs filesystems (disk utilization is between 60% and 100%). I also
> did some tracing on the Ceph Object Store Daemon, and I'm quite
> certain that the majority of the disk I/O is not caused by ceph or
> any other userland process.
> 
> When I reboot the system(s) the problems go away for another 2-3 days,
> but after that, they start again. I'm not sure if the problem is
> related to the kernel warning I reported last week. At least there
> is no temporal relationship between the warning and the slowdown.
> 
> Any hints on how to trace this would be welcome.

The easiest way to trace this is with latencytop.

Apply this patch:

http://oss.oracle.com/~mason/latencytop.patch

And then use latencytop -c for a few minutes while the system is slow.
Send the output here and hopefully we'll be able to figure it out.
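
If it helps to have the disk utilization numbers recorded alongside the
latencytop output, here is a minimal sketch, assuming a Linux box where
/proc/diskstats has the usual layout (the 13th whitespace-separated
column is the milliseconds the device spent doing I/O). The function
names and the 5 second interval are only illustrative and are not part
of latencytop or the patch above.

```python
import time


def parse_diskstats(text):
    """Map device name -> milliseconds spent doing I/O (io_ticks)."""
    ticks = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        # fields: major, minor, name, then the per-device counters;
        # index 12 is "time spent doing I/Os (ms)"
        ticks[fields[2]] = int(fields[12])
    return ticks


def utilization(before, after, interval_s):
    """Percent of the interval each device was busy, from two samples."""
    result = {}
    for dev, start in before.items():
        if dev in after:
            busy_ms = after[dev] - start
            result[dev] = 100.0 * busy_ms / (interval_s * 1000.0)
    return result


def read_diskstats(path="/proc/diskstats"):
    with open(path) as f:
        return f.read()


def monitor(interval_s=5.0, samples=12):
    """Print per-device utilization every interval_s seconds."""
    prev = parse_diskstats(read_diskstats())
    for _ in range(samples):
        time.sleep(interval_s)
        cur = parse_diskstats(read_diskstats())
        stamp = time.strftime("%H:%M:%S")
        for dev, pct in sorted(utilization(prev, cur, interval_s).items()):
            print(f"{stamp} {dev} {pct:.1f}%")
        prev = cur
```

Calling monitor() while the system is slow prints one line per device
per interval, which makes it easier to line up the busy periods with
the latencytop samples.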

-chris