Hello,

I can see something similar on the machines I maintain, mostly single-disk setups with a 2.6.39 kernel:

1) Heavy and frequent disk thrashing, although less than 20% of RAM is used and no swap usage is reported.
2) During the disk thrashing, some processors (usually 2 or 3) spend 100% of their time busy waiting, according to htop.
3) Some userspace applications freeze for tens of seconds during the thrashing and busy waiting, sometimes even htop itself...

The problem has only been observed on 64-bit multiprocessors (a Core i7 laptop and Nehalem-class server Xeons). A 32-bit multiprocessor (Intel Core Duo) and a 64-bit uniprocessor (Intel Core 2 Duo class Celeron) do not seem to have any issues.

Furthermore, none of the machines had this problem with 2.6.38 and earlier kernels. Btrfs "just worked" before 2.6.39. I'll test 3.0 today to see whether some of these issues disappear.

Neither ceph nor any other remote/distributed filesystem (not even NFS) runs on the machines.

The second problem listed above looks like illegal blocking of a vital spinlock during a long disk operation, which freezes some kernel subsystems for an inordinate amount of time and causes a number of processors to wait actively for tens of seconds. (Needless to say, this is not acceptable on a laptop...)

Web browsers (Firefox and Chromium) seem to trigger this issue slightly more often than other applications, but I have no detailed statistics to prove this. ;-)

Two Core i7 class multiprocessors work 100% flawlessly with ext4, although their kernel configuration is otherwise identical to that of the machines that use Btrfs.

Andrej
Hi,

we are running a ceph cluster with btrfs as its base filesystem (kernel 3.0). At the beginning everything worked very well, but after a few days (2-3) things get very slow.

When I look at the object store servers I see heavy disk I/O on the btrfs filesystems (disk utilization is between 60% and 100%). I also did some tracing on the Ceph object store daemon, but I'm quite certain that the majority of the disk I/O is not caused by ceph or any other userland process.

When I reboot the system(s) the problems go away for another 2-3 days, but after that, it starts again. I'm not sure whether the problem is related to the kernel warning I reported last week. At least there is no temporal relationship between the warning and the slowdown.

Any hints on how to trace this would be welcome.

Thanks,
Christian