As you might know, I have been seeing btrfs slowdowns in our Ceph cluster for quite some time. Even with the latest btrfs code for 3.3, I'm still seeing these problems.

To make things reproducible, I've now written a small test that imitates Ceph's behavior. On a freshly created btrfs filesystem (2 TB, mounted with "noatime,nodiratime,compress=lzo,space_cache,inode_cache"), I open 100 files. After that, I do random 100-byte writes to these files, calling sync_file_range after each write and ioctl(BTRFS_IOC_SYNC) after every 100 writes.

After approximately 20 minutes, write activity suddenly increases fourfold and the average request size decreases (see the chart in the attachment). You can find the iostat output here: http://pastebin.com/Smbfg1aG

I hope that you are able to track down the problem with the test program in the attachment.

Thanks,
Christian
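```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define FILE_COUNT 100
#define FILE_SIZE 4194304 /* 4 MiB: range of the random write offsets */
#define STRING "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"

/* Same values as BTRFS_IOC_SYNC in the kernel's btrfs ioctl header */
#define BTRFS_IOCTL_MAGIC 0x94
#define BTRFS_IOC_SYNC _IO(BTRFS_IOCTL_MAGIC, 8)

int main(int argc, char *argv[])
{
    char *imgname;
    char *tempname;
    size_t tlen;
    int fd[FILE_COUNT];
    int i;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file-prefix>\n", argv[0]);
        return 1;
    }
    imgname = argv[1];

    /* Room for the ".N" suffix plus the terminating NUL */
    tlen = strlen(imgname) + 8;
    tempname = malloc(tlen);
    if (!tempname) {
        perror("malloc");
        return 1;
    }

    /* Open (creating if needed) <prefix>.0 ... <prefix>.99 */
    for (i = 0; i < FILE_COUNT; i++) {
        snprintf(tempname, tlen, "%s.%i", imgname, i);
        fd[i] = open(tempname, O_CREAT | O_RDWR, 0644); /* O_CREAT needs a mode */
        if (fd[i] < 0) {
            perror(tempname);
            return 1;
        }
    }
    free(tempname);

    i = 0;
    while (1) {
        int start = rand() % FILE_SIZE;
        int file = rand() % FILE_COUNT;

        putc('.', stderr);
        /* 100-byte write at a random offset (equivalent to lseek() + write()) */
        if (pwrite(fd[file], STRING, 100, start) != 100)
            perror("pwrite");
        /* Start writeback of just this range without waiting for it */
        sync_file_range(fd[file], start, 100, SYNC_FILE_RANGE_WRITE);
        usleep(25000);

        /* Every 100 writes, sync the whole filesystem */
        if (++i == 100) {
            i = 0;
            ioctl(fd[file], BTRFS_IOC_SYNC);
        }
    }
}
```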
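To compare the iostat numbers with what the test program actually submits, it can help to compute the average write request size directly from the kernel counters. The loop above sleeps 25 ms per iteration, so it issues at most 40 writes of 100 bytes per second (roughly 4 KB/s of payload); device traffic well beyond that is presumably filesystem overhead. The following is a minimal sketch, not part of the original test, assuming the classic /proc/diskstats layout (after the device name: field 5 = writes completed, field 7 = sectors written, in 512-byte units). The helper names are hypothetical, and it reports the write-only average, whereas older iostat's avgrq-sz combines reads and writes.

```c
#include <stdio.h>
#include <string.h>

/* Write-side counters from one /proc/diskstats line (hypothetical helper type). */
struct write_stats {
    unsigned long long writes;          /* field 5: writes completed */
    unsigned long long sectors_written; /* field 7: 512-byte sectors written */
};

/* Read the write counters for device `dev` (e.g. "sdb"); returns 0 on success, -1 otherwise. */
static int read_write_stats(const char *dev, struct write_stats *out)
{
    char line[512];
    FILE *f = fopen("/proc/diskstats", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        unsigned int major, minor;
        char name[64];
        unsigned long long rd, rd_merged, rd_sect, rd_ms, wr, wr_merged, wr_sect;

        /* Assumes: major minor name, then the per-device counters in documented order */
        if (sscanf(line, "%u %u %63s %llu %llu %llu %llu %llu %llu %llu",
                   &major, &minor, name, &rd, &rd_merged, &rd_sect, &rd_ms,
                   &wr, &wr_merged, &wr_sect) == 10 &&
            strcmp(name, dev) == 0) {
            out->writes = wr;
            out->sectors_written = wr_sect;
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}

/* Average write request size, in 512-byte sectors, between two samples. */
static double avg_write_request_sectors(const struct write_stats *before,
                                        const struct write_stats *after)
{
    unsigned long long dw = after->writes - before->writes;

    if (dw == 0)
        return 0.0; /* no writes completed in the interval */
    return (double)(after->sectors_written - before->sectors_written) / (double)dw;
}
```

Usage: sample the counters with read_write_stats("sdb", &a) (the device name is a placeholder), sleep for the measurement interval, sample again into b, and pass both samples to avg_write_request_sectors(); multiply the result by 512 to get bytes per request.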
Attachment: btrfstest.png (chart)