After running ceph on XFS for some time, I decided to try btrfs again. Performance with the current "for-linux-min" branch and big metadata is much better. The only problem (?) I'm still seeing is a warning that seems to occur from time to time:

[87703.784552] ------------[ cut here ]------------
[87703.789759] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf6/0x100 [btrfs]()
[87703.799070] Hardware name: ProLiant DL180 G6
[87703.804024] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[87703.828166] Pid: 929, comm: kworker/1:2 Tainted: P O 3.3.2-1.fits.1.el6.x86_64 #1
[87703.837513] Call Trace:
[87703.840280] [<ffffffff8104df6f>] warn_slowpath_common+0x7f/0xc0
[87703.847016] [<ffffffff8104dfca>] warn_slowpath_null+0x1a/0x20
[87703.853533] [<ffffffffa0355686>] btrfs_orphan_commit_root+0xf6/0x100 [btrfs]
[87703.861541] [<ffffffffa0350a06>] commit_fs_roots+0xc6/0x1c0 [btrfs]
[87703.868674] [<ffffffffa0351bcb>] btrfs_commit_transaction+0x5db/0xa50 [btrfs]
[87703.876745] [<ffffffff810127a3>] ? __switch_to+0x153/0x440
[87703.882966] [<ffffffff81070a90>] ? wake_up_bit+0x40/0x40
[87703.888997] [<ffffffffa0352040>] ? btrfs_commit_transaction+0xa50/0xa50 [btrfs]
[87703.897271] [<ffffffffa035205f>] do_async_commit+0x1f/0x30 [btrfs]
[87703.904262] [<ffffffff81068949>] process_one_work+0x129/0x450
[87703.910777] [<ffffffff8106b7eb>] worker_thread+0x17b/0x3c0
[87703.916991] [<ffffffff8106b670>] ? manage_workers+0x220/0x220
[87703.923504] [<ffffffff810703fe>] kthread+0x9e/0xb0
[87703.928952] [<ffffffff8158c224>] kernel_thread_helper+0x4/0x10
[87703.935555] [<ffffffff81070360>] ? kthread_freezable_should_stop+0x70/0x70
[87703.943323] [<ffffffff8158c220>] ? gs_change+0x13/0x13
[87703.949149] ---[ end trace b8c31966cca731fa ]---
[91128.812399] ------------[ cut here ]------------
[91128.817576] WARNING: at fs/btrfs/inode.c:2103 btrfs_orphan_commit_root+0xf6/0x100 [btrfs]()
[91128.826930] Hardware name: ProLiant DL180 G6
[91128.831897] Modules linked in: btrfs zlib_deflate libcrc32c xfs exportfs sunrpc bonding ipv6 sg serio_raw pcspkr iTCO_wdt iTCO_vendor_support i7core_edac edac_core ixgbe dca mdio iomemory_vsl(PO) hpsa squashfs [last unloaded: scsi_wait_scan]
[91128.856086] Pid: 6806, comm: btrfs-transacti Tainted: P W O 3.3.2-1.fits.1.el6.x86_64 #1
[91128.865912] Call Trace:
[91128.868670] [<ffffffff8104df6f>] warn_slowpath_common+0x7f/0xc0
[91128.875379] [<ffffffff8104dfca>] warn_slowpath_null+0x1a/0x20
[91128.881900] [<ffffffffa0355686>] btrfs_orphan_commit_root+0xf6/0x100 [btrfs]
[91128.889894] [<ffffffffa0350a06>] commit_fs_roots+0xc6/0x1c0 [btrfs]
[91128.897019] [<ffffffffa03a2b61>] ? btrfs_run_delayed_items+0xf1/0x160 [btrfs]
[91128.905075] [<ffffffffa0351bcb>] btrfs_commit_transaction+0x5db/0xa50 [btrfs]
[91128.913156] [<ffffffffa03524b2>] ? start_transaction+0x92/0x310 [btrfs]
[91128.920643] [<ffffffff81070a90>] ? wake_up_bit+0x40/0x40
[91128.926667] [<ffffffffa034cfcb>] transaction_kthread+0x26b/0x2e0 [btrfs]
[91128.934254] [<ffffffffa034cd60>] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[91128.943671] [<ffffffffa034cd60>] ? btrfs_destroy_marked_extents.clone.0+0x1f0/0x1f0 [btrfs]
[91128.953079] [<ffffffff810703fe>] kthread+0x9e/0xb0
[91128.958532] [<ffffffff8158c224>] kernel_thread_helper+0x4/0x10
[91128.965133] [<ffffffff81070360>] ? kthread_freezable_should_stop+0x70/0x70
[91128.972913] [<ffffffff8158c220>] ? gs_change+0x13/0x13
[91128.978826] ---[ end trace b8c31966cca731fb ]---

I'm able to reproduce this with ceph on a single server with 4 disks (4 filesystems/osds) and a small test program based on librbd. It is simply writing random bytes on a rbd volume (see attachment).

Is this something I should care about? Any hints on solving this would be appreciated.

Thanks,
Christian

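#include <inttypes.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <rbd/librbd.h>

/* Upper bound for write offsets (10 MiB); the image must be at least this large. */
#define MAX_OFFSET 10485760

/* Number of writes issued since the last report. */
volatile sig_atomic_t nr_writes = 0;

/* Every 10 seconds: print the average write rate and reset the counter.
 * fprintf() is not async-signal-safe; that is tolerated in this small test. */
void alarm_handler(int sig)
{
    (void) sig;
    fprintf(stderr, "Writes/sec: %i\n", (int) nr_writes / 10);
    nr_writes = 0;
    alarm(10);
}

int main(int argc, char *argv[])
{
    rados_t cluster;
    rados_ioctx_t io_ctx;
    rbd_image_t image;
    const char *pool = "rbd";

    if (argc < 2) {
        fprintf(stderr, "usage: %s <image name>\n", argv[0]);
        return 1;
    }
    const char *imgname = argv[1];

    if (rados_create(&cluster, NULL) < 0) {
        fprintf(stderr, "error initializing\n");
        return 1;
    }

    /* NULL means: search the default ceph.conf locations. */
    rados_conf_read_file(cluster, NULL);

    if (rados_connect(cluster) < 0) {
        fprintf(stderr, "error connecting\n");
        rados_shutdown(cluster);
        return 1;
    }

    if (rados_ioctx_create(cluster, pool, &io_ctx) < 0) {
        fprintf(stderr, "error opening pool %s\n", pool);
        rados_shutdown(cluster);
        return 1;
    }

    int r = rbd_open(io_ctx, imgname, &image, NULL);
    if (r < 0) {
        fprintf(stderr, "error reading header from %s\n", imgname);
        rados_ioctx_destroy(io_ctx);
        rados_shutdown(cluster);
        return 1;
    }

    /* Install the handler before arming the first alarm. */
    (void) signal(SIGALRM, alarm_handler);
    alarm(10);

    /* Write a single byte at a random offset, forever. Redefining RAND_MAX
     * does not change the range of rand(), so bound the offset explicitly. */
    while (1) {
        uint64_t start = (uint64_t) rand() % MAX_OFFSET;
        rbd_write(image, start, 1, "a");
        nr_writes++;
    }

    /* Not reached: the loop above only ends when the process is killed. */
    rbd_close(image);
    rados_ioctx_destroy(io_ctx);
    rados_shutdown(cluster);
    return 0;
}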