2011/7/28 Marcus Sorensen <shadowsor@xxxxxxxxx>:
> Christian,
>
> Have you checked up on the disks themselves and the hardware? High
> utilization can mean that the I/O load has increased, but it can also
> mean that the I/O capacity has decreased. Your traces seem to
> indicate that a good portion of the time is being spent on commits,
> which could be waiting on disk. That "wait_for_commit" looks to
> basically just spin waiting for the commit to complete, and at least
> one thing that calls it raises a BUG_ON; not sure if it's one you've
> seen even on 2.6.38.
>
> There could be all sorts of performance-related reasons that aren't
> specific to btrfs or ceph. On our various systems we've seen things
> like the raid card module being upgraded in newer kernels and our
> disks suddenly starting to go into sleep mode after a bit, dirty_ratio
> causing multiple gigs of memory to sync because it's not optimized for
> the workload, external SAS enclosures that stop communicating a few
> days after a reboot (but the disks keep working, with sporadic
> issues), patrol read hitting a bad sector on a disk, causing it to go
> into enhanced error recovery and stop responding, etc.

I'm fairly confident that the hardware is OK. We see the problem on
four machines. It could be a problem with the hpsa driver/firmware,
but we haven't seen the behavior on 2.6.38, and the changes in the
hpsa driver are not that big. (On the wait_for_commit point, see the
sketch at the end of this mail.)

> Maybe you have already tried these things. It's where I would start
> anyway. Looking at /proc/meminfo, dirty, writeback, swap, etc., both
> while the system is functioning desirably and when it's misbehaving.
> Looking at anything else that might be in D state. Looking at not
> just disk util, but the workload causing it (e.g. was I doing 300
> iops previously with an average size of 64k, and now I'm only
> managing 50 iops at 64k before the disk util reports 100%?). Testing
> the system in a filesystem-agnostic manner: for example, when
> performance is bad through btrfs, is performance the same as you got
> on a fresh boot when testing iops on /dev/sdb or whatever? You're not
> by chance swapping after a bit of uptime on any volume that's shared
> with the underlying disks that make up your osd, obfuscated by a
> hardware raid? I didn't see the kernel warning you're referring to,
> just the ixgbe malloc failure you mentioned the other day.

I've looked at most of this. What makes me point to btrfs is that the
problem goes away when I reboot one server in our cluster, but
persists on the other systems. So it can't be related to the number
of requests that come in.

> I do not mean to presume that you have not looked at these things
> already. I am not very knowledgeable in btrfs specifically, but I
> would expect any degradation in performance over time to be due to
> what's on disk (lots of small files, fragmentation, etc.). This is
> obviously not the case in this situation, since a reboot recovers
> the performance. I suppose it could also be a memory leak or
> something similar, but you should be able to detect something like
> that by monitoring your memory situation, /proc/slabinfo, etc.

It could be related to a memory leak. The machine has a lot of RAM
(24 GB), but we have seen page allocation failures in the ixgbe
driver when we are using jumbo frames.

> Just my thoughts, good luck on this. I am currently running 2.6.39.3
> (btrfs) on the 7-node cluster I put together, but I just built it
> and am comparing between various configs. It will be a while before
> it is under load for several days straight.
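To make the wait_for_commit point concrete: the pattern appears to
boil down to "sleep until the committing thread flips a flag". Here
is a minimal userspace sketch of that pattern (hypothetical pthreads
code with made-up names; the real btrfs code uses kernel wait queues,
not pthreads):

/* Minimal sketch of a "wait for the commit to complete" pattern as a
 * standalone pthreads program (build with: cc -pthread).  All names
 * are hypothetical; this is not the btrfs implementation. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct commit_state {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            commit_done;
};

/* Caller side: block until the committing thread sets commit_done. */
static void wait_for_commit(struct commit_state *cs)
{
    pthread_mutex_lock(&cs->lock);
    while (!cs->commit_done)            /* re-check on every wakeup */
        pthread_cond_wait(&cs->cond, &cs->lock);
    pthread_mutex_unlock(&cs->lock);
}

/* Committer side: do the work, then wake every waiter. */
static void *do_commit(void *arg)
{
    struct commit_state *cs = arg;
    /* ... write the transaction to disk here ... */
    pthread_mutex_lock(&cs->lock);
    cs->commit_done = true;
    pthread_cond_broadcast(&cs->cond);
    pthread_mutex_unlock(&cs->lock);
    return NULL;
}

int main(void)
{
    struct commit_state cs = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .cond = PTHREAD_COND_INITIALIZER,
        .commit_done = false,
    };
    pthread_t committer;

    pthread_create(&committer, NULL, do_commit, &cs);
    wait_for_commit(&cs);               /* latency here tracks the commit */
    pthread_join(committer, NULL);
    puts("commit finished");
    return 0;
}

Every caller parked in wait_for_commit() inherits the full latency of
the commit, so slow disks or a stalled committer would show up exactly
as commit-heavy latencytop traces.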
Thanks!

When I look at the latencytop results, there is a high latency when
calling "btrfs_commit_transaction_async". Isn't "async" supposed to
return immediately?

Regards,
Christian
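P.S. From a quick (unverified) read of fs/btrfs/transaction.c, the
"async" variant appears to hand the actual commit off to a worker, but
it still blocks until the new commit has started (and optionally until
the filesystem is unblocked), so it is not completely fire-and-forget.
Reusing the sketch above (and assuming struct commit_state gains a
bool commit_started field that do_commit sets, under the lock and with
a broadcast, just before it begins writing), the difference would look
roughly like this:

/* Sketch only, hypothetical names: an "async" commit that queues the
 * work but still waits for the commit to *start*, not to finish. */
static void commit_async(struct commit_state *cs, pthread_t *worker)
{
    /* offload the heavy lifting to a background thread */
    pthread_create(worker, NULL, do_commit, cs);

    pthread_mutex_lock(&cs->lock);
    while (!cs->commit_started)     /* still blocks until the commit begins */
        pthread_cond_wait(&cs->cond, &cs->lock);
    pthread_mutex_unlock(&cs->lock);
    /* return here; the worker completes the commit in the background */
}

If a previous commit is still in flight, that wait-for-start can take
a long time, which might be why latencytop pins the latency on the
"async" call.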