Re: Ceph OSD very slow startup

Gregory Farnum <greg@xxxxxxxxxxx> · Mon, 20 Oct 2014 11:04:20 -0700



On Mon, Oct 20, 2014 at 8:25 AM, Lionel Bouton <lionel+ceph@xxxxxxxxxxx> wrote:
> Hi,
>
> More information on our Btrfs tests.
>
> On 14/10/2014 19:53, Lionel Bouton wrote:
>
>
>
> Current plan: wait at least a week to study 3.17.0 behavior and upgrade the
> 3.12.21 nodes to 3.17.0 if all goes well.
>
>
> 3.17.0 and 3.17.1 have a bug which remounts Btrfs filesystems read-only (no
> corruption but OSD goes down) on some access patterns with snapshots:
> https://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg36483.html
>
> The bug may be present in earlier kernels (at least, the 3.16.4 code in
> fs/btrfs/qgroup.c doesn't handle the case differently than 3.17.0 and
> 3.17.1) but seems less likely to show up there: I never saw it with 3.16.4
> in several weeks, but it happened with 3.17.1 three times in just a few
> hours. As far as I can tell from its changelog, 3.17.1 didn't patch any
> vfs/btrfs path vs 3.17.0, so I assume 3.17.0 has the same behaviour.
>
> I switched all servers to 3.16.4 which I had previously tested without any
> problem.
>
> The performance problem is still there with 3.16.4. In fact, one of the 2
> large OSDs was so slow that it was repeatedly marked out, and it caused high
> latencies whenever it was in. I had to remove it: when this OSD is shut down
> with noout set (to avoid backfills slowing down the storage network),
> latencies return to normal. I chose to reformat this one with XFS.
>
> The other "big" node has a nearly identical system (same hardware,
> same software configuration, same logical volume configuration, same weight
> in the crush map, comparable disk usage in the OSD fs, ...) but is behaving
> itself (maybe slower than our smaller XFS and Btrfs OSDs, but usable). The
> only notable difference is that it was formatted more recently. So the
> performance problem might be linked to the cumulative amount of data
> accessed on the OSD over time.

Yeah; we've seen this before, and it appears to be related to our
aggressive use of btrfs snapshots; it seems that btrfs doesn't defrag
well under our use case. The btrfs developers make sporadic concerted
efforts to improve things (and succeed!), but apparently it still
hasn't improved enough. :(
-Greg
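
For anyone who wants to check nodes against the kernel versions mentioned in
the quoted message, here is a minimal Python sketch. It is only an
illustration: the version set contains just 3.17.1 (where the read-only
remount was observed) and 3.17.0 (which the quoted message assumes behaves the
same), and the names SUSPECT_KERNELS, parse_kernel_release and
is_suspect_kernel are hypothetical, not part of Ceph or Btrfs tooling. It
assumes a Linux-style release string such as "3.17.1-1-ARCH".

```python
import platform
import re

# Versions taken from this thread only: 3.17.1 was observed to remount Btrfs
# read-only, and 3.17.0 is assumed to behave the same. Not an authoritative list.
SUSPECT_KERNELS = {(3, 17, 0), (3, 17, 1)}


def parse_kernel_release(release):
    """Return (major, minor, patch) from a string such as '3.17.1-1-ARCH'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '3.17') is treated as patch level 0.
    return int(major), int(minor), int(patch or 0)


def is_suspect_kernel(release):
    """True if the release matches one of the versions named in this thread."""
    return parse_kernel_release(release) in SUSPECT_KERNELS


if __name__ == "__main__":
    release = platform.release()
    if is_suspect_kernel(release):
        print(f"{release}: reported in this thread to remount Btrfs read-only")
    else:
        print(f"{release}: not one of the versions reported in this thread")
```

A negative result only means the running kernel is not one of the two versions
named here; as the quoted message notes, the same code path may exist in
earlier kernels such as 3.16.4.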