I believe that ‘filestore xattr use omap’ is no longer used in Ceph – can anybody confirm this?
I could not find any usage in the Ceph source code except that the value is set in some of the test software…
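For anyone who wants to repeat that search, here is a minimal sketch rather than a definitive tool. It walks a source checkout and reports every line that mentions the option, whether it is spelled with spaces, underscores, or dashes. The function name `find_option_uses` and the idea of pointing it at a local checkout are my own assumptions for illustration.

```python
import os
import re

# Matches the option written with spaces, underscores, or dashes between words.
OPTION_RE = re.compile(r"filestore[ _-]xattr[ _-]use[ _-]omap")

def find_option_uses(root):
    """Return sorted (relative_path, line_number, line) tuples for each match under root.

    Hypothetical helper: 'root' is assumed to be a local source checkout.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" lets us scan binary-ish files without crashing.
                with open(path, "r", encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if OPTION_RE.search(line):
                            rel = os.path.relpath(path, root)
                            hits.append((rel, lineno, line.strip()))
            except OSError:
                # Unreadable file (permissions, dangling symlink): skip it.
                continue
    return sorted(hits)
```

Hits that show up only in test or sample configuration files, and not in the code that reads options, would be consistent with the option being set but never consulted.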
Paul
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Tom Christensen <pavera@xxxxxxxxx>
Date: Monday, 30 November 2015 at 23:20
To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Re: [ceph-users] Flapping OSDs, Large meta directories in OSDs

What counts as ancient? Concurrent with our hammer upgrade we went from 3.16->3.19 on ubuntu 14.04. We are looking to revert to the 3.16 kernel we'd been running because we're also seeing an intermittent (it's happened twice in 2 weeks) massive load spike that completely hangs the osd node (we're talking about load averages that hit 20k+ before the box becomes completely unresponsive). We saw similar behavior on a 3.13 kernel, which was resolved by moving to the 3.16 kernel we had before. I'll try to catch one with debug_ms=1 and see if we're hitting a similar hang.
To your comment about omap, we do have filestore xattr use omap = true in our conf... which we believe was placed there by ceph-deploy (which we used to deploy this cluster). We are on xfs, but we do take tons of RBD snapshots. If either of these use cases causes the osd maps to grow very large, then we may just be exceeding the number of rbd snapshots ceph can handle (we take about 4-5000/day, 1 per RBD in the cluster).
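To check which sections of a conf file actually carry the setting, something like the following could be used. It is a rough sketch under stated assumptions: it relies on Python's generic `configparser` rather than Ceph's own parser, and the helper names `normalize` and `find_option` are made up for illustration. It treats spaces, dashes, and underscores in option names as equivalent, as Ceph does.

```python
import configparser

def normalize(name):
    # Ceph treats spaces, dashes and underscores in option names as equivalent,
    # so collapse all three to single underscores before comparing.
    return "_".join(name.strip().lower().replace("-", " ").replace("_", " ").split())

def find_option(conf_text, option):
    """Return {section: value} for every section of conf_text that sets 'option'.

    Hypothetical helper; this is an INI-style approximation, not Ceph's parser.
    """
    # Strip indentation so configparser does not mistake indented keys
    # for continuation lines of the previous value.
    cleaned = "\n".join(line.strip() for line in conf_text.splitlines())
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = normalize  # normalize keys as they are read
    parser.read_string(cleaned)
    wanted = normalize(option)
    return {
        section: parser[section][wanted]
        for section in parser.sections()
        if wanted in parser[section]
    }

if __name__ == "__main__":
    sample = "[global]\n\tfilestore xattr use omap = true\n[osd]\n\tosd journal size = 1024\n"
    print(find_option(sample, "filestore_xattr_use_omap"))
```

The sample text in the `__main__` block is invented; to check a real cluster, read the contents of your own conf file and pass them in.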
An interesting note: we had an OSD flap earlier this morning, and immediately after it came back I checked its meta directory size with du -sh. This returned immediately and showed a size of 107GB. The fact that it returned immediately indicated to me that something had just recently read through that whole directory and it was all cached in the FS cache. Normally a du -sh on the meta directory takes a good 5 minutes to return. Anyway, since it dropped this morning its meta directory size has continued to shrink and is down to 93GB. So it feels like something happens that makes the OSD read all its historical maps, which results in the OSD hanging because there are a ton of them, and then it wakes up and realizes it can delete a bunch of them...
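To watch the meta directory shrink over time, rather than running du -sh by hand, a small script along these lines might help. This is only a sketch: `dir_usage` and `human` are hypothetical helper names, and the directory must be supplied by you (no path is assumed here). It reports both the apparent size and the allocated size; the latter is closer to what du shows, although hard-linked files are counted more than once.

```python
import os
import sys
import time
from collections import namedtuple

Usage = namedtuple("Usage", "files apparent_bytes allocated_bytes")

def dir_usage(path):
    """Walk 'path' and total up the regular files and other entries found in it."""
    files = apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file removed while we were walking
            files += 1
            apparent += st.st_size
            # st_blocks is in 512-byte units on Unix; fall back to st_size elsewhere.
            blocks = getattr(st, "st_blocks", None)
            allocated += blocks * 512 if blocks is not None else st.st_size
    return Usage(files, apparent, allocated)

def human(n):
    """Format a byte count with binary units, e.g. 107.0G."""
    for unit in ("B", "K", "M", "G", "T"):
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}"
        n /= 1024

if __name__ == "__main__":
    target = sys.argv[1]  # the OSD's meta directory, supplied by the caller
    start = time.time()
    usage = dir_usage(target)
    elapsed = time.time() - start
    print(f"{usage.files} files, {human(usage.allocated_bytes)} allocated, "
          f"walk took {elapsed:.1f}s")
```

Logging the walk time alongside the size gives a rough signal of whether the directory was already in the FS cache, which is the effect described above.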
On Mon, Nov 30, 2015 at 2:11 PM, Dan van der Ster
<dvanders@xxxxxxxxx> wrote:
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com