Re: snap_trimming + backfilling is inefficient with many purged_snaps

Hi Florian,

September 21 2014 3:33 PM, "Florian Haas" <florian@xxxxxxxxxxx> wrote: 
> That said, I'm not sure that wip-9487-dumpling is the final fix to the
> issue. On the system where I am seeing the issue, even with the fix
> deployed, osd's still not only go crazy snap trimming (which by itself
> would be understandable, as the system has indeed recently had
> thousands of snapshots removed), but they also still produce the
> previously seen ENOENT messages indicating they're trying to trim
> snaps that aren't there.
> 

You should be able to tell exactly how many snaps need to be trimmed. Check the current purged_snaps with

ceph pg x.y query

and also check the snap_trimq from debug_osd=10. The problem fixed in wip-9487 is the (mis)communication of purged_snaps to a new OSD. But if purged_snaps is now "correct" on your cluster (which it should be after Sage's fix) and there are still lots of snaps to trim, then I believe the only thing to do is let them all get trimmed. (My other patch, linked earlier in this thread, might help by breaking that trimming work up into smaller pieces, but it was never tested.)
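For example, something along these lines should show both values (pg 2.1f and osd.3 are placeholders, and the exact JSON field locations and log wording vary a bit between releases, so treat this as a sketch rather than exact output):

  # purged_snaps is recorded in the pg info returned by pg query
  ceph pg 2.1f query | grep -A2 purged_snaps

  # raise the primary's debug level, then look for snap_trimq lines in its log
  ceph tell osd.3 injectargs '--debug-osd 10'
  grep snap_trimq /var/log/ceph/ceph-osd.3.log | tail

  # drop the debug level back to the default afterwards
  ceph tell osd.3 injectargs '--debug-osd 0/5'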

Entering the realm of speculation, I wonder if your OSDs are getting interrupted (marked down, marked out, or crashing) before they have the opportunity to persist purged_snaps. purged_snaps is updated in ReplicatedPG::WaitingOnReplicas::react, but if the primary is too busy to actually send that transaction to its peers, then eventually it (or a new primary) has to start the trimming over again, and no progress is ever made. If this is what is happening on your cluster, then again, perhaps my osd_snap_trim_max patch could be a solution.
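If you want to check whether that's the case, something like the following might help (osd.3 is again a placeholder, and note that osd_snap_trim_max only exists if you have applied my patch from earlier in the thread; it is not in mainline, and the value 64 here is just an illustrative guess):

  # any OSDs recently marked down/out, or flapping?
  ceph health detail
  grep -c 'wrongly marked me down' /var/log/ceph/ceph-osd.3.log

  # with the osd_snap_trim_max patch applied, cap how many snaps are trimmed per pass
  ceph tell osd.3 injectargs '--osd-snap-trim-max 64'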

Cheers, Dan

> That system, however, has PGs marked as recovering, not backfilling as
> in Dan's system. Not sure if wip-9487 falls short of fixing the issue
> at its root. Sage, whenever you have time, would you mind commenting?
> 
> Cheers,
> Florian