On Wed, Jun 17, 2015 at 11:48 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> 1) The flags available in "ceph osd set" are
>
> pause|noup|nodown|noout|noin|nobackfill|norecover|noscrub|nodeep-scrub|notieragent
>
> I know or can guess most of them (the docs are a "bit" lacking).
>
> But with "ceph osd set nodown" I have no idea what it should be used for --
> to keep hammering a faulty OSD?

This is a cluster recovery tool you don't want to mess with. There are
certain cases in large clusters where the cluster can get into (and has
previously gotten into) a death spiral: high load causes OSDs to get marked
down by their peers, which creates more work for those peers, which
increases the load, which causes more OSDs to get marked down, and so on.
If you set nodown because you are confident the slowness is merely a load
issue and not caused by OSDs actually breaking, you prevent those incorrect
mark-downs from generating more work for the system.

> 2) Looking through the docs I found a reference to "ceph osd cluster_snap":
> http://ceph.com/docs/v0.67.9/rados/operations/control/
>
> What does it do? How does it work? Does it really work? ;-) I got a few
> hits on Google which suggest it might not be something that really works,
> but it looks like something we could certainly use.

You can look at http://www.spinics.net/lists/ceph-devel/msg16241.html and
its follow-ups for some discussion of cluster snap. You generally shouldn't
use it: I think it only works on btrfs, it doesn't do anything for the
monitors, and it's not something we test.
-Greg
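
For reference, the nodown workflow described above boils down to a few
standard ceph CLI commands. This is a sketch, not a recommendation to run
it on a production cluster; try it against a test cluster first, and note
that the flag must be cleared again once the load problem is resolved:

```shell
# Stop the monitors from marking OSDs down while you investigate;
# peer-reported heartbeat failures will no longer change OSD state.
ceph osd set nodown

# Confirm the flag is active (it shows up in the osdmap "flags" line).
ceph osd dump | grep flags

# Clear the flag once the load issue is resolved. Leaving nodown set
# means a genuinely dead OSD is never marked down, and client I/O
# directed at it will hang.
ceph osd unset nodown
```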