On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell <rlrevell@xxxxxxxxx> wrote:
> On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> Has the OSD actually been detected as down yet?
>>
>
> I believe it has, but I can't directly check because "ceph health"
> starts to hang when I down the second node.

Oh. You need to keep a quorum of your monitors running (just the
monitor processes, not of everything in the system) or nothing at all
is going to work. That's how we prevent split-brain issues. (See the
notes at the bottom of this mail for checking a monitor's state
directly when quorum is lost.)

>
>>
>> You'll also need to set that min size on your existing pools ("ceph
>> osd pool <pool> set min_size 1" or similar) to change their behavior;
>> the config option only takes effect for newly-created pools. (Thus
>> the "default".)
>
>
> I've done this, but the behavior is the same:
>
> $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do ceph osd pool set $f min_size 1; done
> set pool 0 min_size to 1
> set pool 1 min_size to 1
> set pool 2 min_size to 1
> set pool 3 min_size to 1
> set pool 4 min_size to 1
> set pool 5 min_size to 1
> set pool 6 min_size to 1
> set pool 7 min_size to 1
>
> $ ceph -w
>     cluster db460aa2-5129-4aaa-8b2e-43eac727124e
>      health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
>      monmap e3: 3 mons at {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0}, election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
>      mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
>      osdmap e362: 3 osds: 2 up, 2 in
>       pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
>             25329 MB used, 12649 MB / 40059 MB avail
>                  840 active+clean
>
> 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
> 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s rd, 260 kB/s wr, 13 op/s
> 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s rd, 943 kB/s wr, 38 op/s
> 2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840 active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s rd, 10699 kB/s wr, 621 op/s
>
> <this is where I kill the second OSD>
>
> 2015-03-26 17:26:26.778461 7f4ebeffd700  0 monclient: hunting for new mon
> 2015-03-26 17:26:30.701099 7f4ec45f5700  0 -- 192.168.122.111:0/1007741 >> 192.168.122.141:6789/0 pipe(0x7f4ec0023200 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4ec0023490).fault
> 2015-03-26 17:26:42.701154 7f4ec44f4700  0 -- 192.168.122.111:0/1007741 >> 192.168.122.131:6789/0 pipe(0x7f4ec00251b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4ec0025440).fault
>
> And all writes block until I bring back an OSD.
>
> Lee
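A few follow-up notes. Those "monclient: hunting for new mon" and
".fault" lines are your client failing to reach a live monitor (it is
retrying 192.168.122.141 and .131, both down), which is why everything
blocks regardless of min_size. When quorum is lost you can still ask a
surviving monitor about its own state through its admin socket; a
rough sketch, assuming you run it on the node hosting that mon and the
socket is in the default location:

$ ceph daemon mon.ceph-node-1 mon_status

The "state" and "quorum" fields in the output show whether that mon is
probing, electing, or part of a quorum.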
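To confirm the per-pool change actually stuck, you can read the value
back ("rbd" below is just an example pool name):

$ ceph osd pool get rbd min_size
min_size: 1

And if you want newly-created pools to pick it up automatically, the
default belongs in ceph.conf:

[global]
osd pool default min size = 1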
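One aside on your loop: sed 's/[0-9]//g' will also eat digits inside
any pool name that happens to contain them. "rados lspools" prints one
bare pool name per line, so an untested but simpler sketch would be:

$ for f in $(rados lspools); do ceph osd pool set $f min_size 1; done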