On Mon, Jan 27, 2014 at 9:05 PM, Stuart Longland <stuartl@xxxxxxxxxx> wrote:
> On 25/01/14 16:41, Stuart Longland wrote:
>> Hi Gregory,
>> On 24/01/14 12:20, Gregory Farnum wrote:
>>> Did the cluster actually detect the node as down? (You could check
>>> this by looking at the ceph -w output or similar when running the
>>> test.) If it was detected as down and the VM continued to block
>>> (modulo maybe a little time for the client to decide its monitor was
>>> down; I forget what the timeouts are there), that would be odd.
>>
>> I shall give that command a try next time I get near the cluster
>> (Tuesday). (I could do it today I guess, but I can't remotely power
>> nodes back on, or hard-power them off from home.)
>
> Okay, I did some further tests today. In addition to the Windows 2008R2
> VM, I also started pummelling the cluster from my own laptop (2.6GHz
> Core i5 3220M; 8GB RAM), which runs Gentoo Linux AMD64 and kernel 3.12.4.
>
> ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60) was
> installed from Gentoo's repository.
>
> I mapped a 20GB RBD using `rbd map`, formatted it XFS, then started
> pummelling that over my gigabit link (which passes through a couple of
> shared VLAN trunks) with various disk stress testers and dd.
>
> Whilst that was proceeding, I wandered over to the server rack and
> started fiddling.
>
> Before simulating outages, I was getting write speeds between 74MB/sec
> and 145MB/sec according to dbench. dd was getting about 15.1MB/sec
> writing 1GB of random data.
>
> With a bash script running dd in a loop, and bonnie++ also running to
> really push things, I started playing with the nodes, rebooting some,
> powering off others.
>
> It seems there's a limit to how often you can power things off, even if
> you wait for the cluster health to recover before proceeding.
> Eventually the client (kernel or userspace) gets fed up, as seen in the
> attached log.
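(As an aside, for anyone wanting to reproduce this kind of client-side
test, it would look roughly like the sketch below. The image, device,
and mount point names are placeholders of mine, not Stuart's actual
setup.)

    # create a 20GB image in the default 'rbd' pool and map it via the
    # kernel client
    rbd create --size 20480 stress-img     # --size is in MB
    rbd map stress-img                     # appears as e.g. /dev/rbd0

    # format and mount it
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /mnt/stress

    # dd 1GB of random data in a loop; run bonnie++/dbench alongside
    while true; do
        dd if=/dev/urandom of=/mnt/stress/blob bs=1M count=1024 \
            iflag=fullblock oflag=direct
        sync
    done
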
>
> At present, `ceph -s` reports:
>> HEALTH_WARN clock skew detected on mon.2
>>   cluster b9b2ed48-e249-48ee-8e76-86493c2cc849
>>    health HEALTH_WARN clock skew detected on mon.2
>>    monmap e1: 3 mons at {0=10.87.160.224:6789/0,1=10.87.160.225:6789/0,2=10.87.160.226:6789/0}, election epoch 42, quorum 0,1,2 0,1,2
>>    osdmap e174: 6 osds: 6 up, 6 in
>>     pgmap v45386: 800 pgs, 4 pools, 398 GB data, 102026 objects
>>           1195 GB used, 15563 GB / 16758 GB avail
>>                800 active+clean
>
> and out of `ceph -w` I get:
>> 6758 GB avail; 1130 B/s wr, 0 op/s
>> 2014-01-28 14:49:20.812284 mon.0 [INF] pgmap v45379: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 1126 B/s wr, 0 op/s
>> 2014-01-28 14:49:34.225852 mon.0 [INF] pgmap v45380: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 71 B/s wr, 0 op/s
>> 2014-01-28 14:49:48.056665 mon.0 [INF] pgmap v45381: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail
>> 2014-01-28 14:49:49.065547 mon.0 [INF] pgmap v45382: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail
>> 2014-01-28 14:49:50.074878 mon.0 [INF] pgmap v45383: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 16270 B/s wr, 0 op/s
>> 2014-01-28 14:49:51.083527 mon.0 [INF] pgmap v45384: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 16742 B/s wr, 0 op/s
>> 2014-01-28 14:50:10.437994 mon.0 [WRN] mon.2 10.87.160.226:6789/0 clock skew 4.05188s > max 0.05s
>> 2014-01-28 14:50:19.813536 mon.0 [INF] pgmap v45385: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 1140 B/s wr, 0 op/s
>> 2014-01-28 14:50:20.818168 mon.0 [INF] pgmap v45386: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 1136 B/s wr, 0 op/s
>> 2014-01-28 14:50:49.816479 mon.0 [INF] pgmap v45387: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 1130 B/s wr, 0 op/s
>> 2014-01-28 14:50:50.825369 mon.0 [INF] pgmap v45388: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 1126 B/s wr, 0 op/s
>> 2014-01-28 14:51:19.819779 mon.0 [INF] pgmap v45389: 800 pgs: 800 active+clean; 398 GB data, 1195 GB used, 15563 GB / 16758 GB avail; 1130 B/s wr, 0 op/s
>
> I do note ntp doesn't seem to be doing its job, but that's a side issue.

Actually, that could be it. If you take down one of the monitors and the
other two have enough of a time gap that they won't talk to each other,
your cluster won't be able to make any progress. The OSDs don't much
care, but your monitor nodes need to have well-synced clocks.

And in your trace everything got wedged for so long that the system just
gave up; that's probably a result of the cluster having data it couldn't
write out for too long. (Like I said before, you should make sure your
CRUSH map and rules look right.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
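
P.S. On the clock skew, a couple of quick checks (this assumes your
monitor nodes run a standard ntpd; adjust for whatever they actually use):

    # show exactly which monitor is skewed and by how much
    ceph health detail

    # on each monitor node, confirm ntpd is locked to a peer
    # (look for a '*' in the first column)
    ntpq -p

The 0.05s in that [WRN] line is the default "mon clock drift allowed"
setting; it can be raised in the [mon] section of ceph.conf, but getting
ntp working properly is the real fix.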
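
And for sanity-checking the CRUSH map and rules I mentioned, roughly
(the output file names here are arbitrary):

    # show how OSDs are arranged under hosts in the hierarchy
    ceph osd tree

    # pull out the CRUSH map and decompile it to text for review
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt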