Hi,
The VM died, but on the root disk I found this in kern.log:

<5>1 2013-06-04T21:18:02.568823+02:00 vm-1 kernel - - - [ 220.717935] sd 2:0:0:0: Attached scsi generic sg0 type 0
<5>1 2013-06-04T21:18:02.568848+02:00 vm-1 kernel - - - [ 220.718231] sd 2:0:0:0: [sda] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
<5>1 2013-06-04T21:18:02.568848+02:00 vm-1 kernel - - - [ 220.718644] sd 2:0:0:0: [sda] Write Protect is off
<7>1 2013-06-04T21:18:02.568848+02:00 vm-1 kernel - - - [ 220.718648] sd 2:0:0:0: [sda] Mode Sense: 63 00 00 08
<5>1 2013-06-04T21:18:02.568848+02:00 vm-1 kernel - - - [ 220.718831] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
<6>1 2013-06-04T21:18:02.572829+02:00 vm-1 kernel - - - [ 220.720405] sda: unknown partition table
<5>1 2013-06-04T21:18:02.572850+02:00 vm-1 kernel - - - [ 220.721593] sd 2:0:0:0: [sda] Attached SCSI disk
<5>1 2013-06-04T21:18:23.492939+02:00 vm-1 kernel - - - [ 241.642855] XFS (sda): Mounting Filesystem
<6>1 2013-06-04T21:18:23.540894+02:00 vm-1 kernel - - - [ 241.688141] XFS (sda): Ending clean mount
<4>1 2013-06-04T21:19:51.270529+02:00 vm-1 kernel - - - [ 329.413347] hrtimer: interrupt took 8993506 ns
<4>1 2013-06-04T21:21:40.732930+02:00 vm-1 kernel - - - [ 438.880340] sd 2:0:0:0: [sda] ABORT operation started
<4>1 2013-06-04T21:21:45.732920+02:00 vm-1 kernel - - - [ 443.880107] sd 2:0:0:0: ABORT operation timed-out.
<4>1 2013-06-04T21:21:45.732981+02:00 vm-1 kernel - - - [ 443.880117] sd 2:0:0:0: [sda] ABORT operation started
<4>1 2013-06-04T21:21:50.732890+02:00 vm-1 kernel - - - [ 448.880642] sd 2:0:0:0: ABORT operation timed-out.
<4>1 2013-06-04T21:21:50.732956+02:00 vm-1 kernel - - - [ 448.880655] sd 2:0:0:0: [sda] ABORT operation started
<4>1 2013-06-04T21:21:55.732930+02:00 vm-1 kernel - - - [ 453.880202] sd 2:0:0:0: ABORT operation timed-out.
<4>1 2013-06-04T21:21:55.732992+02:00 vm-1 kernel - - - [ 453.880212] sd 2:0:0:0: [sda] ABORT operation started
<4>1 2013-06-04T21:22:00.732916+02:00 vm-1 kernel - - - [ 458.880280] sd 2:0:0:0: ABORT operation timed-out.
<4>1 2013-06-04T21:22:00.732979+02:00 vm-1 kernel - - - [ 458.880291] sd 2:0:0:0: [sda] ABORT operation started
<4>1 2013-06-04T21:22:05.732910+02:00 vm-1 kernel - - - [ 463.880200] sd 2:0:0:0: ABORT operation timed-out.
<4>1 2013-06-04T21:22:05.732975+02:00 vm-1 kernel - - - [ 463.880211] sd 2:0:0:0: [sda] ABORT operation started
<4>1 2013-06-04T21:22:10.732928+02:00 vm-1 kernel - - - [ 468.881404] sd 2:0:0:0: ABORT operation timed-out.
<4>1 2013-06-04T21:22:10.732989+02:00 vm-1 kernel - - - [ 468.881414] sd 2:0:0:0: [sda] ABORT operation started
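The ABORT lines kept repeating like this. Next time a VM hangs this way I can try to capture the blocked tasks from inside the guest before rebooting. A minimal sketch, assuming the magic SysRq key is enabled in the guest kernel:

  # enable all SysRq functions, then ask the kernel to log the
  # stacks of every blocked (D-state) task
  echo 1 > /proc/sys/kernel/sysrq
  echo w > /proc/sysrq-trigger
  # the traces land in the kernel ring buffer
  dmesg | tail -n 100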
ceph -w:

   health HEALTH_ERR 2 pgs inconsistent; 6 pgs peering; 1 pgs repair; 6 pgs stuck inactive; 11 pgs stuck unclean; 3 scrub errors
   monmap e9: 5 mons at {0=10.177.67.4:6782/0,1=10.177.67.5:6782/0,3=10.177.67.7:6782/0,4=10.177.67.8:6782/0,5=10.177.67.9:6782/0}, election epoch 2612, quorum 0,1,2,3,4 0,1,3,4,5
   osdmap e12006: 156 osds: 156 up, 156 in
   pgmap v1120920: 18306 pgs: 5 active, 18293 active+clean, 6 peering, 1 active+clean+inconsistent, 1 active+clean+scrubbing+deep+inconsistent+repair; 1044 GB data, 4773 GB used, 38647 GB / 43420 GB avail
   mdsmap e1: 0/0/1 up

2013-06-04 21:22:58.459901 mon.0 [INF] pgmap v1120919: 18306 pgs: 5 active, 18293 active+clean, 6 peering, 2 active+clean+inconsistent; 1044 GB data, 4773 GB used, 38647 GB / 43420 GB avail
2013-06-04 21:22:59.483844 mon.0 [INF] pgmap v1120920: 18306 pgs: 5 active, 18293 active+clean, 6 peering, 1 active+clean+inconsistent, 1 active+clean+scrubbing+deep+inconsistent+repair; 1044 GB data, 4773 GB used, 38647 GB / 43420 GB avail
2013-06-04 21:22:54.835243 osd.91 [WRN] 5 slow requests, 1 included below; oldest blocked for > 4510.528973 secs
2013-06-04 21:22:54.835256 osd.91 [WRN] slow request 4510.528973 seconds old, received at 2013-06-04 20:07:44.306200: osd_op(client.12947699.0:7466 rb.0.c5895a.238e1f29.000000001d24 [delete] 3.695f3c2a e12006) v4 currently reached pg
2013-06-04 21:22:55.835495 osd.91 [WRN] 5 slow requests, 1 included below; oldest blocked for > 4511.529224 secs
2013-06-04 21:22:55.835500 osd.91 [WRN] slow request 4511.529224 seconds old, received at 2013-06-04 20:07:44.306200: osd_op(client.12947699.0:7466 rb.0.c5895a.238e1f29.000000001d24 [delete] 3.695f3c2a e12006) v4 currently reached pg
2013-06-04 21:22:56.835712 osd.91 [WRN] 5 slow requests, 1 included below; oldest blocked for > 4512.529440 secs
2013-06-04 21:22:56.835717 osd.91 [WRN] slow request 4512.529440 seconds old, received at 2013-06-04 20:07:44.306200: osd_op(client.12947699.0:7466 rb.0.c5895a.238e1f29.000000001d24 [delete] 3.695f3c2a e12006) v4 currently reached pg
2013-06-04 21:22:57.835956 osd.91 [WRN] 5 slow requests, 1 included below; oldest blocked for > 4513.529679 secs
2013-06-04 21:22:57.835961 osd.91 [WRN] slow request 4513.529679 seconds old, received at 2013-06-04 20:07:44.306200: osd_op(client.12947699.0:7466 rb.0.c5895a.238e1f29.000000001d24 [delete] 3.695f3c2a e12006) v4 currently reached pg
2013-06-04 21:22:58.836209 osd.91 [WRN] 5 slow requests, 1 included below; oldest blocked for > 4514.529939 secs
2013-06-04 21:22:58.836214 osd.91 [WRN] slow request 4514.529939 seconds old, received at 2013-06-04 20:07:44.306200: osd_op(client.12947699.0:7466 rb.0.c5895a.238e1f29.000000001d24 [delete] 3.695f3c2a e12006) v4 currently reached pg
2013-06-04 21:22:59.836438 osd.91 [WRN] 5 slow requests, 1 included below; oldest blocked for > 4515.530167 secs
2013-06-04 21:22:59.836444 osd.91 [WRN] slow request 4515.530167 seconds old, received at 2013-06-04 20:07:44.306200: osd_op(client.12947699.0:7466 rb.0.c5895a.238e1f29.000000001d24 [delete] 3.695f3c2a e12006) v4 currently reached pg
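To chase the inconsistent pgs and scrub errors I am using roughly the following (a sketch only; <pgid> is a placeholder for an id actually reported by "ceph health detail"):

  ceph health detail            # lists which pgs are inconsistent / stuck
  ceph pg dump_stuck inactive   # shows the 6 pgs stuck peering
  ceph pg repair <pgid>         # substitute a pgid reported as inconsistent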
Regards
Dominik

2013/6/3 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Sunday, June 2, 2013, Dominik Mostowiec wrote:
>>
>> Hi,
>> I tried to start a postgres cluster on VMs with a second disk mounted
>> from ceph (rbd - kvm).
>> I started some writes (pgbench initialisation) on 8 VMs and the VMs froze.
>> Ceph reported a slow request on 1 osd. I restarted this osd to clear the
>> slow requests, and the VMs hung permanently.
>> Is this a normal situation after cluster problems?
>
> Definitely not. Is your cluster reporting as healthy (what's "ceph -s" say)?
> Can you get anything off your hung VMs (like dmesg output)?
> -Greg
>
>
> --
> Software Engineer #42 @ http://inktank.com | http://ceph.com

--
Regards
Dominik
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com