Re: Pause i/o from time to time

What version of qemu do you have?

The issues I had were fixed once I upgraded qemu to >=1.4.2, which includes a critical rbd patch for asynchronous I/O from Josh Durgin.
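
If you're not sure which version you have, a quick sketch of the comparison against that 1.4.2 minimum (the "installed" value is a placeholder; on a real host you would take it from the output of qemu-system-x86_64 --version, and the binary name varies by distro):

```shell
# Compare an installed qemu version string against the 1.4.2 minimum.
# "installed" is a placeholder value for illustration; substitute the
# version reported by your qemu binary.
installed="1.2.0"
minimum="1.4.2"
if [ "$(printf '%s\n' "$minimum" "$installed" | sort -V | head -n1)" = "$minimum" ]; then
    echo "qemu $installed is >= $minimum"
else
    echo "qemu $installed is older than $minimum; upgrade recommended"
fi
```

sort -V does a proper version-aware comparison, so this works for versions like 1.10.0 where plain lexical sorting would give the wrong answer.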

Cheers,
Mike

On 12/28/2013 4:09 PM, Andrei Mikhailovsky wrote:

Hi guys,

Did anyone figure out what is causing this problem, or find a workaround?

I've noticed a very annoying behaviour with my vms. It seems to happen
randomly about 5-10 times a day, and the pauses last between 2 and 10
minutes. It happens across all vms on all host servers in my cluster. I
am running 0.67.4 on Ubuntu 12.04 with the 3.11 kernel from backports.

Initially I thought these pauses were caused by the scrubbing issue
reported by Mike; however, I've also noticed the stalls when the cluster
is not scrubbing. Both of my osd servers are pretty idle (load around 1
to 2), with the osds less than 10% utilised.

Unlike Uwe's case, I am not using iscsi, but plain rbd with qemu, and I
do not see any I/O errors in dmesg or kernel panics. The vms just freeze
and become unresponsive, so I can't ssh into them or run simple commands
like ls. The VMs do respond to pings, though.
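
For anyone debugging the same thing: scrub-induced stalls usually leave "slow request" warnings in the cluster log, so counting those is a quick first check. A sketch below; the here-doc lines are made-up samples just to show the shape, and on a live cluster you would grep /var/log/ceph/ceph.log or watch "ceph -w" instead:

```shell
# Count slow-request warnings in a log excerpt. The sample lines are
# illustrative, not real output; feed your actual cluster log instead.
slow=$(grep -c 'slow request' <<'EOF'
2013-12-28 16:02:11 osd.0 [WRN] slow request 30.484977 seconds old
2013-12-28 16:02:41 osd.1 [WRN] slow request 61.120034 seconds old
2013-12-28 16:02:41 mon.0 [INF] pgmap v100: 320 pgs: 320 active+clean
EOF
)
echo "$slow slow-request lines found"
```

If the count spikes during the freezes but only while scrubbing is active, that points back at the spindle-contention issue; if it spikes with noscrub/nodeep-scrub set, something else is going on.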

Thanks

Andrei

------------------------------------------------------------------------
*From: *"Uwe Grohnwaldt" <uwe@xxxxxxxxxxxxx>
*To: *ceph-users@xxxxxxxxxxxxxx
*Sent: *Thursday, 24 October, 2013 8:31:42 AM
*Subject: *Re:  Pause i/o from time to time

Hello ceph-users,

we hit a similar problem last Thursday and again today. We have a
cluster consisting of 6 storage nodes containing 70 osds (JBOD
configuration). We created several rbd devices, mapped them on a
dedicated server, and exported them via targetcli. These iscsi targets
are connected to Citrix XenServer 6.1 (with HF30) and XenServer 6.2 (HF4).

Recently some disks died. After that, errors like these occurred on the
dedicated iscsi target:
Oct 23 15:19:42 targetcli01 kernel: [673836.709887] end_request: I/O
error, dev rbd4, sector 2034037064
Oct 23 15:19:42 targetcli01 kernel: [673836.713596]
test_bit(BIO_UPTODATE) failed for bio: ffff880127546c00, err: -6
Oct 23 15:19:43 targetcli01 kernel: [673837.497382] end_request: I/O
error, dev rbd4, sector 2034037064
Oct 23 15:19:43 targetcli01 kernel: [673837.501323]
test_bit(BIO_UPTODATE) failed for bio: ffff880124d933c0, err: -6

These errors propagate up to the virtual machines and lead to read-only
filesystems. We could trigger this behaviour by setting one disk to out.
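
When those end_request errors appear, it can help to pull the affected device and sector out of the kernel log first, and then match the device to an image with rbd showmapped. A small sketch; the here-doc lines just mirror the dmesg output quoted above, and on the iscsi target you would pipe dmesg into the same awk instead:

```shell
# Extract device and sector from end_request I/O error lines.
# Sample input mirrors the dmesg lines quoted above; on a real host,
# replace the here-doc with: dmesg | awk ...
out=$(awk '/end_request: I\/O error/ {
    gsub(/,/, "")                        # drop commas so fields split cleanly
    for (i = 1; i <= NF; i++) {
        if ($i == "dev")    dev = $(i + 1)
        if ($i == "sector") sector = $(i + 1)
    }
    print dev, sector
}' <<'EOF'
end_request: I/O error, dev rbd4, sector 2034037064
end_request: I/O error, dev rbd4, sector 2034037064
EOF
)
echo "$out"
```

Repeated identical sectors on one rbd device (as here) suggest the initiator retrying the same blocked request, rather than errors scattered across the image.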

We are using Ubuntu 13.04 with the latest stable ceph (ceph version 0.67.4
(ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)).

Our ceph.conf is like this:

[global]
filestore_xattr_use_omap = true
mon_host = 10.200.20.1,10.200.20.2,10.200.20.3
osd_journal_size = 1024
public_network = 10.200.40.0/16
mon_initial_members = ceph-mon01, ceph-mon02, ceph-mon03
cluster_network = 10.210.40.0/16
auth_supported = none
fsid = 9283e647-2b57-4077-b427-0d3d656233b3

[osd]
osd_max_backfills = 4
osd_recovery_max_active = 1

[osd.0]
public_addr = 10.200.40.1
cluster_addr = 10.210.40.1
....
....

After the first outage we set osd_max_backfills to 8, and after the second
one to 4, but it didn't help. It looks like the bug mentioned at
http://tracker.ceph.com/issues/6278 . The problem is that this is a
production environment, and the problems began after we moved several VMs
to it. In our test environment we can't reproduce it, but we are working
on a larger test installation.

Does anybody have an idea how to investigate further without destroying
virtual machines? ;)

Sometimes these I/O errors lead to kernel panics on the iscsi target
machine. The targetcli/lio config is a simple default config without any
tuning or special settings.


Mit freundlichen Grüßen / Best Regards,
Uwe Grohnwaldt

----- Original Message -----
 > From: "Timofey" <timofey@xxxxxxxxx>
 > To: "Mike Dawson" <mike.dawson@xxxxxxxxxxxx>
 > Cc: ceph-users@xxxxxxxxxxxxxx
 > Sent: Dienstag, 17. September 2013 22:37:44
 > Subject: Re:  Pause i/o from time to time
 >
 > I have examined the logs.
 > Yes, the first time it may have been scrubbing; some of it was repaired
 > automatically.
 >
 > I had 2 servers before the first problem: one dedicated to an osd
 > (osd.0), and a second with an osd and websites (osd.1).
 > After the problem I added a third server dedicated to an osd (osd.2)
 > and ran "ceph osd out osd.1" to migrate the data off osd.1.
 >
 > In ceph -s I saw a normal rebalancing process, and everything worked
 > well for about 5-7 hours.
 > Then I got many misdirected records (a few hundred per second):
 > osd.0 [WRN] client.359671  misdirected client.359671.1:220843 pg
 > 2.3ae744c0 to osd.0 not [2,0] in e1040/1040
 > and errors in I/O operations.
 >
 > Now I have about 20GB of ceph logs with these errors. (I am not using
 > the cluster now; I copied all the data out to an hdd and work from the
 > hdd.)
 >
 > Is there any way to have a local software raid1 combining a ceph rbd
 > and a local image (to keep working when ceph fails or runs slowly for
 > any reason)?
 > I tried mdadm, but it worked badly: the server hung every few hours.
 >
 > > You could be suffering from a known, but unfixed issue [1] where
 > > spindle contention from scrub and deep-scrub causes periodic stalls
 > > in RBD. You can try to disable scrub and deep-scrub with:
 > >
 > > # ceph osd set noscrub
 > > # ceph osd set nodeep-scrub
 > >
 > > If your problem stops, Issue #6278 is likely the cause. To
 > > re-enable scrub and deep-scrub:
 > >
 > > # ceph osd unset noscrub
 > > # ceph osd unset nodeep-scrub
 > >
 > > Because you seem to only have two OSDs, you may also be saturating
 > > your disks even without scrub or deep-scrub.
 > >
 > > http://tracker.ceph.com/issues/6278
 > >
 > > Cheers,
 > > Mike Dawson
 > >
 > >
 > > On 9/16/2013 12:30 PM, Timofey wrote:
 > >> I use ceph for an HA cluster.
 > >> Sometimes ceph rbd pauses (i/o operations stop).
 > >> Sometimes it happens when one of the OSDs responds slowly to
 > >> requests.
 > >> Sometimes it is my own mistake (xfs_freeze -f on one of the
 > >> OSD drives).
 > >> I have 2 storage servers with one osd on each. These pauses can
 > >> last a few minutes.
 > >>
 > >> 1. Is there any setting to quickly switch the primary osd if the
 > >> current osd is working badly (slow, not responding)?
 > >> 2. Can I use a ceph rbd in a software raid array with a local
 > >> drive, so the local drive is used instead of ceph if the ceph
 > >> cluster fails?
 > >> _______________________________________________
 > >> ceph-users mailing list
 > >> ceph-users@xxxxxxxxxxxxxx
 > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 > >>
 >








