Re: Pause i/o from time to time

Uwe Grohnwaldt <uwe@xxxxxxxxxxxxx> · Thu, 24 Oct 2013 09:31:42 +0200 (CEST)

Hello ceph-users,

we hit a similar problem last Thursday and again today. We have a cluster of 6 storage nodes with 70 OSDs (JBOD configuration). We created several RBD devices, mapped them on a dedicated server, and export them via targetcli. These iSCSI targets are connected to Citrix XenServer 6.1 (with HF30) and XenServer 6.2 (HF4).

Recently some disks died. After that, the following errors occurred on the dedicated iSCSI target server:
Oct 23 15:19:42 targetcli01 kernel: [673836.709887] end_request: I/O error, dev rbd4, sector 2034037064
Oct 23 15:19:42 targetcli01 kernel: [673836.713596] test_bit(BIO_UPTODATE) failed for bio: ffff880127546c00, err: -6
Oct 23 15:19:43 targetcli01 kernel: [673837.497382] end_request: I/O error, dev rbd4, sector 2034037064
Oct 23 15:19:43 targetcli01 kernel: [673837.501323] test_bit(BIO_UPTODATE) failed for bio: ffff880124d933c0, err: -6

These errors propagate up to the virtual machines and leave their filesystems read-only. We were able to trigger this behavior by marking a single disk (OSD) out.
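To see which RBD devices and sectors are affected, and what "err: -6" means, the kernel log can be tallied with a short script. This is a minimal Python sketch that assumes the exact message format quoted above; summarize() and the two regexes are hypothetical helpers, not part of Ceph or the kernel. The kernel logs negative errno values, so "err: -6" is errno 6, which is ENXIO ("No such device or address") on Linux.

```python
import errno
import re
from collections import Counter

# Hypothetical patterns for the two kernel messages quoted above.
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev ([^,\s]+), sector (\d+)")
BIO_ERR_RE = re.compile(r"failed for bio: [0-9a-f]+, err: (-?\d+)")


def summarize(lines):
    """Count I/O errors per (device, sector) and per decoded errno name."""
    per_sector = Counter()
    per_errno = Counter()
    for line in lines:
        m = IO_ERROR_RE.search(line)
        if m:
            per_sector[(m.group(1), int(m.group(2)))] += 1
            continue
        m = BIO_ERR_RE.search(line)
        if m:
            # The kernel reports negative errno values, e.g. -6.
            code = abs(int(m.group(1)))
            per_errno[errno.errorcode.get(code, str(code))] += 1
    return per_sector, per_errno


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python rbd_errors.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as log:
        sectors, errnos = summarize(log)
    for (dev, sector), n in sectors.most_common(20):
        print(f"{dev} sector {sector}: {n} error(s)")
    for name, n in errnos.items():
        print(f"{name}: {n}")
```

Repeated hits on the same sector show up as a count greater than one, which makes it easy to tell a handful of retried requests apart from errors spread across a device.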

We are using Ubuntu 13.04 with the latest stable Ceph release (ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)).

Our ceph.conf looks like this:

[global]
filestore_xattr_use_omap = true
mon_host = 10.200.20.1,10.200.20.2,10.200.20.3
osd_journal_size = 1024
public_network = 10.200.40.0/16
mon_initial_members = ceph-mon01, ceph-mon02, ceph-mon03
cluster_network = 10.210.40.0/16
auth_supported = none
fsid = 9283e647-2b57-4077-b427-0d3d656233b3

[osd]
osd_max_backfills = 4
osd_recovery_max_active = 1

[osd.0]
public_addr = 10.200.40.1
cluster_addr = 10.210.40.1
....
....
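Under standard CIDR semantics, 10.200.40.0/16 and 10.210.40.0/16 have host bits set and denote the networks 10.200.0.0/16 and 10.210.0.0/16. Assuming Ceph applies the usual prefix mask (an assumption, not verified here), the monitors at 10.200.20.x therefore fall inside the public network. A quick sanity check with Python's standard ipaddress module, with addresses copied from the config above; as_network() and in_network() are hypothetical helpers:

```python
import ipaddress

# Values copied from the ceph.conf above.
PUBLIC = "10.200.40.0/16"
CLUSTER = "10.210.40.0/16"
MONS = ["10.200.20.1", "10.200.20.2", "10.200.20.3"]
OSD0 = {"public_addr": "10.200.40.1", "cluster_addr": "10.210.40.1"}


def as_network(cidr):
    """Normalize a CIDR string; strict=False tolerates host bits being set."""
    return ipaddress.ip_network(cidr, strict=False)


def in_network(addr, cidr):
    """True if addr lies inside the (normalized) network cidr."""
    return ipaddress.ip_address(addr) in as_network(cidr)


if __name__ == "__main__":
    print(PUBLIC, "->", as_network(PUBLIC))  # prints 10.200.40.0/16 -> 10.200.0.0/16
    for mon in MONS:
        print(mon, "in public network:", in_network(mon, PUBLIC))
    print("osd.0 public_addr ok:", in_network(OSD0["public_addr"], PUBLIC))
    print("osd.0 cluster_addr ok:", in_network(OSD0["cluster_addr"], CLUSTER))
```

With the default strict=True, ipaddress.ip_network("10.200.40.0/16") raises ValueError because of the host bits, which is a cheap way to flag such entries.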

After the first outage we set osd_max_backfills to 8, and after the second one to 4, but it didn't help. It looks like the bug described at http://tracker.ceph.com/issues/6278. The problem is that this is a production environment, and the problems began after we moved several VMs onto it. We can't reproduce it in our test environment, but we are working on a larger test installation.

Does anybody have an idea how to investigate further without destroying virtual machines? ;)

Sometimes these I/O errors lead to kernel panics on the iSCSI target machine. The targetcli/LIO setup is a plain default configuration without any tuning.

Best regards,
Uwe Grohnwaldt

----- Original Message -----
> From: "Timofey" <timofey@xxxxxxxxx>
> To: "Mike Dawson" <mike.dawson@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Sent: Tuesday, 17 September 2013 22:37:44
> Subject: Re: Pause i/o from time to time
> 
> I have examined the logs.
> Yes, the first time it may have been scrubbing; it repaired itself.
> 
> Before the first problem I had 2 servers: one dedicated to an OSD (osd.0),
> and a second one running an OSD and websites (osd.1).
> After the problem I added a third server dedicated to an OSD (osd.2) and
> marked osd.1 out (ceph osd out 1) so that its data would be moved off it.
> 
> In ceph -s I saw the normal data migration, and everything worked fine
> for about 5-7 hours.
> Then I got many "misdirected" warnings (a few hundred per second):
> osd.0 [WRN] client.359671  misdirected client.359671.1:220843 pg
> 2.3ae744c0 to osd.0 not [2,0] in e1040/1040
> and I/O errors.
> 
> Now I have about 20 GB of Ceph logs with these errors. (I'm not using the
> cluster now; I copied all data to a local HDD and work from there.)
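
With around 20 GB of logs, a single streaming pass that tallies the "misdirected" warnings by placement group, receiving OSD and OSD map epoch may show whether they cluster around one PG or one map change. This is a minimal Python sketch that assumes each warning sits on one line in the format quoted above (the mail client wrapped it); summarize_misdirected() and the regex are hypothetical helpers, not a Ceph tool.

```python
import re
from collections import Counter

# Hypothetical pattern for lines like:
# "... misdirected client.359671.1:220843 pg 2.3ae744c0 to osd.0 not [2,0] in e1040/1040"
MISDIRECTED_RE = re.compile(
    r"misdirected (?P<reqid>\S+) pg (?P<pg>\S+) to (?P<osd>osd\.\d+) "
    r"not \[(?P<osds>[\d,]*)\] in e(?P<epoch>\d+)"
)


def summarize_misdirected(lines):
    """Count warnings per (pg, receiving osd, bracketed osd list) and per epoch."""
    by_pg = Counter()
    by_epoch = Counter()
    for line in lines:  # iterates lazily, so a huge file is never loaded at once
        m = MISDIRECTED_RE.search(line)
        if m:
            by_pg[(m["pg"], m["osd"], m["osds"])] += 1
            by_epoch[int(m["epoch"])] += 1
    return by_pg, by_epoch


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python misdirected.py /var/log/ceph/ceph.log
    with open(sys.argv[1], errors="replace") as log:
        by_pg, by_epoch = summarize_misdirected(log)
    for key, n in by_pg.most_common(10):
        print(key, n)
    for epoch, n in sorted(by_epoch.items()):
        print(f"e{epoch}: {n}")
```
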
> 
> Is there any way to build a local software RAID1 from a Ceph RBD and a
> local image (to keep working when Ceph fails or becomes slow for any
> reason)? I tried mdadm, but it works badly: the server hangs every few hours.
> 
> > You could be suffering from a known but unfixed issue [1] where
> > spindle contention from scrub and deep-scrub causes periodic stalls
> > in RBD. You can try to disable scrub and deep-scrub with:
> > 
> > # ceph osd set noscrub
> > # ceph osd set nodeep-scrub
> > 
> > If your problem stops, Issue #6278 is likely the cause. To
> > re-enable scrub and deep-scrub:
> > 
> > # ceph osd unset noscrub
> > # ceph osd unset nodeep-scrub
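
Because forgetting the unset step leaves scrubbing disabled indefinitely, the set/unset pair can be wrapped so the flags are always cleared, even if the test is interrupted. This is a minimal Python sketch that shells out to the same ceph osd set/unset commands; scrub_paused() is a hypothetical helper, and its run parameter exists only so the commands can be stubbed out in tests.

```python
import subprocess
from contextlib import contextmanager

SCRUB_FLAGS = ("noscrub", "nodeep-scrub")


def _ceph(*args, run=subprocess.run):
    # check=True makes subprocess.run raise CalledProcessError if ceph fails.
    run(["ceph", *args], check=True)


@contextmanager
def scrub_paused(run=subprocess.run):
    """Set noscrub/nodeep-scrub for the duration of a with-block."""
    done = []  # flags actually set, so only those get unset
    try:
        for flag in SCRUB_FLAGS:
            _ceph("osd", "set", flag, run=run)
            done.append(flag)
        yield
    finally:
        # Re-enable scrubbing even if the body raised.
        for flag in reversed(done):
            _ceph("osd", "unset", flag, run=run)
```

For example, "with scrub_paused(): time.sleep(3600)" keeps scrubbing off for an hour while you watch whether the stalls stop.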
> > 
> > Because you seem to only have two OSDs, you may also be saturating
> > your disks even without scrub or deep-scrub.
> > 
> > [1] http://tracker.ceph.com/issues/6278
> > 
> > Cheers,
> > Mike Dawson
> > 
> > 
> > On 9/16/2013 12:30 PM, Timofey wrote:
> >> I use Ceph for an HA cluster.
> >> From time to time Ceph RBD pauses (I/O operations stop).
> >> Sometimes this happens when one of the OSDs responds slowly to requests.
> >> Sometimes it is my own mistake (xfs_freeze -f on one of the
> >> OSD drives).
> >> I have 2 storage servers with one OSD on each. These pauses can last
> >> a few minutes.
> >> 
> >> 1. Are there any settings to switch the primary OSD quickly if the
> >> current one works badly (slow, not responding)?
> >> 2. Can I use a Ceph RBD in a software RAID array together with a local
> >> drive, so that the local drive is used instead of Ceph if the cluster fails?
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@xxxxxxxxxxxxxx
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> 
> 