Re: osd slow response when formatting rbd image

Hi all,

 

Today I probably found a solution for this (unfortunately not the root cause).

The problem only occurs when using the ceph-kraken packages on my clients.

If I use ceph-jewel (which was already running on my iSCSI gateways), the problem does not appear.
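
For reference, this is roughly how the installed client versions can be compared (just a sketch; package names may vary with the repository layout, and the kernel rbd module is independent of the userspace packages):

# userspace ceph version on the client
ceph --version
rpm -qa 'ceph*' 'librbd*' 'librados*'

# kernel providing the rbd module (krbd does not change with the ceph packages)
uname -r
modinfo rbd | head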

 

Best regards,

Sven

 

 

From: Rath, Sven
Sent: Thursday, 20 April 2017 16:34
To: 'ceph-users@xxxxxxxxxxxxxx' <ceph-users@xxxxxxxxxxxxxx>
Subject: osd slow response when formatting rbd image

 

Hi all,

 

I hope you are all doing well; maybe some of you can help me with a problem I have been facing recently.

I started evaluating Ceph a couple of months ago, and I now have a very strange problem when formatting RBD images.

The problem only occurs when using RBD images directly, with the kernel rbd module loaded.

If I attach the RBD image as an iSCSI device via one of our iSCSI gateways (tgt), formatting works fine and I can afterwards use the image on any host without problems.

That is a workaround, but I would like to find out why it is not working with rbd directly for me…
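
For completeness, a tgt export of an RBD image typically looks roughly like this (only a sketch, not my exact config; the IQN is a placeholder, and it assumes tgt was built with RBD backing-store support):

# /etc/tgt/conf.d/rbd-test.conf (sketch)
<target iqn.2017-04.com.example:pool-C.test>
    driver iscsi
    bs-type rbd
    backing-store pool-C/test
</target>

# reload tgt, then discover and log in from the client with iscsiadm as usual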

 

Problem explained:

 

I create a pool and an image:

 

ceph osd pool create pool-C 250 250

rbd create test --size 3548290 --pool pool-C --image-feature layering
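
To double-check what was created (pg count, size, and that only the layering feature is enabled):

ceph osd dump | grep pool-C
rbd info pool-C/test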

 

I map the RBD image on my client (it does not matter which client):

rbd map pool-C/test --id admin --keyring /etc/ceph/ceph.client.admin.keyring
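
To verify the mapping and the resulting block device (a quick check, nothing special):

rbd showmapped
lsblk /dev/rbd/pool-C/test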

 

As soon as I start to format the image (XFS or ext4), some of my OSDs start to fail:

 

mkfs.xfs /dev/rbd/pool-C/test-part1

 

I see the following entries as soon as I start formatting.

The OSD IDs are different each time; I guess it is rather a problem with the journals.

 

EXAMPLE:

2017-04-20 13:43:24.529953 osd.1 [WRN] slow request 30.439722 seconds old, received at 2017-04-20 13:42:54.090001: osd_op(client.344964.1:8170 9.cbb68aa5 rbd_data.540dc238e1f29.0000000000001e7c [delete] snapc 0=[] ondisk+write e2002) currently started

2017-04-20 13:43:24.529984 osd.1 [WRN] slow request 30.431389 seconds old, received at 2017-04-20 13:42:54.098334: osd_op(client.344964.1:8489 9.f414989 rbd_data.540dc238e1f29.0000000000001fbb [delete] snapc 0=[] ondisk+write e2002) currently started

 

2017-04-20 13:42:50.870651 mon.0 [INF] osd.10 172.10.10.2:6804/15031 failed (forced)

2017-04-20 13:42:51.989500 mon.0 [INF] osd.11 172.10.10.2:6806/15690 failed (forced)

 

I found out that some of the journal SSDs always disappear when I start formatting, and therefore the OSDs backed by that journal disappear as well.

That is really strange to me.

Any benchmark works fine, and if the RBD image is formatted via iSCSI I can also use it afterwards without any problems.
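
Since the slow requests above are all [delete] ops, one thing that could be tried (just an idea, I have not verified that this is the cause) is formatting without the discard pass, and following the kernel log on the OSD host while the journal SSD disappears:

# format without issuing discards, to see whether the discard pass of mkfs is the trigger
mkfs.xfs -K /dev/rbd/pool-C/test-part1            # -K skips the discard
mkfs.ext4 -E nodiscard /dev/rbd/pool-C/test-part1

# on the OSD host, follow the kernel log while formatting
dmesg -wT | grep -iE 'sd[abcd]|reset|abort|i/o error'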

 

 

My environment:

 

ceph-11.2.0-0.el7.x86_64 on CentOS 7.3

 

3 Monitor Hosts

 

2 OSD Hosts:

2x Intel(R) Xeon(R) CPU E5-2630L v4 (HT on, C1-state)

60 GB Memory

1 GE Ethernet (internal)

1 GE Ethernet (external)

 

4 x SSD (Intel SSDSC2BB480G401)

8 x 1.1 TB SAS3 (XFS)

All disks are connected to an HBA (PMC Adaptec HBA 1000-8i8e), no RAID arrays.

 

[agpceph02][DEBUG ] /dev/sda :

[agpceph02][DEBUG ]  /dev/sda1 ceph journal, for /dev/sde1

[agpceph02][DEBUG ]  /dev/sda2 ceph journal, for /dev/sdf1

[agpceph02][DEBUG ] /dev/sdb :

[agpceph02][DEBUG ]  /dev/sdb1 ceph journal, for /dev/sdg1

[agpceph02][DEBUG ]  /dev/sdb2 ceph journal, for /dev/sdh1

[agpceph02][DEBUG ] /dev/sdc :

[agpceph02][DEBUG ]  /dev/sdc1 ceph journal, for /dev/sdi1

[agpceph02][DEBUG ]  /dev/sdc2 ceph journal, for /dev/sdj1

[agpceph02][DEBUG ] /dev/sdd :

[agpceph02][DEBUG ]  /dev/sdd1 ceph journal, for /dev/sdk1

[agpceph02][DEBUG ]  /dev/sdd2 ceph journal, for /dev/sdl1

[agpceph02][DEBUG ] /dev/sde :

[agpceph02][DEBUG ]  /dev/sde1 ceph data, active, cluster ceph, osd.8, journal /dev/sda1

[agpceph02][DEBUG ] /dev/sdf :

[agpceph02][DEBUG ]  /dev/sdf1 ceph data, active, cluster ceph, osd.9, journal /dev/sda2

[agpceph02][DEBUG ] /dev/sdg :

[agpceph02][DEBUG ]  /dev/sdg1 ceph data, active, cluster ceph, osd.10, journal /dev/sdb1

[agpceph02][DEBUG ] /dev/sdh :

[agpceph02][DEBUG ]  /dev/sdh1 ceph data, active, cluster ceph, osd.11, journal /dev/sdb2

[agpceph02][DEBUG ] /dev/sdi :

[agpceph02][DEBUG ]  /dev/sdi1 ceph data, active, cluster ceph, osd.12, journal /dev/sdc1

[agpceph02][DEBUG ] /dev/sdj :

[agpceph02][DEBUG ]  /dev/sdj1 ceph data, active, cluster ceph, osd.13, journal /dev/sdc2

[agpceph02][DEBUG ] /dev/sdk :

[agpceph02][DEBUG ]  /dev/sdk1 ceph data, active, cluster ceph, osd.14, journal /dev/sdd1

[agpceph02][DEBUG ] /dev/sdl :

[agpceph02][DEBUG ]  /dev/sdl1 ceph data, active, cluster ceph, osd.15, journal /dev/sdd2

 

 

[root@agpceph-admin ceph]# ceph -s

    cluster 8edd3cdc-02c3-4b60-a150-897aeb0dda14

     health HEALTH_OK

     monmap e3: 3 mons at {agpceph-mon01=172.10.10.50:6789/0,agpceph01=172.10.10.1:6789/0,agpceph02=172.10.10.2:6789/0}

            election epoch 110, quorum 0,1,2 agpceph01,agpceph02,agpceph-mon01

        mgr active: agpceph-mon01 standbys: agpceph01, agpceph02

     osdmap e2132: 16 osds: 16 up, 16 in

            flags sortbitwise,require_jewel_osds,require_kraken_osds

      pgmap v96164: 650 pgs, 3 pools, 15611 MB data, 3979 objects

            31517 MB used, 17845 GB / 17876 GB avail

                 650 active+clean

 

[root@agpceph-admin ceph]# ceph osd tree

ID WEIGHT   TYPE NAME              UP/DOWN REWEIGHT PRIMARY-AFFINITY

-1 17.45752 root default

-5  8.72876     rack Rack391

-2  8.72876         host agpceph01

0  1.09109             osd.0           up  1.00000          1.00000

1  1.09109             osd.1           up  1.00000          1.00000

2  1.09109             osd.2           up  1.00000          1.00000

3  1.09109             osd.3           up  1.00000          1.00000

4  1.09109             osd.4           up  1.00000          1.00000

5  1.09109             osd.5           up  1.00000          1.00000

6  1.09109             osd.6           up  1.00000          1.00000

7  1.09109             osd.7           up  1.00000          1.00000

-4  8.72876     rack Rack320

-3  8.72876         host agpceph02

8  1.09109             osd.8           up  1.00000          1.00000

9  1.09109             osd.9           up  1.00000          1.00000

10  1.09109             osd.10          up  1.00000          1.00000

11  1.09109             osd.11          up  1.00000          1.00000

12  1.09109             osd.12          up  1.00000          1.00000

13  1.09109             osd.13          up  1.00000          1.00000

14  1.09109             osd.14          up  1.00000          1.00000

15  1.09109             osd.15          up  1.00000          1.00000

 

[root@agpceph-admin ceph]# cat /etc/ceph/ceph.conf

[global]

fsid = 8edd3cdc-02c3-4b60-a150-897aeb0dda14

mon_initial_members = agpceph01, agpceph02, agpceph-mon01

mon_host = 172.10.10.1,172.10.10.2,172.10.10.50

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

 

osd journal size = 81920

public network = 172.10.10.0/24

cluster network = 172.10.11.0/24

 

osd pool default size =  2

osd pool default min size = 1

osd pool default pg num = 35

osd pool default pgp num = 35

 

osd crush chooseleaf type = 3

 

log file = /var/log/ceph/cluster.log

log to syslog = true

mon_allow_pool_delete = true

mon osd allow primary affinity = true

 

[client]

rbd_cache = false
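
To confirm the running daemons actually picked up these settings (a sketch; run on one of the OSD hosts, the OSD id is just an example):

ceph daemon osd.8 config show | grep journal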

 

 

 

 

Maybe someone else has had this problem and could give me some advice?

 

Many thanks in advance and kind regards,

Sven

 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
