Re: Blocked Requests

Hi all,

Using ceph-ansible, I built the cluster described below with 2 OSD nodes
and 3 mons. Right after creating the OSDs I started benchmarking with
"rbd bench" and "rados bench", and saw the performance drop. Checking
the cluster status shows slow requests.
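
(For reference, the benchmark runs looked roughly like the following; the
pool and image names here are just placeholders, not the exact ones I used.)

    rados bench -p testpool 60 write --no-cleanup
    rados bench -p testpool 60 seq
    rbd bench --io-type write testpool/testimage --io-size 4K --io-threads 16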


[root@storage-28-1 ~]# ceph -s
  cluster:
    id:     009cbed0-e5a8-4b18-a313-098e55742e85
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            1264 slow requests are blocked > 32 sec

  services:
    mon:         3 daemons, quorum storage-30,storage-29,storage-28-1
    mgr:         storage-30(active), standbys: storage-28-1, storage-29
    mds:         cephfs-3/3/3 up
{0=storage-30=up:active,1=storage-28-1=up:active,2=storage-29=up:active}
    osd:         33 osds: 33 up, 33 in
    tcmu-runner: 2 daemons active

  data:
    pools:   3 pools, 1536 pgs
    objects: 13289 objects, 42881 MB
    usage:   102 GB used, 55229 GB / 55331 GB avail
    pgs:     1536 active+clean

  io:
    client:   1694 B/s rd, 1 op/s rd, 0 op/s wr



[root@storage-28-1 ~]# ceph health detail
HEALTH_WARN insufficient standby MDS daemons available; 904 slow
requests are blocked > 32 sec
MDS_INSUFFICIENT_STANDBY insufficient standby MDS daemons available
    have 0; want 1 more
REQUEST_SLOW 904 slow requests are blocked > 32 sec
    364 ops are blocked > 1048.58 sec
    212 ops are blocked > 524.288 sec
    164 ops are blocked > 262.144 sec
    100 ops are blocked > 131.072 sec
    64 ops are blocked > 65.536 sec
    osd.11 has blocked requests > 524.288 sec
    osds 9,32 have blocked requests > 1048.58 sec


osd 9 log : https://pastebin.com/ex41cFww

I see that from time to time different OSDs report blocked requests,
and I am not sure what is causing this. Can anyone help me track it
down, please?
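
If it helps, I can also pull per-OSD op dumps from the flagged OSDs, for
example (run on the node hosting the OSD; osd.9 here is just one of the
OSDs named above):

    ceph daemon osd.9 dump_blocked_ops
    ceph daemon osd.9 dump_historic_ops
    ceph daemon osd.9 dump_ops_in_flight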

[root@storage-28-1 ~]# ceph osd tree
ID CLASS WEIGHT   TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       54.03387 root default
-3       27.83563     host storage-29
 2   hdd  1.63739         osd.2           up  1.00000 1.00000
 3   hdd  1.63739         osd.3           up  1.00000 1.00000
 4   hdd  1.63739         osd.4           up  1.00000 1.00000
 5   hdd  1.63739         osd.5           up  1.00000 1.00000
 6   hdd  1.63739         osd.6           up  1.00000 1.00000
 7   hdd  1.63739         osd.7           up  1.00000 1.00000
 8   hdd  1.63739         osd.8           up  1.00000 1.00000
 9   hdd  1.63739         osd.9           up  1.00000 1.00000
10   hdd  1.63739         osd.10          up  1.00000 1.00000
11   hdd  1.63739         osd.11          up  1.00000 1.00000
12   hdd  1.63739         osd.12          up  1.00000 1.00000
13   hdd  1.63739         osd.13          up  1.00000 1.00000
14   hdd  1.63739         osd.14          up  1.00000 1.00000
15   hdd  1.63739         osd.15          up  1.00000 1.00000
16   hdd  1.63739         osd.16          up  1.00000 1.00000
17   hdd  1.63739         osd.17          up  1.00000 1.00000
18   hdd  1.63739         osd.18          up  1.00000 1.00000
-5       26.19824     host storage-30
 0   hdd  1.63739         osd.0           up  1.00000 1.00000
 1   hdd  1.63739         osd.1           up  1.00000 1.00000
19   hdd  1.63739         osd.19          up  1.00000 1.00000
20   hdd  1.63739         osd.20          up  1.00000 1.00000
21   hdd  1.63739         osd.21          up  1.00000 1.00000
22   hdd  1.63739         osd.22          up  1.00000 1.00000
23   hdd  1.63739         osd.23          up  1.00000 1.00000
24   hdd  1.63739         osd.24          up  1.00000 1.00000
25   hdd  1.63739         osd.25          up  1.00000 1.00000
26   hdd  1.63739         osd.26          up  1.00000 1.00000
27   hdd  1.63739         osd.27          up  1.00000 1.00000
28   hdd  1.63739         osd.28          up  1.00000 1.00000
29   hdd  1.63739         osd.29          up  1.00000 1.00000
30   hdd  1.63739         osd.30          up  1.00000 1.00000
31   hdd  1.63739         osd.31          up  1.00000 1.00000
32   hdd  1.63739         osd.32          up  1.00000 1.00000

Thanks


On Fri, Apr 20, 2018 at 10:24 AM, Shantur Rathore
<shantur.rathore@xxxxxxxxx> wrote:
>
> Thanks Alfredo.  I will use ceph-volume.
>
> On Thu, Apr 19, 2018 at 4:24 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>>
>> On Thu, Apr 19, 2018 at 11:10 AM, Shantur Rathore
>> <shantur.rathore@xxxxxxxxx> wrote:
>> > Hi,
>> >
>> > I am building my first Ceph cluster from hardware left over from a previous
>> > project. I have been reading a lot of Ceph documentation but need some help
>> > to make sure I am going the right way.
>> > To set the stage, below is what I have:
>> >
>> > Rack-1
>> >
>> > 1 x HP DL360 G9 with
>> >    - 256 GB Memory
>> >    - 5 x 300GB HDD
>> >    - 2 x HBA SAS
>> >    - 4 x 10GBe Networking Card
>> >
>> > 1 x SuperMicro chassis with 17 x HP Enterprise 400GB SSD and 17 x HP
>> > Enterprise 1.7TB HDD
>> > Chassis and HP server are connected with 2 x SAS HBA for redundancy.
>> >
>> >
>> > Rack-2 (Same as Rack-1)
>> >
>> > 1 x HP DL360 G9 with
>> >    - 256 GB Memory
>> >    - 5 x 300GB HDD
>> >    - 2 x HBA SAS
>> >    - 4 x 10GBe Networking Card
>> >
>> > 1 x SuperMicro chassis with 17 x HP Enterprise 400GB SSD and 17 x HP
>> > Enterprise 1.7TB HDD
>> > Chassis and HP server are connected with 2 x SAS HBA for redundancy.
>> >
>> >
>> > Rack-3
>> >
>> > 5 x HP DL360 G8 with
>> >    - 128 GB Memory
>> >    - 2 x 400GB HP Enterprise SSD
>> >    - 3 x 1.7TB Enterprise HDD
>> >
>> > Requirements
>> > - To serve storage to around 200 VMware VMs via iSCSI. VMs use disks
>> > moderately.
>> > - To serve storage to some docker containers using ceph volume driver
>> > - To serve storage to some legacy apps using NFS
>> >
>> > Plan
>> >
>> > - Create a ceph cluster with all machines
>> > - Use Bluestore as osd backing ( 3 x SSD for DB and WAL in SuperMicro
>> > Chassis and 1 x SSD for DB and WAL in Rack 3 G8s)
>> > - Use remaining SSDs ( 14 x in SuperMicro and 1 x Rack 3 G8s ) for Rados
>> > Cache Tier
>> > - Update CRUSH map to make Rack as minimum failure domain. So almost all
>> > data is replicated across racks and in case one of the host dies the storage
>> > still works.
>> > - Single bonded network (4x10GBe) connected to ToR switches.
>> > - Same public and cluster network
>> >
>> > Questions
>> >
>> > - First of all, is this kind of setup workable?
>> > - I have seen that Ceph uses /dev/sdx names in guides; is that a good approach
>> > considering that disks die and can come back with a different /dev/sdx
>> > identifier on reboot?
>>
>> In the case of ceph-volume, these names will not matter, since it uses
>> LVM behind the scenes and LVM takes care of figuring out that /dev/sda1
>> is now really /dev/sdb1 after a reboot.
>>
>> If using ceph-disk, however, detection is done a bit differently: it
>> reads partition labels and depends on UDEV triggers, which can sometimes
>> be troublesome, especially on reboot. When detection via UDEV succeeds,
>> the non-persistent names still don't matter much.
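>>
>> As a quick illustration, an OSD created with ceph-volume can be inspected
>> with:
>>
>>     ceph-volume lvm list
>>     lvs -o lv_name,lv_tags
>>
>> The ceph.* tags that ceph-volume stores on the logical volumes are what it
>> uses to find the right devices at activation time, so it does not matter
>> whether the kernel enumerates a disk as /dev/sda on one boot and /dev/sdb
>> on the next.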
>>
>> > - What should be the approximate size of the WAL and DB partitions for my
>> > kind of setup?
>> > - Can I install Ceph in a VM and run other VMs on these hosts, or is Ceph
>> > too CPU-demanding?
>> >
>> > Thanks,
>> > Shantur
>> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


