Re: Troubleshooting hanging storage backend whenever there is any cluster change

David Turner <drakonstein@xxxxxxxxx> · Thu, 18 Oct 2018 09:42:44 -0400

What are your OSD node stats?  CPU, RAM, and the quantity and size of the OSD disks.  You might need to modify some bluestore settings to speed up the time it takes to peer, or perhaps the servers are simply underpowered for the number of OSD disks you're running and your servers and OSD daemons are going as fast as they can.
On Sat, Oct 13, 2018 at 4:08 PM Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
and a 3rd one:

    health: HEALTH_WARN

            1 MDSs report slow metadata IOs

            1 MDSs report slow requests

2018-10-13 21:44:08.150722 mds.cloud1-1473 [WRN] 7 slow requests, 1 included below; oldest blocked for > 199.922552 secs
2018-10-13 21:44:08.150725 mds.cloud1-1473 [WRN] slow request 34.829662 seconds old, received at 2018-10-13 21:43:33.321031: client_request(client.216121228:929114 lookup #0x1/.active.lock 2018-10-13 21:43:33.321594 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting

The relevant OSDs are bluestore again running at 100% I/O:

iostat shows:

sdi  77,00  0,00  580,00  97,00  511032,00  972,00  1512,57  14,88  22,05  24,57  6,97  1,48  100,00

So it reads at roughly 500 MB/s, which completely saturates the OSD, and it keeps doing so for more than 10 minutes.
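
To double-check the throughput figure, here is a minimal Python sketch that parses the iostat line above. The column names are an assumption: the header row is not part of the paste, so the sketch uses the extended-statistics (`iostat -x`) layout of older sysstat releases. The helper name `parse_iostat_line` is made up for illustration.

```python
# Column names are an ASSUMPTION: the header row was not pasted, so this uses
# the extended-statistics layout of older sysstat releases.
COLUMNS = [
    "rrqm/s", "wrqm/s", "r/s", "w/s", "rkB/s", "wkB/s",
    "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await", "svctm", "%util",
]


def parse_iostat_line(line):
    """Split one device line into (device, {column: float}); accepts decimal commas."""
    device, *fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(fields)}")
    values = [float(field.replace(",", ".")) for field in fields]
    return device, dict(zip(COLUMNS, values))


# The device line exactly as pasted in the mail (locale uses decimal commas).
LINE = ("sdi 77,00 0,00 580,00 97,00 511032,00 972,00 "
        "1512,57 14,88 22,05 24,57 6,97 1,48 100,00")

device, stats = parse_iostat_line(LINE)

# Read throughput: kB/s -> MiB/s.
read_mib_s = stats["rkB/s"] / 1024

# Average request size in 512-byte sectors, recomputed from the other columns
# as a sanity check on the assumed column order.
avg_sectors = (stats["rkB/s"] + stats["wkB/s"]) * 2 / (stats["r/s"] + stats["w/s"])

print(f"{device}: {read_mib_s:.0f} MiB/s read, {stats['%util']:.0f}% util")
print(f"avg request size: {avg_sectors:.2f} sectors (reported: {stats['avgrq-sz']})")
```

The recomputed request size agrees with the reported 1512,57, which suggests the assumed column order is right, and 511032 kB/s works out to roughly 499 MiB/s of reads.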

Greets,

Stefan

On 13.10.2018 at 21:29, Stefan Priebe - Profihost AG wrote:

> 

> osd.19 is a bluestore OSD on a healthy 2TB SSD.

> 

> Log of osd.19 is here:

> https://pastebin.com/raw/6DWwhS0A

> 

> On 13.10.2018 at 21:20, Stefan Priebe - Profihost AG wrote:

>> Hi David,

>>

>> I think this should be the problem - from a new log from today:

>>

>> 2018-10-13 20:57:20.367326 mon.a [WRN] Health check update: 4 osds down (OSD_DOWN)
>> ...
>> 2018-10-13 20:57:41.268674 mon.a [WRN] Health check update: Reduced data availability: 3 pgs peering (PG_AVAILABILITY)
>> ...
>> 2018-10-13 20:58:08.684451 mon.a [WRN] Health check failed: 1 osds down (OSD_DOWN)
>> ...
>> 2018-10-13 20:58:22.841210 mon.a [WRN] Health check failed: Reduced data availability: 8 pgs inactive (PG_AVAILABILITY)
>> ....
>> 2018-10-13 20:58:47.570017 mon.a [WRN] Health check update: Reduced data availability: 5 pgs inactive (PG_AVAILABILITY)
>> ...
>> 2018-10-13 20:58:49.142108 osd.19 [WRN] Monitor daemon marked osd.19 down, but it is still running
>> 2018-10-13 20:58:53.750164 mon.a [WRN] Health check update: Reduced data availability: 3 pgs inactive (PG_AVAILABILITY)

>> ...

>>

>> So there is a time frame of > 90s where PGs are inactive and unavailable -
>> this would at least explain the stalled I/O to me?
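
The length of that window can be checked directly from the timestamps in the log excerpt above. A minimal Python sketch, assuming the first quoted `OSD_DOWN` update marks the start and the last quoted `PG_AVAILABILITY` update marks the end; the excerpt is elided, so the real window may well be longer:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (timestamp, summary) pairs copied from the quoted mon.a log excerpt.
EVENTS = [
    ("2018-10-13 20:57:20.367326", "4 osds down"),
    ("2018-10-13 20:57:41.268674", "3 pgs peering"),
    ("2018-10-13 20:58:08.684451", "1 osds down"),
    ("2018-10-13 20:58:22.841210", "8 pgs inactive"),
    ("2018-10-13 20:58:47.570017", "5 pgs inactive"),
    ("2018-10-13 20:58:53.750164", "3 pgs inactive"),
]


def seconds_between(start, end):
    """Elapsed seconds between two timestamps in the Ceph cluster-log format."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# From the first OSD_DOWN update to the last quoted PG_AVAILABILITY update.
total = seconds_between(EVENTS[0][0], EVENTS[-1][0])
# From the first PG_AVAILABILITY warning (pgs peering) to the last one.
since_peering = seconds_between(EVENTS[1][0], EVENTS[-1][0])

print(f"since first OSD_DOWN update:      {total:.1f} s")
print(f"since first PG_AVAILABILITY line: {since_peering:.1f} s")
```

Measured from the first `OSD_DOWN` update, the quoted events span about 93 seconds; measured only from the first `PG_AVAILABILITY` warning, it is about 72 seconds, with PGs still inactive at the last quoted line.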

>>

>> Greets,

>> Stefan

>>

>>

>> On 12.10.2018 at 15:59, David Turner wrote:

>>> The number of PGs per OSD does not change unless the OSDs are marked out.  You

>>> have noout set, so that doesn't change at all during this test.  All of

>>> your PGs peered quickly at the beginning and then were active+undersized

>>> the rest of the time, you never had any blocked requests, and you always

>>> had 100MB/s+ client IO.  I didn't see anything wrong with your cluster

>>> to indicate that your clients had any problems whatsoever accessing data.

>>>

>>> Can you confirm that you saw the same problems while you were running
>>> those commands?  The next possibility is that a client isn't getting an
>>> updated OSD map indicating that the host and its OSDs are down, so it is
>>> stuck trying to communicate with host7.  That would point to a potential
>>> problem with the client being unable to communicate with the Mons.  Have
>>> you completely ruled out any network problems between all nodes and all
>>> of the IPs in the cluster?  What does your client log show during these
>>> times?

>>>

>>> On Fri, Oct 12, 2018 at 8:35 AM Nils Fahldieck - Profihost AG
>>> <n.fahldieck@xxxxxxxxxxxx> wrote:

>>>

>>>     Hi, in our `ceph.conf` we have:

>>>

>>>       mon_max_pg_per_osd = 300

>>>

>>>     While the host is offline (9 OSDs down):

>>>

>>>       4352 PGs * 3 / 62 OSDs ~ 210 PGs per OSD

>>>

>>>     If all OSDs are online:

>>>

>>>       4352 PGs * 3 / 71 OSDs ~ 183 PGs per OSD

>>>

>>>     ... so this doesn't seem to be the issue.
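
The arithmetic above can be reproduced with a short Python sketch. It computes only the cluster-wide average; the function name `avg_pgs_per_osd` is made up for illustration, and the count on an individual OSD can sit above the average depending on how PGs are distributed.

```python
def avg_pgs_per_osd(pg_count, replica_size, osd_count):
    """Average number of PG instances (replicas included) per OSD."""
    return pg_count * replica_size / osd_count


PGS = 4352    # total PGs, as stated in the mail
SIZE = 3      # replica count (size=3)
LIMIT = 300   # mon_max_pg_per_osd from the quoted ceph.conf

for label, osds in (("all OSDs online", 71), ("one host offline", 62)):
    avg = avg_pgs_per_osd(PGS, SIZE, osds)
    print(f"{label}: {avg:.1f} PGs per OSD "
          f"(limit {LIMIT}, headroom {LIMIT - avg:.1f})")
```

Both averages, roughly 184 and 211 PGs per OSD, stay well below the configured limit of 300.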

>>>

>>>     If I understood you correctly, that's what you meant. If I got you wrong,
>>>     would you mind pointing me to one of those threads you mentioned?

>>>

>>>     Thanks :)

>>>

>>>     On 12.10.2018 at 14:03, Burkhard Linke wrote:

>>>     > Hi,

>>>     >

>>>     >

>>>     > On 10/12/2018 01:55 PM, Nils Fahldieck - Profihost AG wrote:

>>>     >> I rebooted a Ceph host and logged `ceph status` & `ceph health detail`
>>>     >> every 5 seconds. During this I encountered 'PG_AVAILABILITY Reduced data
>>>     >> availability: pgs peering'. At the same time some VMs hung as described
>>>     >> before.

>>>     >

>>>     > Just a wild guess... you have 71 OSDs and about 4500 PGs with size=3.
>>>     > That is 13500 PG instances overall, resulting in ~190 PGs per OSD under
>>>     > normal circumstances.

>>>     >

>>>     > If one host is down and the PGs have to re-peer, you might reach the
>>>     > limit of 200 PGs per OSD on some of the OSDs, resulting in stuck peering.

>>>     >

>>>     > You can try to raise this limit. There are several threads on the

>>>     > mailing list about this.

>>>     >

>>>     > Regards,

>>>     > Burkhard

>>>     >

>>>     _______________________________________________

>>>     ceph-users mailing list

>>>     ceph-users@xxxxxxxxxxxxxx <mailto:ceph-users@xxxxxxxxxxxxxx>

>>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
