Re: One object degraded cause all ceph requests hang - Jewel 10.2.6 (rbd + radosgw)

Rudenko Aleksandr <ARudenko@xxxxxxx> · Thu, 29 Mar 2018 18:50:15 +0000

Thank you Vincent, it’s very helpful for me!

On 11 Jan 2018, at 14:24, Vincent Godin <vince.mlist@xxxxxxxxx> wrote:

As no response was given, I will explain what I found; maybe it
could help other people.

The .dir.XXXXXXX object is a bucket index marker with a data size of 0.
The metadata associated with this object (stored in the LevelDB of the
OSDs currently holding the marker) is the index of the bucket that
corresponds to this marker.

My problem came from the number of objects stored in this bucket:
more than 50 million. As each object's entry in the index takes
between 200 and 250 bytes, the index should be about 12 GB in size
(50 million x 250 bytes is roughly 12.5 GB). That's why it is
recommended to add one index shard for every 100,000 objects.
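The arithmetic above can be checked with a few lines of Python. This is
only a rough sketch: the 200 to 250 bytes per entry and the 100,000
objects per shard are the figures quoted in this message, and the
function name estimate_index is made up for the illustration.

```python
import math

def estimate_index(num_objects, bytes_per_entry=250, objects_per_shard=100_000):
    """Return (estimated index size in bytes, suggested number of shards).

    bytes_per_entry and objects_per_shard are the rule-of-thumb figures
    from this thread, not values read from a cluster.
    """
    index_bytes = num_objects * bytes_per_entry
    shards = math.ceil(num_objects / objects_per_shard)
    return index_bytes, shards

size, shards = estimate_index(50_000_000)
print(f"unsharded index ~ {size / 1e9:.1f} GB, suggested shards: {shards}")
print(f"per-shard index ~ {size / shards / 1e6:.0f} MB")
```

With the upper bound of 250 bytes per entry, 50 million objects give an
index of about 12.5 GB in a single object, against roughly 25 MB per
shard once the index is split into 500 shards.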

During a Ceph rebuild, some PGs move from some OSDs to others.
While an index is moving, all write requests to the bucket are
blocked until the operation completes. During this move, the user had
launched a batch upload to the bucket, so a lot of requests were
blocked, which ended up blocking all the requests on the primary PGs
held by the OSD.

So the loop I saw was in fact just normal, but moving a 12 GB
object from one SATA disk to another takes several minutes, too long
in fact for a Ceph cluster with a lot of clients to survive.

The lesson of this story is: don't forget to shard your bucket!!!

---------------------------------------------------------------------------------------------------------------

Yesterday we ran into this bug. One OSD was looping on

"2018-01-03 16:20:59.148121 7f011a6a1700  0 log_channel(cluster) log
[WRN] : slow request 30.254269 seconds old, received at 2018-01-03
16:20:28.883837: osd_op(client.48285929.0:14601958 35.8abfc02e
.dir.0a3e5369-ff79-4f7d-b0b6-79c5a75b1759.29113876.1 [call
rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected
e359833) currently waiting for degraded object".

The requests on OSD.150 quickly went into a blocked state:

2018-01-03 16:25:56.241064 7f011a6a1700  0 log_channel(cluster) log
[WRN] : 20 slow requests, 1 included below; oldest blocked for >
327.357139 secs
2018-01-03 16:30:19.299288 7f011a6a1700  0 log_channel(cluster) log
[WRN] : 45 slow requests, 1 included below; oldest blocked for >
590.415387 secs
...
...
2018-01-03 16:46:04.900204 7f011a6a1700  0 log_channel(cluster) log
[WRN] : 100 slow requests, 2 included below; oldest blocked for >
1204.060056 secs

while still looping

2018-01-03 16:46:04.900220 7f011a6a1700  0 log_channel(cluster) log
[WRN] : slow request 123.294762 seconds old, received at 2018-01-03
16:44:01.605320 : osd_op(client.48285929.0:14605228 35.8abfc02e
.dir.0a3e5369-ff79-4f7d-b0b6-79c5a75b1759.29113876.1 [call
rgw.bucket_complete_op] snapc 0=[]
ack+ondisk+write+known_if_redirected e359833) currently waiting for
degraded object

All these requests were blocked on OSD.150.

A lot of VMs attached to Ceph were hanging.
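When an OSD log fills up with warnings like the ones above, it helps to
see quickly which object the slow requests are piling up on. The
following is a minimal sketch, assuming the Jewel-style "slow request"
line format shown in this message; the regular expression and the
function names are illustrative only, not part of any Ceph tool.

```python
import re
from collections import Counter

# Matches the "slow request ... osd_op(...) currently <state>" lines quoted
# above. The field layout is inferred from those two sample lines only.
SLOW_RE = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, received at .*?"
    r"osd_op\((?P<client>\S+) (?P<pgid>\S+) (?P<obj>\S+) "
    r"\[(?P<op>[^\]]*)\].*?currently (?P<state>.+)$"
)

def parse_slow_request(line):
    """Return the fields of one slow-request warning, or None if no match."""
    text = " ".join(line.split())  # undo mail wrapping and double spaces
    match = SLOW_RE.search(text)
    return match.groupdict() if match else None

def blocked_objects(lines):
    """Count slow requests per (object name, blocking state)."""
    counts = Counter()
    for line in lines:
        info = parse_slow_request(line)
        if info:
            counts[(info["obj"], info["state"])] += 1
    return counts

# The two warnings quoted in this message, rejoined into single lines.
SAMPLE_LINES = [
    "2018-01-03 16:20:59.148121 7f011a6a1700  0 log_channel(cluster) log "
    "[WRN] : slow request 30.254269 seconds old, received at 2018-01-03 "
    "16:20:28.883837: osd_op(client.48285929.0:14601958 35.8abfc02e "
    ".dir.0a3e5369-ff79-4f7d-b0b6-79c5a75b1759.29113876.1 [call "
    "rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected "
    "e359833) currently waiting for degraded object",
    "2018-01-03 16:46:04.900220 7f011a6a1700  0 log_channel(cluster) log "
    "[WRN] : slow request 123.294762 seconds old, received at 2018-01-03 "
    "16:44:01.605320 : osd_op(client.48285929.0:14605228 35.8abfc02e "
    ".dir.0a3e5369-ff79-4f7d-b0b6-79c5a75b1759.29113876.1 [call "
    "rgw.bucket_complete_op] snapc 0=[] "
    "ack+ondisk+write+known_if_redirected e359833) currently waiting for "
    "degraded object",
]

for (obj, state), count in blocked_objects(SAMPLE_LINES).items():
    print(f"{count} slow request(s) on {obj}: {state}")
```

Run against a full OSD log, a single bucket index object dominating the
counts would point to the same situation as described here.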

The degraded object was
.dir.0a3e5369-ff79-4f7d-b0b6-79c5a75b1759.29113876.1 in PG 35.2e.
This PG was located on 4 OSDs. The object had a size of 0 on all 4 OSDs.
It was not possible to get a response from ceph pg 35.2e query.
Killing OSD.150 only led to the requests being blocked on the new
primary.
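For readers wondering how the 35.8abfc02e in the log lines relates to
PG 35.2e: the logged value appears to be the pool id followed by the
object's full placement hash, which Ceph folds onto the pool's PG count.
The sketch below ports Ceph's ceph_stable_mod helper to Python. The
pg_num values are assumptions for illustration, since the pool's real
PG count is not given in this thread.

```python
def ceph_stable_mod(x, b, bmask):
    """Python port of Ceph's ceph_stable_mod: fold hash x onto b PGs."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)

def pg_for_hash(hash_hex, pg_num):
    """Map a raw placement hash (hex string) to a PG number in the pool."""
    x = int(hash_hex, 16)
    # bmask is the smallest (2^n - 1) mask that covers pg_num - 1
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return ceph_stable_mod(x, pg_num, bmask)

pool, raw_hash = "35.8abfc02e".split(".")
for pg_num in (256, 1024, 4096):  # hypothetical pg_num values for pool 35
    print(f"pg_num={pg_num}: PG {pool}.{pg_for_hash(raw_hash, pg_num):x}")
```

For each of these assumed pool sizes the hash lands on 35.2e, which
matches the PG named above.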

I found the relatively new bug #22072, which looks like mine, but there
was no response from the Ceph team. I finally tried the same solution:
rados rm -p pool/degraded_object, but got no response from the
command. I stopped the command after 15 minutes. A few minutes later,
the 4 OSDs holding PG 35.2e suddenly rebooted and the problem was
solved. The object was deleted on the 4 OSDs.

Anyway, it led to a production outage, and I have no idea what
produced the "degraded object". I'm also not sure whether the solution
came from my command or from an internal process. At this time we are
still trying to repair some filesystems of the VMs attached to Ceph,
and I have to explain that this whole production outage came from one
empty object... The real question is why Ceph was unable to handle
this "degraded object" and looped on it, blocking all the requests on
OSD.150.

_______________________________________________

ceph-users mailing list

ceph-users@xxxxxxxxxxxxxx

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
