Thanks for the reply.
In my case, the issue was the min_size setting of the pool.
# ceph osd pool ls detail
pool 5 'volumes' replicated size 2 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 844 flags hashpspool stripe_width 0
removed_snaps [1~23]
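(For reference, the relevant values can also be queried directly; the exact output format may differ slightly by release:)
# ceph osd pool get volumes size
size: 2
# ceph osd pool get volumes min_size
min_size: 2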
When replicated size=2 and min_size=2 are set and an OSD goes down, the ceph cluster goes into an error state (HEALTH_ERR) and client I/O hangs.
ceph status output:
health HEALTH_ERR
310 pgs are stuck inactive for more than 300 seconds
35 pgs backfill_wait
3 pgs backfilling
38 pgs degraded
382 pgs peering
310 pgs stuck inactive
310 pgs stuck unclean
39 pgs undersized
263 requests are blocked > 32 sec
You can reproduce this easily; a rough sketch is below.
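For example, a minimal sketch (assuming a systemd-based deployment; osd.0 is just an example of an OSD that serves the pool):

# ceph osd pool set volumes min_size 2
# systemctl stop ceph-osd@0
# ceph status

As soon as one replica is gone, the affected PGs fall below min_size, become inactive, and client I/O to them blocks.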
So I solved this by setting min_size=1 with the "ceph osd pool set volumes min_size 1" command.
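To verify the change (pool name as above):

# ceph osd pool get volumes min_size
min_size: 1

Note that min_size=1 with size=2 means a PG keeps accepting writes with only one copy, so data written while an OSD is down exists on a single OSD until recovery completes.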
It is a very strange thing, because if setting min_size equal to the replicated size can cause such a big problem for the cluster, I would expect ceph not to allow min_size to be set to the same value as the replicated size.
Thanks.
2017-08-10 23:33 GMT+09:00 David Turner <drakonstein@xxxxxxxxx>:
When the node rebooted, were the osds marked down immediately? If the node were to reboot, but not mark the osds down, then all requests to those osds would block until they got marked down.
On Thu, Aug 10, 2017, 5:46 AM Hyun Ha <hfamily15@xxxxxxxxx> wrote:

Hi, Ramirez

I have exactly the same problem as yours. Did you solve that issue? Do you have experiences or solutions?

Thank you.
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com