Hi all,

When the object store service is accessed under high traffic, a PG going down
causes rgw connections to be occupied by requests for objects in the failed PG,
until rgw can no longer accept new connections. Is there a workaround for this
behaviour?

system-graph (repost)
default.rgw.buckets.data (EC k=4 m=2, size=6, min_size=5)

+----+          +-----+            +------+    +-pg1----------------------------+
|USER+--s3cmd-->+ RGW +--get Obj-->+ ceph +--+-+*osd1,*osd2,*osd3,osd4,osd5,osd6| DOWN
+----+          +-----+            +------+  | +--------------------------------+
                x100 connection pool         |
                (100 workers)                | +-pg2----------------------------+
                                             +-+osd7,osd8,osd9,osd10,osd11,osd12| ACTIVE
                                               +--------------------------------+

;; PROBLEM CONDITION
* The number of rgw connections is limited by the frontend's num_threads, which
  defaults to 100 (rgw can service 100 simultaneous I/O requests).
* pg1 goes down because of osd failures (osd.1, osd.2, osd.3).
* A get request that maps to the down pg blocks until the pg is restored.
* Each blocked request occupies one rgw connection. Repeat.
* Eventually all num_threads (default 100) workers are occupied, and clients
  can no longer connect to rgw.

;; PROBLEM
I think it is a problem that a single pg being down can occupy all rgw
connections. I tried increasing num_threads (rgw_thread_pool_size) and running
multiple rgw instances, but neither helped in a high-traffic environment (a
ceph.conf sketch of what I tried is appended after my signature). Since rgw
simply forwards the object read to ceph, nothing in the request path knows that
pg1 is down.

Is there any way to detect the pg down earlier and finish the get Obj request
instead of letting it block?

-Tsuyoshi.
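
P.S. For reference, this is roughly the kind of change I tried, as a minimal
ceph.conf sketch. The instance names (client.rgw.gw1 / gw2), the ports, and
the value 512 are placeholders, not my actual configuration:

    # Two rgw instances behind a load balancer, each with a larger worker pool.
    [client.rgw.gw1]
    rgw_frontends = beast port=8080
    # Size of the rgw worker pool (the num_threads limit described above);
    # the default is 100.
    rgw_thread_pool_size = 512

    [client.rgw.gw2]
    rgw_frontends = beast port=8081
    rgw_thread_pool_size = 512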