Re: sync errors are not cleared

+ceph-devel 

On Wed, Feb 2, 2022 at 10:56 PM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
On Wed, Feb 2, 2022 at 8:36 AM Yuval Lifshitz <ylifshit@xxxxxxxxxx> wrote:
>
> i do see sync errors with "ERR_BUSY_RESHARDING": https://0x0.st/oH3D.json
> after dynamic reshard happened mid-sync, even though sync was finished successfully.
>
> is this expected?

those errors are possible, but i wouldn't say expected.

but shouldn't the errors get cleared after the objects were successfully synced?
 
if fetch_remote_obj() is returning this error, that seems to imply that
RGWRados::guard_reshard() retried the index operation
NUM_RESHARD_RETRIES=10 times and still found it locked for resharding.
and after each try, guard_reshard() calls
RGWRados::block_while_resharding(), which has its own retry loop with
num_retries=10 that polls the reshard status then sleeps 5 seconds
with reshard_wait->wait()

if my understanding is correct, that would mean that the successful
reshard took more than ~500 seconds (10 retries x 10 polls x 5 seconds)
to complete? or else something under guard_reshard() isn't working right
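
for reference, the worst-case arithmetic can be sketched as below. this is only an illustration of the numbers quoted in this thread, not code from the Ceph tree: the constant and function names are hypothetical, and it assumes every poll finds the index still locked for resharding and that each poll sleeps the full 5 seconds.

```cpp
// Hypothetical constants mirroring the numbers quoted in this thread.
// They are NOT copied from the Ceph source.
constexpr int kOuterRetries = 10;     // NUM_RESHARD_RETRIES in guard_reshard()
constexpr int kInnerRetries = 10;     // num_retries in block_while_resharding()
constexpr int kPollSleepSeconds = 5;  // sleep per reshard_wait->wait()

// Upper bound on the time spent waiting before the operation gives up,
// assuming every poll still sees the index as resharding and every
// sleep runs to completion.
constexpr int worst_case_wait_seconds(int outer_retries = kOuterRetries,
                                      int inner_retries = kInnerRetries,
                                      int sleep_seconds = kPollSleepSeconds) {
  return outer_retries * inner_retries * sleep_seconds;
}
```

with the defaults this gives 10 * 10 * 5 = 500 seconds, which is where the ~500 second figure above comes from.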

it looks like there is a problem. when i look at the client that uploads the objects to the primary, it stalls for about 10 seconds while the reshard is happening. however, the secondary's sync process is stalled for a much longer period, until it eventually syncs successfully.

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
