Re: A question about the return code of write

zengran zhang <z13121369189@xxxxxxxxx> · Sat, 1 Dec 2018 01:40:47 +0800

I am quite sure there are bugs in how a failed read (not enough shards)
is handled on EC pools.
There are probably two kinds of problems:

1. The read failure is not handled at all, e.g. client write -> rmw.
2. The read failure is detected but not handled correctly, e.g. in
ECBackend::_failed_push() during backfill.

The first case causes the OSD of the primary PG (w/o EIO) to crash when it
proceeds to the write step with read errors.
The second case causes the backfill target to silently miss some objects.
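
To make the first case concrete, here is a minimal sketch of a read
completion callback that checks the per-object return code instead of
dropping it. It uses stand-in types (ObjectId, ExtentMap, RmwOp) and a
hypothetical on_reads_complete() helper, not the real Ceph classes, so it
only illustrates the idea and is not a patch:

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <utility>

// Stand-ins for Ceph's hobject_t and extent_map; NOT the real types.
using ObjectId = std::string;
using ExtentMap = std::map<unsigned long, std::string>;  // offset -> data
using ReadResults = std::map<ObjectId, std::pair<int, ExtentMap>>;

// Hypothetical, simplified stand-in for the in-flight RMW op.
struct RmwOp {
  std::map<ObjectId, ExtentMap> remote_read_result;
  int read_error = 0;  // first negative return code seen, 0 if none
};

// Inspect each per-object return code rather than ignoring it.
// Returns true only if every read succeeded, i.e. the op may go on
// to the write stage; otherwise the caller should cancel the op.
bool on_reads_complete(RmwOp &op, ReadResults &&results) {
  for (auto &entry : results) {
    const int r = entry.second.first;
    if (r < 0) {
      if (op.read_error == 0)
        op.read_error = r;  // remember the first failure
      continue;             // do not store data from a failed read
    }
    op.remote_read_result.emplace(entry.first,
                                  std::move(entry.second.second));
  }
  return op.read_error == 0;
}
```

With something like this, the caller could fail the client op with
op.read_error instead of stepping into the write path and hitting an
assert later.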

I hope a maintainer can help us close out these problems together. Thanks!

Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Mon, Nov 12, 2018 at 3:55 PM:
>
> How did you notice/conclude these are problems? I don't have specific
> details about how objects_read_async_no_cache() is handled through the
> code, but the erasure code read paths can get pretty obtuse and
> multi-layered; in some paths we know there can't be errors but in some
> there might be.
>
> As for the return codes of writes, they definitely can be filled in
> with error codes in cases where that's appropriate. But by the time
> you get to submitting a write transaction to the disk or backend, it's
> going to pass — the only failure mode that can happen at that point
> (actual EIO or other error from disk) is one that results in the OSD
> suiciding.
> -Greg
>
>
> On Sat, Nov 10, 2018 at 7:40 AM cui xiao fei Cui <thinkercui@xxxxxxxxx> wrote:
> >
> > Hi all,
> > We are very confused about two problems with rmw.
> >
> > 1. the return code of read is not handled.
> >
> > The callback of read is like this.
> > ECBackend::try_state_to_reads
> >     ...
> >     objects_read_async_no_cache(
> >         op->remote_read,
> >         [this, op](map<hobject_t,pair<int, extent_map> > &&results) {
> >             for (auto &&i: results) {
> >                 op->remote_read_result.emplace(i.first, i.second.second);
> >             }
> >             check_ops();
> >       });
> >
> > The read return code in pair<int, extent_map> is never checked.
> > Shall we check the return code and cancel the write op immediately if
> > there are read errors, to prevent a later assert?
> >
> > 2. the return code of write is always 0.
> > The write op reply is like this.
> >
> > PrimaryLogPG::execute_ctx
> >     ...
> >     ctx->reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0,
> >         successful_write);
> >     ...
> >     ctx->register_on_commit(
> >     [m, ctx, this](){
> >         ...
> >         if (m && !ctx->sent_reply) {
> >             MOSDOpReply *reply = ctx->reply;
> >             if (reply)
> >                 ctx->reply = nullptr;
> >             else {
> >                 reply = new MOSDOpReply(m, 0,
> >                         get_osdmap()->get_epoch(), 0, true);
> >                 reply->set_reply_versions(ctx->at_version,
> >                         ctx->user_at_version);
> >             }
> >             osd->send_message_osd_client(reply, m->get_connection());
> >         }
> >         ...
> >     });
> >
> > We find that the return code of a write is always 0 if no error
> > occurred at the prepare_transaction stage.
> > Shall we return an error code to tell the client that something bad
> > happened?
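
For reference, the behaviour discussed in the second question can be
modelled roughly as follows: an error found during the prepare stage goes
back to the client, while a write that has reached commit is answered
with 0. OpContext, execute() and commit() here are hypothetical
stand-ins, not the actual PrimaryLogPG code, and the prepare result is
passed in as a plain int for illustration:

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <vector>

// Hypothetical stand-in for the op context; only what the sketch needs.
struct OpContext {
  std::vector<std::function<void()>> on_commit;  // commit callbacks
  bool sent_reply = false;
  int reply_result = 0;  // result code carried by the reply
};

// Models the flow described in the thread: a negative prepare result is
// reported to the client at once and nothing is queued for commit;
// otherwise a commit callback is registered that replies with 0.
int execute(OpContext &ctx, int prepare_result) {
  if (prepare_result < 0) {
    ctx.reply_result = prepare_result;
    ctx.sent_reply = true;
    return prepare_result;
  }
  ctx.on_commit.push_back([&ctx] {
    if (!ctx.sent_reply) {
      ctx.reply_result = 0;  // the write committed, so report success
      ctx.sent_reply = true;
    }
  });
  return 0;
}

// Stand-in for the backend signalling that the transaction committed.
void commit(OpContext &ctx) {
  for (auto &cb : ctx.on_commit)
    cb();
  ctx.on_commit.clear();
}
```

In this model the only place a non-zero code can reach the client is
before the transaction is submitted, which matches the observation that
the reply is always 0 once prepare_transaction has succeeded.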