I am quite sure there are some bugs in how a failed read (not enough
shards) is handled on an EC pool. There are probably two kinds of
problem:

1. The read failure is not processed at all, as in client write -> rmw.
2. The read failure is detected but not processed correctly, as in
   ECBackend::_failed_push() during backfill.

The first case causes the OSD of the primary PG (w/o EIO) to crash when
it steps to the write with read errors. The second case causes the
backfill target to silently miss some objects.

I hope a maintainer can help close out these problems together. Thanks!

Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Mon, Nov 12, 2018 at 3:55 PM:
>
> How did you notice/conclude these are problems? I don't have specific
> details about how objects_read_async_no_cache() is handled through the
> code, but the erasure code read paths can get pretty obtuse and
> multi-layered; in some paths we know there can't be errors but in some
> there might be.
>
> As for the return codes of writes, they definitely can be filled in
> with error codes in cases where that's appropriate. But by the time
> you get to submitting a write transaction to the disk or backend, it's
> going to pass; the only failure mode that can happen at that point
> (actual EIO or other error from disk) is one that results in the OSD
> suiciding.
> -Greg
>
>
> On Sat, Nov 10, 2018 at 7:40 AM cui xiao fei Cui <thinkercui@xxxxxxxxx> wrote:
> >
> > Hi all,
> > We are very confused by two problems with rmw.
> >
> > 1. the return code of read is not handled.
> >
> > The callback of read is like this.
> > ECBackend::try_state_to_reads
> > ...
> >   objects_read_async_no_cache(
> >     op->remote_read,
> >     [this, op](map<hobject_t,pair<int, extent_map> > &&results) {
> >       for (auto &&i: results) {
> >         op->remote_read_result.emplace(i.first, i.second.second);
> >       }
> >       check_ops();
> >     });
> >
> > The read return code in pair<int, extent_map> is never considered.
> > Shall we detect the return code and cancel the write op immediately, if
> > there are read errors, to prevent a later assert?
> >
> > 2. the return code of write is always 0.
> > The write op reply is like this.
> >
> > PrimaryLogPG::execute_ctx
> > ...
> >   ctx->reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0,
> >                                successful_write);
> > ...
> >   ctx->register_on_commit(
> >     [m, ctx, this](){
> >       ...
> >       if (m && !ctx->sent_reply) {
> >         MOSDOpReply *reply = ctx->reply;
> >         if (reply)
> >           ctx->reply = nullptr;
> >         else {
> >           reply = new MOSDOpReply(m, 0,
> >                                   get_osdmap()->get_epoch(), 0, true);
> >           reply->set_reply_versions(ctx->at_version,
> >                                     ctx->user_at_version);
> >         }
> >         osd->send_message_osd_client(reply, m->get_connection());
> >       }
> >       ...
> >     });
> >
> > We find that the return code of write will always be 0 if no error
> > occurred at the prepare_transaction stage.
> > Shall we return an error code to tell the client that something
> > bad happened?
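
For reference, the check proposed in point 1 of the original message
(look at the int in pair<int, extent_map> before using the data) could
look roughly like the sketch below. This is not Ceph code and has not
been tried against the tree: ObjectId, ExtentMap, ReadResults, RmwOp,
first_read_error() and on_reads_complete() are all hypothetical
stand-ins, and how a real op would be cancelled after a failed read is
left open.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for Ceph's hobject_t and extent_map.
using ObjectId = std::string;
using ExtentMap = std::map<unsigned long, std::string>;  // offset -> data
using ReadResults = std::map<ObjectId, std::pair<int, ExtentMap>>;

// Hypothetical stand-in for a write op waiting on its rmw reads.
struct RmwOp {
  std::map<ObjectId, ExtentMap> remote_read_result;
  int read_error = 0;  // 0 if all reads succeeded, else a negative errno
};

// Return the first negative return code found, or 0 if every read succeeded.
inline int first_read_error(const ReadResults &results) {
  for (const auto &i : results) {
    if (i.second.first < 0)
      return i.second.first;
  }
  return 0;
}

// Sketch of a read-completion callback that inspects the return codes.
// Returns true if the op may continue to the write step, false if the
// caller should cancel it (cancellation itself is not modelled here).
inline bool on_reads_complete(RmwOp &op, ReadResults &&results) {
  op.read_error = first_read_error(results);
  if (op.read_error < 0)
    return false;  // do not store partial data from a failed read
  for (auto &i : results)
    op.remote_read_result.emplace(i.first, std::move(i.second.second));
  return true;
}
```

With this shape the op either gets all of its read data or none of it,
so the write step never runs on a partially filled remote_read_result.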