On Sun, Dec 2, 2018 at 7:26 PM xiaofei Cui <thinkercui@xxxxxxxxx> wrote:
>
> Thanks for all your replies.
>
> Now, the main question may be:
> 1. How to deal with the failure of m+1 or more OSDs with EC k+m?
>
> Thanks for any opinion.

There are a number of places where we deal with issues similar to this
throughout the code base, especially in scrub and repair. Handling m+1
failures is nastier than what we normally expect to happen, but in the
more general cases it's probably best to look at what's happened in the
parts of the tree that handle this correctly; I think David has done a
lot of work on it?
-Greg

>
> zengran zhang <z13121369189@xxxxxxxxx> wrote on Fri, Nov 30, 2018 at 5:42 PM:
>>
>> I am quite sure that there are some bugs in the case of a failed read
>> (not enough shards) on an EC pool.
>> There are probably two kinds of problems:
>>
>> 1. The read failure is not processed, as in client write -> rmw.
>> 2. The read failure is detected but not processed correctly, as in
>> ECBackend::_failed_push() during backfill.
>>
>> The first case will cause the OSD of the primary PG w/o EIO to crash
>> when it steps to the write w/ read errors.
>> The second case will cause the backfill target to silently miss some
>> objects.
>>
>> Hoping some maintainer can help close out the problem together. Thanks!
>>
>> Gregory Farnum <gfarnum@xxxxxxxxxx> wrote on Mon, Nov 12, 2018 at 3:55 PM:
>> >
>> > How did you notice/conclude these are problems? I don't have specific
>> > details about how objects_read_async_no_cache() is handled through the
>> > code, but the erasure code read paths can get pretty obtuse and
>> > multi-layered; in some paths we know there can't be errors but in some
>> > there might be.
>> >
>> > As for the return codes of writes, they definitely can be filled in
>> > with error codes in cases where that's appropriate. But by the time
>> > you get to submitting a write transaction to the disk or backend, it's
>> > going to pass; the only failure mode that can happen at that point
>> > (actual EIO or other error from disk) is one that results in the OSD
>> > suiciding.
>> > -Greg
>> >
>> >
>> > On Sat, Nov 10, 2018 at 7:40 AM cui xiao fei Cui <thinkercui@xxxxxxxxx> wrote:
>> > >
>> > > Hi all,
>> > > We are very confused by two problems with rmw.
>> > >
>> > > 1. The return code of the read is not handled.
>> > >
>> > > The read callback looks like this.
>> > > ECBackend::try_state_to_reads
>> > >   ...
>> > >   objects_read_async_no_cache(
>> > >     op->remote_read,
>> > >     [this, op](map<hobject_t,pair<int, extent_map> > &&results) {
>> > >       for (auto &&i: results) {
>> > >         op->remote_read_result.emplace(i.first, i.second.second);
>> > >       }
>> > >       check_ops();
>> > >     });
>> > >
>> > > The read return code in pair<int, extent_map> is never considered.
>> > > Shall we check the return code and cancel the write op immediately
>> > > if there are read errors, to prevent a later assert?
>> > >
>> > > 2. The return code of the write is always 0.
>> > > The write op reply looks like this.
>> > >
>> > > PrimaryLogPG::execute_ctx
>> > >   ...
>> > >   ctx->reply = new MOSDOpReply(m, 0, get_osdmap()->get_epoch(), 0,
>> > >                                successful_write);
>> > >   ...
>> > >   ctx->register_on_commit(
>> > >     [m, ctx, this](){
>> > >       ...
>> > >       if (m && !ctx->sent_reply) {
>> > >         MOSDOpReply *reply = ctx->reply;
>> > >         if (reply)
>> > >           ctx->reply = nullptr;
>> > >         else {
>> > >           reply = new MOSDOpReply(m, 0,
>> > >                                   get_osdmap()->get_epoch(), 0, true);
>> > >           reply->set_reply_versions(ctx->at_version,
>> > >                                     ctx->user_at_version);
>> > >         }
>> > >         osd->send_message_osd_client(reply, m->get_connection());
>> > >       }
>> > >       ...
>> > >     });
>> > >
>> > > We find that the return code of the write will always be 0 if no
>> > > error occurred at the prepare_transaction stage.
>> > > Shall we return an error code to tell the client that something
>> > > bad happened?
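
For illustration only, here is a minimal standalone sketch of what
"check the read return code before continuing" from question 1 could
look like. It is not Ceph code and not a proposed patch: ObjectId,
ExtentMap, ReadResults, RmwOp, read_error and collect_read_results are
all hypothetical stand-ins invented so the example compiles on its own.
The only thing taken from the thread is the shape of the callback
argument, a map from object to pair<int, extent_map>.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for Ceph's hobject_t and extent_map, so the
// sketch compiles by itself. They are NOT the real Ceph types.
using ObjectId = std::string;
using ExtentMap = std::map<std::uint64_t, std::string>;  // offset -> bytes
using ReadResults = std::map<ObjectId, std::pair<int, ExtentMap>>;

// Hypothetical, trimmed-down op holding only what this sketch needs.
struct RmwOp {
  std::map<ObjectId, ExtentMap> remote_read_result;
  int read_error = 0;  // 0 on success, otherwise the first negative code
};

// Copy read results into the op, but stop at the first failed read and
// record its return code instead of silently dropping it.
// Returns true only when every read succeeded.
bool collect_read_results(RmwOp &op, ReadResults &&results) {
  for (auto &entry : results) {
    const int r = entry.second.first;
    if (r < 0) {
      op.read_error = r;
      op.remote_read_result.clear();  // never build a write from partial data
      return false;
    }
    op.remote_read_result.emplace(entry.first,
                                  std::move(entry.second.second));
  }
  return true;
}
```

A caller would test the return value and fail the op with
op.read_error rather than moving on to the write step. Whether
cancelling at that point is the right behaviour inside ECBackend is
exactly the open question in this thread; the sketch only shows the
mechanics of not discarding the int.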