> On 18 Mar 2019, at 14.28, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>
> On 18.03.2019 08:42, Javier González wrote:
>>> On 14 Mar 2019, at 17.04, Igor Konopko <igor.j.konopko@xxxxxxxxx> wrote:
>>>
>>> In case when mapping fails (called from writer thread) due to lack of
>>> lines, currently we are calling pblk_pipeline_stop(), which waits
>>> for pending write IOs, so it will lead to the deadlock. Switching
>>> to __pblk_pipeline_stop() in that case instead will fix that.
>>>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
>>> ---
>>> drivers/lightnvm/pblk-map.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-map.c b/drivers/lightnvm/pblk-map.c
>>> index 5408e32..afc10306 100644
>>> --- a/drivers/lightnvm/pblk-map.c
>>> +++ b/drivers/lightnvm/pblk-map.c
>>> @@ -46,7 +46,7 @@ static int pblk_map_page_data(struct pblk *pblk, unsigned int sentry,
>>> 	pblk_line_close_meta(pblk, prev_line);
>>>
>>> 	if (!line) {
>>> -		pblk_pipeline_stop(pblk);
>>> +		__pblk_pipeline_stop(pblk);
>>> 		return -ENOSPC;
>>> 	}
>>>
>>> --
>>> 2.9.5
>> Have you seeing this problem?
>> Before checking if there is a line, we are closing metadata for the
>> previous line, so all inflight I/Os should be clear. Can you develop on
>> the case in which this would happen?
>
> So we have following sequence: pblk_pipeline_stop() -> __pblk_pipeline_flush() -> pblk_flush_writer() -> wait for emptying round buffer.
> This will never complete, since we still have some RB entries, which cannot be written since writer thread is blocked with waiting inside pblk_flush_writer().
>
> Am I missing sth?

So this would be the case in which we are on the last line and pblk_flush_writer() needs to allocate an extra line to persist the write buffer? Shouldn’t the rate-limiter take care of this? As I recall, Hans implemented some logic to guarantee that at least one line is always available for GC, which in turn will free a line for user data.
When we hit this limit, performance will drop dramatically, but it should not stall.

The reason I want to understand the real case behind this fix is that by calling __pblk_pipeline_stop() we are basically stopping all other inflight I/Os. We should be able to serve all inflight I/Os before a mapping error puts the pipeline into read-only mode.
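
For reference, the wait-on-self pattern in the quoted sequence can be modelled roughly as below. This is only a toy sketch, not pblk code: struct toy_rb, toy_writer_step() and toy_flush_wait() are hypothetical names, and the wait is bounded by max_steps so the model terminates instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in pblk. */
struct toy_rb {
	int pending;	/* buffer entries still waiting to be written */
};

/* One step of the writer: drains a single entry unless it is blocked. */
static void toy_writer_step(struct toy_rb *rb, bool writer_blocked)
{
	assert(rb->pending >= 0);
	if (!writer_blocked && rb->pending > 0)
		rb->pending--;
}

/*
 * Models a flush that waits for the buffer to drain. If the caller is the
 * writer itself, the writer is stuck in this wait and nothing drains.
 * Returns true if the buffer emptied within max_steps.
 */
static bool toy_flush_wait(struct toy_rb *rb, bool called_from_writer,
			   int max_steps)
{
	for (int i = 0; i < max_steps && rb->pending > 0; i++)
		toy_writer_step(rb, called_from_writer);

	return rb->pending == 0;
}
```

With three pending entries, the wait finishes when it is issued from another context and makes no progress when it is issued from the writer, which is the situation Igor describes.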