On Fri, Mar 10, 2017 at 10:15 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> On Fri, Mar 10, 2017 at 04:16:58PM +0100, Artur Paszkiewicz wrote:
>> On 03/10/2017 12:24 AM, Shaohua Li wrote:
>> > On Thu, Mar 09, 2017 at 09:59:59AM +0100, Artur Paszkiewicz wrote:
>> >> Implement the calculation of partial parity for a stripe and PPL write
>> >> logging functionality. The description of PPL is added to the
>> >> documentation. More details can be found in the comments in raid5-ppl.c.
>> >>
>> >> Attach a page for holding the partial parity data to stripe_head.
>> >> Allocate it only if mddev has the MD_HAS_PPL flag set.
>> >>
>> >> Partial parity is the xor of the data chunks of a stripe that are not
>> >> modified by the write, and is calculated as follows:
>> >>
>> >> - reconstruct-write case:
>> >>   xor the data from all not-updated disks in the stripe
>> >>
>> >> - read-modify-write case:
>> >>   xor the old data and parity from all updated disks in the stripe
>> >>
>> >> Implement it using the async_tx API and integrate it into raid_run_ops().
>> >> It must be called while we still have access to the old data, so do it
>> >> when STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result
>> >> is stored in sh->ppl_page.
>> >>
>> >> Partial parity is not meaningful for a full stripe write and is not
>> >> stored in the log or used for recovery, so don't attempt to calculate it
>> >> when the stripe has STRIPE_FULL_WRITE.
>> >>
>> >> Put the PPL metadata structures in md_p.h because userspace tools
>> >> (mdadm) will also need to read/write PPL.
>> >>
>> >> Warn about using PPL with the disk's volatile write-back cache enabled,
>> >> for now. The warning can be removed once disk cache flushing before
>> >> writing PPL is implemented.
>> >>
>> >> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@xxxxxxxxx>
>> >
>> > ... snip ...
>> >
>> >> +struct dma_async_tx_descriptor *
>> >> +ops_run_partial_parity(struct stripe_head *sh, struct raid5_percpu *percpu,
>> >> +		       struct dma_async_tx_descriptor *tx)
>> >> +{
>> >> +	int disks = sh->disks;
>> >> +	struct page **xor_srcs = flex_array_get(percpu->scribble, 0);
>> >> +	int count = 0, pd_idx = sh->pd_idx, i;
>> >> +	struct async_submit_ctl submit;
>> >> +
>> >> +	pr_debug("%s: stripe %llu\n", __func__, (unsigned long long)sh->sector);
>> >> +
>> >> +	/*
>> >> +	 * Partial parity is the XOR of stripe data chunks that are not changed
>> >> +	 * during the write request. Depending on available data
>> >> +	 * (read-modify-write vs. reconstruct-write case) we calculate it
>> >> +	 * differently.
>> >> +	 */
>> >> +	if (sh->reconstruct_state == reconstruct_state_prexor_drain_run) {
>> >> +		/* rmw: xor old data and parity from updated disks */
>> >> +		for (i = disks; i--;) {
>> >> +			struct r5dev *dev = &sh->dev[i];
>> >> +			if (test_bit(R5_Wantdrain, &dev->flags) || i == pd_idx)
>> >> +				xor_srcs[count++] = dev->page;
>> >> +		}
>> >> +	} else if (sh->reconstruct_state == reconstruct_state_drain_run) {
>> >> +		/* rcw: xor data from all not updated disks */
>> >> +		for (i = disks; i--;) {
>> >> +			struct r5dev *dev = &sh->dev[i];
>> >> +			if (test_bit(R5_UPTODATE, &dev->flags))
>> >> +				xor_srcs[count++] = dev->page;
>> >> +		}
>> >> +	} else {
>> >> +		return tx;
>> >> +	}
>> >> +
>> >> +	init_async_submit(&submit, ASYNC_TX_XOR_ZERO_DST, tx, NULL, sh,
>> >> +			  flex_array_get(percpu->scribble, 0)
>> >> +			  + sizeof(struct page *) * (sh->disks + 2));
>> >
>> > Since this should be done before biodrain, should this add the
>> > ASYNC_TX_FENCE flag?
>>
>> The result of this calculation isn't used later by other async_tx
>> operations, so it's not needed here, if I understand this correctly. But
>> maybe later we could optimize and use the partial parity to calculate the
>> full parity; then it will be necessary.
>
> But the source pages will be overwritten soon, so if there is no fence, I
> think this is a problem. Anyway, I'll let Dan clarify.
>
> Dan,
> We are using the async API for the operations below:
> 1. xor several source pages into a dest page
> 2. memcpy other data into the source pages
>
> The two operations will be in an async chain. Should the first operation
> include the ASYNC_TX_FENCE flag? The API comment declares the flag is
> required if the destination page is used by subsequent operations, but I
> suspect it should also be used if the subsequent operations could change
> the source pages.

I think you're right. If operation 2 could change operation 1's source
data, we should include the fence.
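The two cases in the patch description compute the same value: the old
parity is itself the xor of all the old data chunks, so xoring it with the
old data of the updated disks cancels them out and leaves exactly the xor
of the unmodified chunks. A minimal standalone C check of that identity
(hypothetical demo code, not from the patch; one byte stands in for a whole
chunk) for a four-data-disk stripe where two chunks are rewritten:

/* Hypothetical demo, not kernel code: verify that the rmw and rcw
 * expressions for partial parity from the patch description agree.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* old data chunks; one byte stands in for a whole chunk */
	uint8_t d[4] = { 0x11, 0x22, 0x33, 0x44 };
	/* old parity: xor of all data chunks */
	uint8_t p = d[0] ^ d[1] ^ d[2] ^ d[3];

	/* suppose the write updates d[0] and d[1] */

	/* rcw: xor the data from all not-updated disks */
	uint8_t ppl_rcw = d[2] ^ d[3];

	/* rmw: xor the old data and parity from all updated disks */
	uint8_t ppl_rmw = p ^ d[0] ^ d[1];

	printf("rcw: %02x, rmw: %02x (%s)\n", ppl_rcw, ppl_rmw,
	       ppl_rcw == ppl_rmw ? "equal" : "different");
	return 0;
}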
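On the fence question itself, here is a minimal sketch of the two-step
chain Shaohua describes, in the style of the async_tx API; async_xor(),
async_memcpy() and init_async_submit() are the real helpers, while the
function chain_xor_then_copy() and its parameters are invented for
illustration, and the sketch assumes a kernel build context rather than
being compilable standalone. The point is that the fence goes on the
producer whose source pages a later operation in the chain will overwrite:

/*
 * Illustrative sketch only: xor source pages into a PPL page, then let
 * a dependent operation overwrite one of those source pages. Per the
 * discussion above, ASYNC_TX_FENCE on the xor keeps the dependent
 * memcpy from starting before the xor has finished reading the shared
 * source pages.
 */
static struct dma_async_tx_descriptor *
chain_xor_then_copy(struct page *ppl_page, struct page **srcs, int src_cnt,
		    struct page *new_data,
		    struct dma_async_tx_descriptor *tx, addr_conv_t *addr_conv)
{
	struct async_submit_ctl submit;

	/* operation 1: xor the sources into ppl_page; fenced because
	 * operation 2 below writes to a page operation 1 reads */
	init_async_submit(&submit, ASYNC_TX_FENCE | ASYNC_TX_XOR_ZERO_DST,
			  tx, NULL, NULL, addr_conv);
	tx = async_xor(ppl_page, srcs, 0, src_cnt, PAGE_SIZE, &submit);

	/* operation 2: overwrite one of the xor sources with new data */
	init_async_submit(&submit, 0, tx, NULL, NULL, addr_conv);
	tx = async_memcpy(srcs[0], new_data, 0, 0, PAGE_SIZE, &submit);

	return tx;
}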