On 5/31/16, 5:00 PM, "Guoqing Jiang" <gqjiang@xxxxxxxx> wrote:

>Hi,
>
>I don't know a lot about raid5-cache, just took a quick glance at the
>syntax.
>
>On 05/27/2016 01:29 AM, Song Liu wrote:
>> This is the write part of r5cache. The cache is integrated with the
>> stripe cache of raid456. It leverages the r5l_log code to write
>> data to the journal device.
>>
>> r5cache splits the current write path into 2 parts: the write path
>> and the reclaim path. The write path is as follows:
>>   1. write data to the journal
>>   2. call bio_endio
>>
>> The reclaim path is then:
>>   1. Freeze the stripe (no more writes coming in)
>>   2. Calculate parity (reconstruct or RMW)
>>   3. Write parity to the journal device (data is already written to it)
>>   4. Write data and parity to the RAID disks
>>
>> With r5cache, a write operation does not wait for parity calculation
>> and write out, so the write latency is lower (1 write to the journal
>> device vs. read and then write to the raid disks). r5cache also
>> reduces RAID overhead (multiple IOs due to read-modify-write of
>> parity) and provides more opportunities for full stripe writes.
>>
>> r5cache adds a new state to each stripe: enum r5c_states. The write
>> path runs in states CLEAN and RUNNING (data in cache). Cache writes
>> start from r5c_handle_stripe_dirtying(), where bit R5_Wantcache is
>> set for devices with a bio in towrite. Then, the data is written to
>> the journal through the r5l_log implementation. Once the data is in
>> the journal, we set bit R5_InCache and call bio_endio for these
>> writes.
>>
>> The reclaim path starts by freezing the stripe (no more writes).
>> This sends the stripe back to the raid5 state machine, where
>> handle_stripe_dirtying will evaluate the stripe for reconstruct
>> writes or RMW writes (read data and calculate parity).
>>
>> For RMW, the code allocates an extra page for prexor. Specifically,
>> a new page is allocated for r5dev->page to do the prexor, while
>> r5dev->orig_page keeps the cached data. The extra page is freed
>> after the prexor.
>>
>> r5cache naturally excludes SkipCopy. With the R5_Wantcache bit set,
>> async_copy_data will not skip the copy.
>>
>> Before writing data to the RAID disks, the r5l_log logic stores
>> parity (and non-overwrite data) to the journal.
>>
>> Instead of the inactive_list, stripes with cached data are tracked
>> in r5conf->r5c_cached_list. r5conf->r5c_cached_stripes tracks how
>> many stripes have dirty data in the cache.
>>
>> Two sysfs entries are provided for the write cache:
>>   1. r5c_cached_stripes shows how many stripes have cached data.
>>      Writing anything to r5c_cached_stripes will flush all data
>>      to the RAID disks.
>>   2. r5c_cache_mode provides a knob to switch the system between
>>      write-back and write-through (write-log only).
>>
>> There are some known limitations of the cache implementation:
>>
>>   1. The write cache only covers full page writes (R5_OVERWRITE).
>>      Writes of smaller granularity are written through.
>>   2. There is only one log io (sh->log_io) for each stripe at any
>>      time. Later writes for the same stripe have to wait. This can
>>      be improved by moving log_io to r5dev.
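To make the write path / reclaim path split above easier to follow, here
is a rough user-space sketch of the intended flow. This is not the
kernel code: the toy_* types, flag values and helper names below are
made up purely for illustration; the real implementation works on
struct stripe_head / struct r5dev with the R5_Wantcache and R5_InCache
bits described above.

/*
 * Toy model of the r5cache flow: the write path only journals data and
 * ends the bio; the reclaim path freezes the stripe, computes parity
 * and writes everything back to the RAID disks later.
 */
#include <stdio.h>

/* simplified stand-ins for the real R5_* device flag bits */
#define TOY_R5_Wantcache (1 << 0)   /* data should go to the journal */
#define TOY_R5_InCache   (1 << 1)   /* data has reached the journal  */

/* simplified stand-in for enum r5c_states on the stripe */
enum toy_r5c_state { TOY_CLEAN, TOY_RUNNING, TOY_FROZEN, TOY_PARITY_DONE };

struct toy_dev    { unsigned flags; int has_towrite; };
struct toy_stripe { enum toy_r5c_state state; struct toy_dev dev[4]; };

/* write path: mark dirty devices, "journal" them, complete the bio */
static void toy_write_path(struct toy_stripe *sh)
{
        for (int i = 0; i < 4; i++) {
                if (!sh->dev[i].has_towrite)
                        continue;
                sh->dev[i].flags |= TOY_R5_Wantcache; /* like r5c_handle_stripe_dirtying() */
                /* ... data written to the journal via the log code ... */
                sh->dev[i].flags |= TOY_R5_InCache;   /* journal write completed */
                printf("dev %d: data in journal, bio_endio\n", i);
        }
        sh->state = TOY_RUNNING;                      /* stripe now holds cached data */
}

/* reclaim path: freeze, compute parity, write data + parity to RAID disks */
static void toy_reclaim_path(struct toy_stripe *sh)
{
        sh->state = TOY_FROZEN;                       /* no more writes accepted */
        printf("parity computed (reconstruct or RMW) and logged\n");
        for (int i = 0; i < 4; i++)
                if (sh->dev[i].flags & TOY_R5_InCache)
                        printf("dev %d: written back to RAID disk\n", i);
        sh->state = TOY_PARITY_DONE;
}

int main(void)
{
        struct toy_stripe sh = { .state = TOY_CLEAN };

        sh.dev[1].has_towrite = 1;
        sh.dev[2].has_towrite = 1;

        toy_write_path(&sh);    /* write latency seen by the caller ends here */
        toy_reclaim_path(&sh);  /* runs later, off the write path             */
        return 0;
}

The point of the split is visible in main(): the caller's write is done
after toy_write_path() (one journal write plus bio_endio), while all
parity work happens later in toy_reclaim_path().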
>>
>> Signed-off-by: Song Liu <songliubraving@xxxxxx>
>> Signed-off-by: Shaohua Li <shli@xxxxxx>
>> ---
>>  drivers/md/raid5-cache.c | 399 +++++++++++++++++++++++++++++++++++++++++++++--
>>  drivers/md/raid5.c       | 172 +++++++++++++++++---
>>  drivers/md/raid5.h       |  38 ++++-
>>  3 files changed, 577 insertions(+), 32 deletions(-)
>>
>
>[snip]
>
>> +
>> +/*
>> + * this journal write must contain full parity,
>> + * it may also contain data of non-overwrites
>> + */
>> +static void r5c_handle_parity_cached(struct stripe_head *sh)
>> +{
>> +        int i;
>> +
>> +        for (i = sh->disks; i--; )
>> +                if (test_bit(R5_InCache, &sh->dev[i].flags))
>> +                        set_bit(R5_Wantwrite, &sh->dev[i].flags);
>> +        r5c_set_state(sh, R5C_STATE_PARITY_DONE);
>> +}
>> +
>> +static void r5c_finish_cache_stripe(struct stripe_head *sh)
>> +{
>> +        switch (sh->r5c_state) {
>> +        case R5C_STATE_PARITY_RUN:
>> +                r5c_handle_parity_cached(sh);
>> +                break;
>> +        case R5C_STATE_CLEAN:
>> +                r5c_set_state(sh, R5C_STATE_RUNNING);
>
>Maybe you missed a break here?

I meant not to have the break here (a small annotated sketch of what I
mean follows at the end of this mail). I will revise this function
(probably by a lot).

>
>> +        case R5C_STATE_RUNNING:
>> +                r5c_handle_data_cached(sh);
>> +                break;
>> +        default:
>> +                BUG();
>> +        }
>> +}
>> +
>
>BTW: there are lots of issues reported by checkpatch.

I will run checkpatch and fix them.

Thanks,
Song
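P.S. To show what I mean about the fall-through being intentional, here
is the quoted r5c_finish_cache_stripe() again, unchanged except for a
comment marking the deliberate fall-through (this is only an annotated
sketch of the current code, not the revised version I still plan to
send):

static void r5c_finish_cache_stripe(struct stripe_head *sh)
{
        switch (sh->r5c_state) {
        case R5C_STATE_PARITY_RUN:
                r5c_handle_parity_cached(sh);
                break;
        case R5C_STATE_CLEAN:
                r5c_set_state(sh, R5C_STATE_RUNNING);
                /* fall through: a CLEAN stripe that just got cached
                 * data is moved to RUNNING and then handled like one */
        case R5C_STATE_RUNNING:
                r5c_handle_data_cached(sh);
                break;
        default:
                BUG();
        }
}

That is, a CLEAN stripe that receives its first cached data is first
transitioned to RUNNING and then goes through the same
r5c_handle_data_cached() path as a stripe that was already RUNNING.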