Re: [PATCH RFC v2 2/8] md/raid5: add a member of r5pages for struct stripe_head

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 21, 2020 at 5:40 AM Yufen Yu <yuyufen@xxxxxxxxxx> wrote:
>
> Since grow_buffers() uses alloc_page() allocate the buffers for each
> stripe_head(), means, it will allocate 64K buffers and just use 4K
> of them, after setting STRIPE_SIZE as 4096.
>
> To avoid waste memory, we try to contain multiple 'page' of sh->dev
> into one real page. That means, multiple sh->dev[i].page will point to
> the only page with different offset. Example of 64K PAGE_SIZE and
> 4K STRIPE_SIZE as following:
>
>                     64K PAGE_SIZE
>           +---+---+---+---+------------------------------+
>           |   |   |   |   |
>           |   |   |   |   |
>           +-+-+-+-+-+-+-+-+------------------------------+
>             ^   ^   ^   ^
>             |   |   |   +----------------------------+
>             |   |   |                                |
>             |   |   +-------------------+            |
>             |   |                       |            |
>             |   +----------+            |            |
>             |              |            |            |
>             +-+            |            |            |
>               |            |            |            |
>         +-----+-----+------+-----+------+-----+------+------+
> sh      | offset(0) | offset(4K) | offset(8K) | offset(16K) |
>  +      +-----------+------------+------------+-------------+
>  +----> dev[0].page  dev[1].page  dev[2].page  dev[3].page
>
> After compressing pages, the users of sh->dev[i].page need to be take care:
>
>   1) When issue bio of stripe_head, bi_io_vec.bv_page will point to
>      the page directly. So, we should make sure bv_offset been set with
>      correct offset.
>
>   2) When compute xor, the page will be passed to computer function.
>      So, we also need to pass offset of that page to computer. Let it
>      compute correct location of each sh->dev[i].page.
>
> This patch will add a new member of r5pages in stripe_head to manage
> all pages needed by each sh->dev[i]. We also add 'offset' for each r5dev
> so that users can get related page offset easily. And add helper function
> to get page and it's index in r5pages array by disk index. This is patch
> in preparation for fallowing changes.
>
> Signed-off-by: Yufen Yu <yuyufen@xxxxxxxxxx>
> ---
>  drivers/md/raid5.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
>
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index b0010991bdc8..6296578e9585 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -246,6 +246,13 @@ struct stripe_head {
>                 int                  target, target2;
>                 enum sum_check_flags zero_sum_result;
>         } ops;
> +
> +       /* These pages will be used by bios in dev[i] */
> +       struct r5pages {
> +               struct page     **page;
> +               int     size; /* page array size */

Do we really need variable size for each stripe?

Thanks,
Song

PS: I like the overall idea of 2/8 and 3/8. Will continue reviewing later.


> +       } pages;
> +
>         struct r5dev {
>                 /* rreq and rvec are used for the replacement device when
>                  * writing data to both devices.
> @@ -253,6 +260,7 @@ struct stripe_head {
>                 struct bio      req, rreq;
>                 struct bio_vec  vec, rvec;
>                 struct page     *page, *orig_page;
> +               unsigned int    offset;         /* offset of this page */
>                 struct bio      *toread, *read, *towrite, *written;
>                 sector_t        sector;                 /* sector of this page */
>                 unsigned long   flags;
> @@ -754,6 +762,54 @@ static inline int algorithm_is_DDF(int layout)
>         return layout >= 8 && layout <= 10;
>  }
>
> +/*
> + * Return corresponding page index of r5pages array.
> + */
> +static inline int raid5_get_page_index(struct stripe_head *sh, int disk_idx)
> +{
> +       WARN_ON(!sh->pages.page);
> +       if (disk_idx >= sh->raid_conf->pool_size)
> +               return -ENOENT;
> +
> +       return (disk_idx * STRIPE_SIZE) / PAGE_SIZE;
> +}
> +
> +/*
> + * Return offset of the corresponding page for r5dev.
> + */
> +static inline int raid5_get_page_offset(struct stripe_head *sh, int disk_idx)
> +{
> +       WARN_ON(!sh->pages.page);
> +       if (disk_idx >= sh->raid_conf->pool_size)
> +               return -ENOENT;
> +
> +       return (disk_idx * STRIPE_SIZE) % PAGE_SIZE;
> +}
> +
> +/*
> + * Return corresponding page address for r5dev.
> + */
> +static inline struct page *
> +raid5_get_dev_page(struct stripe_head *sh, int disk_idx)
> +{
> +       int idx;
> +
> +       WARN_ON(!sh->pages.page);
> +       idx = raid5_get_page_index(sh, disk_idx);
> +       return sh->pages.page[idx];
> +}
> +
> +/*
> + * We want to 'compress' multiple buffers to one real page for
> + * stripe_head when PAGE_SIZE is biggger than STRIPE_SIZE. If their
> + * values are equal, no need to use this strategy. For now, it just
> + * support raid level less than 5.
> + */
> +static inline int raid5_compress_stripe_pages(struct r5conf *conf)
> +{
> +       return (PAGE_SIZE > STRIPE_SIZE) && (conf->level < 6);
> +}
> +
>  extern void md_raid5_kick_device(struct r5conf *conf);
>  extern int raid5_set_cache_size(struct mddev *mddev, int size);
>  extern sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous);
> --
> 2.21.1
>



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux