Hi,

Please find answers below.

On Tue, Jun 23, 2020 at 2:06 PM Alexander Murashkin
<AlexanderMurashkin@xxxxxxx> wrote:
>
> Dear MD,
>
> After reading md documentation describing stripe_cache_size and journal
> (cache) in write-back mode, I found some inconsistencies
>
> - sometimes the documentation states that the cache is for RAID 4/5/6,
> sometimes just for RAID 5

In most cases, RAID 4/5/6 is treated the same as RAID 5.

> - it is nothing explicitly said about the cache block device size and
> how one is related to the memory cache size

These two caches are independent. The in-memory stripe cache is needed for
parity calculation. It also serves as a read cache. The block write cache
is used to protect data during power loss. We never read the write cache
during normal read/write.

> - it is stated that the memory cache <includes> the same data stored on
> cache disk - that is somewhat ambiguous

Since we don't read the block write cache during normal read/write, we
will not drop any data from the in-memory stripe cache while we may still
need it in the near future.

> - stripe_cache_size is the number of pages per device, but it is also
> called the number of entries
>
> Here are some statements about the journal. Could somebody confirm that
> they are true (or not)?
>
> - the journal and all its features can be used with md RAID 4/5/6

True.

> - references to RAID 5 only are wrong (in regards to the journal)

True.

> - cache block device size in bytes shall be the same as memory cache
> size in bytes

False, they are not related.

> - any extra block device or memory space (larger than the minimum of
> cache block device and memory cache sizes) is not used

Only a fraction of the journal device contains useful data. Once the data
is fully committed to the RAID disks, the copy in the journal device is no
longer considered useful.

> - the cache block device and the memory cache contain the same data

They don't contain identical sets of data, but they may contain two copies
of the same data.

> - the cache entry is exactly one page (so the number of pages and the
> number of entries are the same)

Each entry is one page per RAID disk. For a RAID 5 with 4 disks on an
x86_64 system, each stripe cache entry is 4 pages (4 kB x 4).

>
> Below are few extracts from the related documentation, for your convenience.
>
> md(4)
> ====
>
> md/stripe_cache_size
> This is only available on RAID5 and RAID6. It records the size
> (in pages per device) of the stripe cache which
> is used for synchronising all write operations to the array and
> all read operations if the array is degraded.
> memory_consumed = system_page_size * nr_disks * stripe_cache_size
>
> https://www.kernel.org/doc/Documentation/md/raid5-cache.txt
> =======================================
>
> RAID5 cache
> ------------------
>
> Raid 4/5/6 could include an extra disk for data cache...
>
> write-back mode:
> ------------------------
>
> Write-back cache will aggregate the data and flush the data to RAID
> disks only after the data becomes a full stripe write...
> In write-back mode, MD also caches data in memory. The memory cache
> includes the same data stored on cache disk, ...
> A user can configure the size by: echo "2048" >
> /sys/block/md0/md/stripe_cache_size
>
> The implementation:
> -----------------------------
>
> In write-back mode, MD writes IO data to the log and reports IO
> completion. The data is also fully cached in memory at that
> time, which means read must query memory cache. If some conditions are
> met, MD will flush the data to RAID disks
> ... MD will write both data and parity into RAID disks, then MD can
> release the memory cache. The flush conditions could be
> stripe becomes a full stripe write, free cache disk space is low or free
> in-kernel memory cache space is low.
>
> https://www.kernel.org/doc/html/latest/admin-guide/md.html
> ======================================
>
> stripe_cache_size (currently raid5 only)
> number of entries in the stripe cache...
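
To make the entry-versus-page arithmetic concrete, here is a minimal
Python sketch of the md(4) formula quoted above (memory_consumed =
system_page_size * nr_disks * stripe_cache_size). It is only an
illustration, not part of md or mdadm: the function names are
hypothetical, the 4 kB page size and 4-disk array are taken from the
RAID 5 example in this reply, and 2048 is the value used in the
raid5-cache.txt echo example.

```python
# Illustrative sketch only; these helper names are hypothetical and are
# not part of md, mdadm, or any kernel interface.

def stripe_entry_bytes(page_size: int, nr_disks: int) -> int:
    """Size of one stripe cache entry: one page per RAID disk."""
    return page_size * nr_disks


def stripe_cache_memory_bytes(page_size: int, nr_disks: int,
                              stripe_cache_size: int) -> int:
    """md(4): memory_consumed = system_page_size * nr_disks * stripe_cache_size."""
    return stripe_entry_bytes(page_size, nr_disks) * stripe_cache_size


if __name__ == "__main__":
    page_size = 4096          # assumed x86_64 page size (4 kB)
    nr_disks = 4              # 4-disk RAID 5, as in the example above
    stripe_cache_size = 2048  # value from the documentation's echo example

    entry = stripe_entry_bytes(page_size, nr_disks)
    total = stripe_cache_memory_bytes(page_size, nr_disks, stripe_cache_size)
    print(f"one entry: {entry} bytes")                          # 16384 bytes
    print(f"whole cache: {total} bytes ({total // 2**20} MiB)")  # 33554432 bytes (32 MiB)
```

Under these assumptions, stripe_cache_size counts entries, and each entry
costs one page per disk, which is why the two descriptions in the
documentation ("pages per device" and "number of entries") give the same
number. The journal device size does not appear in this calculation,
since the two caches are independent.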