Re: Best way to add caching to a new raid setup.

On 8/29/20 4:26 PM, Roger Heflin wrote:
It is worth noting that if you buy 2 SSDs of exactly the same model at
the same time and use them in a mirror, they are very likely to be
wearing at about the same rate.
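
For reference, a rough way to keep an eye on whether mirrored SSDs really are wearing in lockstep is to compare their SMART wear indicators now and then; the device names here are only examples, and the attribute name varies by vendor ("Wear_Leveling_Count", "Media_Wearout_Indicator", "Percentage Used", etc.):

  # Compare wear/endurance attributes of the two cache SSDs
  smartctl -A /dev/sda | grep -iE 'wear|percentage used'
  smartctl -A /dev/sdb | grep -iE 'wear|percentage used'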

I am hesitant to go much bigger on disks, especially since the $$/GB
really does not change much as the disks get bigger.

And be careful about adding a cheap SATA controller, as a lot of them work badly.

Most of my disks have died from bad blocks causing a section of the
disk to have some errors, or from bad blocks in a section causing the
array to pause for 7 seconds.  Make sure to get a disk with settable
SCTERC (the error-recovery timeout when a bad block is hit); without
it the default timeout is 60-120 seconds, but with it you can cap the
timeout at 7 seconds.
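
For what it's worth, checking and setting SCTERC with smartmontools looks roughly like this (/dev/sdX is a placeholder; the values are in units of 100 ms, so 70 means 7 seconds):

  # Show the current SCT error recovery control settings
  smartctl -l scterc /dev/sdX
  # Cap read and write error recovery at 7 seconds
  smartctl -l scterc,70,70 /dev/sdX

On most drives the setting does not survive a power cycle, so it usually ends up in a boot script or udev rule.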
In the cases where the entire disk does not just stop and it is only
getting bad blocks in places, you typically have time, since only a
single section is getting bad blocks, so in this case having sections
does help.  Also note that mdadm with 4 sections, like I have, will
only run a single rebuild at a time, as md understands that the
underlying disks are shared; this makes replacing a disk with 1
section or 4 sections work pretty much the same.  It does the same
thing on the weekly scans: it sets all 4 arrays to scan, scans one,
and defers the others because the disks are shared.
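
A small illustration of that deferral, assuming the four arrays are md0..md3 and are built from partitions of the same disks (names are examples):

  # Start a scrub on all four arrays; the kernel runs one at a time and
  # shows the rest as "resync=DELAYED" in /proc/mdstat because they
  # share the same underlying disks.
  for md in md0 md1 md2 md3; do
      echo check > /sys/block/$md/md/sync_action
  done
  cat /proc/mdstat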

It seems that a disk completely dying is a lot less common than bad-block issues.

On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@xxxxxxxxx> wrote:
On 8/29/20 12:02 AM, Roman Mamedov wrote:
On Fri, 28 Aug 2020 22:08:22 -0500
"R. Ramesh" <rramesh@xxxxxxxxxxx> wrote:

I do not know how SSD caching is implemented. I assumed it would be
somewhat similar to a memory cache (L2 vs. L3 vs. L4, etc.). I am hoping
that with SSD caching, reads/writes to disk will be larger and
sequential within a file (similar to a cache-line fill in a memory
cache, which results in efficient memory bursts). I thought that is
what SSD caching would do for disk reads/writes. I assumed that once
read-ahead and writes (assuming a writeback cache) buffer enough data
in the SSD, all reads/writes will go to the SSD, with periodic,
well-organized large transfers to disk. If I am wrong here, then I do
not see the point of an SSD as a cache. My aim is not to optimize for
cache hits, but to optimize by preventing the disks from thrashing back
and forth, seeking after every block read. I suppose the Linux (memory)
buffer cache alleviates some of that; I was hoping the SSD would
provide the next level. If not, I am off in my understanding of SSDs as
a disk cache.
Just try it; as I said before, with LVM it is easy to remove if it doesn't work
out. You can always go to the manual copying method or whatnot, but first why
not check whether the automatic caching solution might be "good enough" for your
needs.

Yes, it usually tries to avoid caching long sequential reads or writes, but
there's also quite a bit of other load on the FS, i.e. metadata. I found that
browsing directories, and especially mounting the filesystem, benefited
greatly from caching.

You are correct that it will try to increase performance via writeback
caching; however, with LVM that needs to be enabled explicitly:
https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
And of course a failure of that cache SSD will mean losing some data, even if
the main array is RAID. Perhaps you should consider a RAID of SSDs for the
cache in that case.
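
For reference, enabling writeback with lvmcache looks roughly like this, assuming a VG "vg" with a data LV "data" and a fast LV "fast" on the SSDs (all names are placeholders; older LVM versions use --cachepool instead of --cachevol, see lvmcache(7)):

  # Attach the fast LV as a cache for vg/data in writeback mode
  lvconvert --type cache --cachevol fast --cachemode writeback vg/data
  # Or switch the mode on an already-cached LV later
  lvchange --cachemode writeback vg/data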

Yes, I have 2x500GB SSDs for cache. Maybe I should do RAID1 on them
and use that as the cache volume (something like the sketch below).
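
A sketch of that idea using LVM's own RAID1, assuming the two SSDs are /dev/sdc and /dev/sdd and the VG is "vg" (device and LV names are hypothetical):

  # Put both SSDs into the VG and build a mirrored LV to use as the cache
  pvcreate /dev/sdc /dev/sdd
  vgextend vg /dev/sdc /dev/sdd
  lvcreate --type raid1 -m 1 -L 450G -n fast vg /dev/sdc /dev/sdd
  # vg/fast can then be attached as the cache volume (writeback or not)
  # as in the lvconvert example above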
I thought SSDs were more reliable, and that even when they begin to die
they become read-only before quitting.  Of course, this is all theory,
and I do not think standards exist on how they behave when reaching EoL.

Ramesh

My SSDs are from different companies and were bought at different times (2019 and 2016, I think).

I have not had many hard disk failures. However, each time I had one, it was a total death, so I am a bit biased. Maybe with sections I can replace one md at a time while letting the others run degraded (see the sketch below); I am sure there are other tricks. I am simply saying that a cold replacement of a disk involves a lot of reads/writes, and of course computation, in RAID6 vs. RAID1.
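
A hedged sketch of that per-section swap with mdadm's hot-replace, assuming the outgoing disk's partitions are sdd1..sdd4, the new disk's are sde1..sde4, and the arrays are md0..md3 (all names hypothetical):

  # Do one section at a time; --replace rebuilds onto the new partition
  # while the old one is still readable, rather than running degraded.
  mdadm /dev/md0 --add /dev/sde1
  mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sde1
  # repeat for md1/sdd2, md2/sdd3, md3/sdd4 as each rebuild completes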

Yes, larger disks are not cheaper per GB, but they use one SATA port vs. several smaller disks. Also, they use less power in the long run (mine run 24x7). That is why I have a policy of replacing disks once disks twice the size of what I currently own become commonplace.

I have an LSI 9211 SAS HBA, which this community touts as reliable.

Regards
Ramesh



