Re: Best way to add caching to a new raid setup.

A simple copy has few moving parts, and the ones it has are easy to
maintain.  I have been a Linux admin for 20+ years, and a professional
Unix admin for longer, and too often something complicated looked nice
but burned me with bugs and other unexpected results, so simple is
best.  The daily move uses nothing complicated, can be expected to work
on any Unix system that has ever existed, and relies on heavily used
operations that are very likely to work and to be noticed quickly if
they ever break.  The other options you mention are a bit more
complicated, so they are more likely to have bugs, and those bugs are
less likely to be caught as quickly.

I also wanted to be able to spin down my array during the hours when no
one is watching the DVR.  That is usually 18+ hours per day across 7
drives, or about 1.25 kWh/day, 37 kWh/month, or $4-$10 per month
depending on power costs.  I also have motion software collecting
security camera footage that goes to the SSD and is copied onto the
array nightly.  Writing the camera footage straight to the array would
have kept it spinning whenever anything moved anywhere outside, so
pretty much 100% of the time.
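
A minimal sketch of such a nightly move, for illustration only.  This is
not the script actually in use; the function name move_recordings, the
two directory arguments, and the min_age_seconds guard (which skips
files that may still be recording) are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def move_recordings(ssd_dir, raid_dir, min_age_seconds=3600):
    """Move finished recordings from the SSD directory to the RAID directory.

    Files are handled one at a time, so each lands on the array as a
    single sequential stream. Returns the names of the files moved.
    """
    ssd = Path(ssd_dir)
    raid = Path(raid_dir)
    # Assumption: a file untouched for min_age_seconds is no longer being written.
    cutoff = time.time() - min_age_seconds
    moved = []
    for src in sorted(ssd.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories and anything that is not a regular file
        if src.stat().st_mtime > cutoff:
            continue  # modified recently, possibly still recording
        dst = raid / src.name
        if dst.exists():
            continue  # never overwrite; leave the file on the SSD for a human to check
        # Across filesystems shutil.move copies the file and then removes the source.
        shutil.move(str(src), str(dst))
        moved.append(src.name)
    return moved
```

Run from a nightly scheduled job with the SSD and RAID recording
directories as arguments, this does what a plain mv of the finished
files does.  Nothing else needs updating as long as the DVR searches
both directories.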

On Fri, Sep 11, 2020 at 1:39 PM R. Ramesh <rramesh@xxxxxxxxxxx> wrote:
>
> On 8/30/20 10:42 AM, Roger Heflin wrote:
> > The LSI should be a good controller as long as you use the HBA fw and
> > not the raid fw.
> >
> > I use an LSI with hba + the 8 AMD chipset sata ports, currently I have
> > 12 ports cabled to hot swap bays but only 7+boot disk used.
> >
> > How many recordings do you think you will have and how many
> > clients/watchers?  With the SSD handling the writes for recording my
> > disks actually spin down if no one is watching anything.
> >
> > The other trick the partitions let me do is initially I moved from 1.5
> > -> 3tb disks (2x750 -> 4x750) and once I got 3-3tbs in I added the 2
> > more partitions raid6(+1.5TB) (I bought the 3tb drives slowly), then
> > the next 3tb gets added to all 4 partitions (+3TB).
> >
> > On reads at least each disk can do at least 50 iops, and for the most
> > part the disks themselves are very likely to cache the entire track
> > the head goes over, so a 2nd sequential read likely comes from the
> > disk's read cache and does not have to actually be read.  So several
> > sequential workloads jumping back and forth do not behave as bad as
> > one would expect.  Writes are a different story and a lot more
> > expensive.  I isolate those to ssd and copy them in the middle of the
> > night when it is low activity.  And since they are being copied as big
> > fast streams one file at a time they end up with very few fragments
> > and write very quickly.   The way I have mine setup mythtv will find
> > the file whether it is on the ssd recording directory or the raid
> > recording directory, so when I mv the files nothing has to be done
> > except the mv.
> >
> >
> > On Sat, Aug 29, 2020 at 7:56 PM Ram Ramesh <rramesh2400@xxxxxxxxx> wrote:
> >> On 8/29/20 4:26 PM, Roger Heflin wrote:
> >>> It should be worth noting that if you buy 2 exactly the same SSD's at
> >>> the same time and use them in a mirror they are very likely to be
> >>> wearing about the same.
> >>>
> >>> I am hesitant to go much bigger on disks, especially since the $$/GB
> >>> really does not change much as the disks get bigger.
> >>>
> >>> And be careful of adding on a cheap sata controller as a lot of them work badly.
> >>>
> >>> Most of my disks have died from bad blocks causing a section of the
> >>> disk to have some errors, or bad blocks on sections causing the array
> >>> to pause for 7 seconds.  Make sure to get a disk with SCTERC settable
> >>> (timeout when bad blocks happen, otherwise the default timeout is
> >>> 60-120 seconds, but with it you can set it to no more than 7 seconds).
> >>>    In the cases where the entire disk did not just stop and is just
> >>> getting bad blocks in places, typically you have time as only a single
> >>> section is getting bad blocks, so in this case having sections does
> >>> help.    Also note that mdadm with 4 sections like I have will only
> >>> run a single rebuild at a time as mdadm understands that the
> >>> underlying disks are shared, this makes replacing a disk with 1
> >>> section or 4 sections basically work pretty much the same.  It does
> >>> the same thing on the weekly scans, it sets all 4 to scan, and it
> >>> scans 1 and defers the other scan as disks are shared.
> >>>
> >>> It seems to me a disk completely dying is a lot less common than bad block issues.
> >>>
> >>> On Sat, Aug 29, 2020 at 3:50 PM Ram Ramesh <rramesh2400@xxxxxxxxx> wrote:
> >>>> On 8/29/20 12:02 AM, Roman Mamedov wrote:
> >>>>> On Fri, 28 Aug 2020 22:08:22 -0500
> >>>>> "R. Ramesh" <rramesh@xxxxxxxxxxx> wrote:
> >>>>>
> >>>>>> I do not know how SSD caching is implemented. I assumed it will be
> >>>>>> somewhat similar to memory cache (L2 vs L3 vs L4 etc). I am hoping that
> >>>>>> with SSD caching, reads/writes to disk will be larger in size and
> >>>>>> sequential within a file (similar to cache line fill in memory cache
> >>>>>> which results in memory bursts that are efficient). I thought that is
> >>>>>> what SSD caching will do to disk reads/writes. I assumed, once reads
> >>>>>> (ahead) and writes (assuming writeback cache) buffers data sufficiently
> >>>>>> in the SSD, all reads/writes will be to SSD with periodic well organized
> >>>>>> large transfers to disk. If I am wrong here then I do not see any point
> >>>>>> in SSD as a cache. My aim is not to optimize by cache hits, but optimize
> >>>>>> by preventing disks from thrashing back and forth seeking after every
> >>>>>> block read. I suppose Linux (memory) buffer cache alleviates some of
> >>>>>> that. I was hoping SSD will provide next level. If not, I am off in my
> >>>>>> understanding of SSD as a disk cache.
> >>>>> Just try it, as I said before with LVM it is easy to remove if it doesn't work
> >>>>> out. You can always go to the manual copying method or whatnot, but first why
> >>>>> not check if the automatic caching solution might be "good enough" for your
> >>>>> needs.
> >>>>>
> >>>>> Yes it usually tries to avoid caching long sequential reads or writes, but
> >>>>> there's also quite a bit of other load on the FS, i.e. metadata. I found that
> >>>>> browsing directories and especially mounting the filesystem had a great
> >>>>> benefit from caching.
> >>>>>
> >>>>> You are correct that it will try to increase performance via writeback
> >>>>> caching, however with LVM that needs to be enabled explicitly:
> >>>>> https://www.systutorials.com/docs/linux/man/7-lvmcache/#lbAK
> >>>>> And of course a failure of that cache SSD will mean losing some data, even if
> >>>>> the main array is RAID. Perhaps you should consider a RAID of SSDs for cache in
> >>>>> that case then.
> >>>>>
> >>>> Yes, I have 2x500GB ssds for cache. Maybe I should do raid1 on them
> >>>> and use as cache volume.
> >>>> I thought SSDs are more reliable and even when they begin to die, they
> >>>> become readonly before quitting.  Of course, this is all theory, and I
> >>>> do not think standards exist on how they behave when reaching EoL.
> >>>>
> >>>> Ramesh
> >>>>
> >> My SSDs are from different companies and bought at different times
> >> (2019/2016, I think).
> >>
> >> I have not had many hard disk failures. However, each time I had one, it
> >> has been a total death. So, I am a bit biased. Maybe with sections, I
> >> can replace one md at a time and let the others run degraded. I am sure
> >> there are other tricks. I am simply saying it is a lot of reads/writes, and
> >> of course computation, in cold replacement of disks in RAID6 vs. RAID1.
> >>
> >> Yes, larger disks are not cheaper, but they use one SATA port vs.
> >> smaller disks. Also, they use less power in the long run (mine run
> >> 24x7). That is why I have a policy of replacing disks once 2x size disks
> >> (compared to what I currently own) become commonplace.
> >>
> >> I have an LSI 9211 SAS HBA which is touted to be reliable by this community.
> >>
> >> Regards
> >> Ramesh
> >>
>
> Roger,
>
>    Just curious, in your search for a SSD solution to mythtv recording,
> did you consider overlayfs, unionfs or mergerfs? If you did, why did you
> decide that a simple copy is better?
>
> Ramesh
>


