Re: sequential versus random I/O

Hmmm, there's a nice solution for low-cost storage; it costs less
than any IBM/Dell/HP/etc. storage and has a nice read rate:
http://www.backblaze.com/  (it's a red chassis with many disks)

Maybe you could get better performance from many SATA disks running
RAID1, RAID0, RAID6, or RAID10 than from many SAS disks, at the same
cost... at least where I live (Brazil), SAS is very expensive, and
two SATA disks beat one SAS disk for an enterprise database workload
with many random reads (my tests only).

With SATA you buy more space and lose some read rate (7200 RPM vs.
15000 RPM, about 2x faster), but you get more disk heads (nice for a
RAID1 solution). That can let more programs read different parts of
the logical volume at once, each head serving one part (OK, you must
test it yourself).

In other words, maybe you could save some money with many SATA disks
and spend it on a nice cache solution: SSDs forming a
bcache/flashcache layer, or a RAID card with flash cache. Just an
idea... other solutions could be better.
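
For example, a rough bcache setup could look like this (assuming
/dev/md0 is the SATA array and /dev/sdf is the SSD; the names are
placeholders):

    # format the SSD as cache and the array as backing device;
    # created together in one command, they attach automatically
    make-bcache -C /dev/sdf -B /dev/md0
    mkfs.ext4 /dev/bcache0
    mount /dev/bcache0 /mnt/data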

2014-01-30 Matt Garman <matthew.garman@xxxxxxxxx>:
> On Wed, Jan 29, 2014 at 8:38 PM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
>> If your workflow description is accurate, and assuming you're trying to
>> fix a bottleneck at the NFS server, the solution to this is simple, and
>> very well known:  local scratch space.  Given your workflow description
>> it's odd that you're not already doing so.  Which leads me to believe
>> that the description isn't entirely accurate.  If it is, you simply copy
>> each file to local scratch disk and iterate over it locally.  If you're
>> using diskless compute nodes then that's an architectural
>> flaw/oversight, as this workload as described begs for scratch disk.
>
> There really is no bottleneck now, but looking into the future, there
> will be a bottleneck at the next addition of compute nodes.  I've
> thought about local caching at the compute node level, but I don't
> think it will help.  The total collection of big files on the NFS
> server is upwards of 20 TB.  Processes are distributed randomly across
> compute nodes, and any process could access any part of that 20 TB
> file collection.  (My description may have implied there is a 1-to-1
> process-to-file mapping, but that is not the case.)  So the local
> scratch space would have to be quite big to prevent thrashing.  In
> other words, unless the local cache was multi-terabyte in size, I'm
> quite confident that the local cache would actually degrade
> performance due to constant turnover.
>
> Furthermore, let's simplify the workflow: say there is only one
> compute server, and its local disk is sufficiently large to hold the
> entire data set (assume 20 TB drives exist with performance
> characteristics similar to today's spinning drives).  In other words,
> there is no need for the NFS server now.  I believe even in this
> scenario, the single local disk would be a bottleneck to the dozens of
> programs running on the node... these compute nodes are typically dual
> socket, 6 or 8 core systems.  The computational part is fast enough on
> modern CPUs that the I/O workload can be realistically approximated by
> dozens of parallel "dd if=/random/big/file of=/dev/null" processes,
> all accessing different files from the collection.  In other words,
> very much like my contrived example of multiple parallel read
> benchmark programs.
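
For illustration, that kind of parallel-read load can be generated
with a few lines of shell (the path glob is a placeholder for the
real file set):

    # launch one sequential reader per file, all in parallel
    for f in /nfs/data/*.dat; do
        dd if="$f" of=/dev/null bs=1M &
    done
    wait    # block until every reader finishes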
>
> FWIW, the current NFS server is from a big iron storage vendor.  It's
> made up of 96 15k SAS drives.  A while ago we were hitting a
> bottleneck on the spinning disks, so the vendor was happy to sell us 1
> TB of their very expensive SSD cache module.  This worked quite well
> at reducing spinning disk utilization, and cache module utilization
> was quite high.  The recent compute node expansion has lowered cache
> utilization at the expense of spinning disk utilization... things are
> still chugging along acceptably, but we're at capacity.  We've maxed
> out at just under 3 GB/sec of throughput (that's gigabytes, not bits).
>
> What I'm trying to do is decide if we should continue to pay expensive
> maintenance and additional cache upgrades to our current device, or if
> I might be better served by a DIY big array of consumer SSDs, à la the
> "Dirt Cheap Data Warehouse" [1].  I don't see too many people building
> big arrays of consumer-grade SSDs, or even vendors selling pre-made
> big SSD based systems.  (To be fair, you can buy big SSD arrays, but
> with crazy-expensive *enterprise* SSD... we have effectively a WORM
> workload, so don't need the write endurance features of enterprise
> SSD.  I think that's where the value opportunity comes in for us.)
> Anyway, I'm just looking for reasons why taking on such a project
> might blow up in my face (assuming I can convince the check-writer to
> basically fund a storage R&D project).
>
>
> [1] http://www.openida.com/the-dirt-cheap-data-warehouse-an-introduction/



-- 
Roberto Spadim



