Thank you for the detailed explanation. If it's the disk saturating, then running some of the above-mentioned tests (with multiple threads) on plain XFS should hit the same saturation, right? I will try out some tests; this is interesting.
Thanks,
Poornima
On Wed, Feb 6, 2019 at 12:27 PM Xavi Hernandez <xhernandez@xxxxxxxxxx> wrote:
On Wed, Feb 6, 2019 at 7:00 AM Poornima Gurusiddaiah <pgurusid@xxxxxxxxxx> wrote:
On Tue, Feb 5, 2019, 10:53 PM Xavi Hernandez <xhernandez@xxxxxxxxxx> wrote:
On Fri, Feb 1, 2019 at 1:51 PM Xavi Hernandez <xhernandez@xxxxxxxxxx> wrote:
On Fri, Feb 1, 2019 at 1:25 PM Poornima Gurusiddaiah <pgurusid@xxxxxxxxxx> wrote:

Can the threads be categorised to do certain kinds of fops?

Could be, but creating multiple thread groups for different tasks is generally bad because you often end up with lots of idle threads, which waste resources and can increase contention. I think we should only differentiate threads if it's absolutely necessary.

Read/write affinitised to a certain set of threads, and the other metadata fops to another set of threads. So we limit the read/write threads and not the metadata threads? Also, if AIO is enabled in the backend, the threads will not be blocked on disk I/O, right?

If we don't block the thread but we don't prevent more requests from going to the disk, then we'll probably have the same problem. Anyway, I'll try to run some tests with AIO to see if anything changes.

I've run some simple tests with AIO enabled and the results are not good. A simple dd takes >25% more time. Multiple parallel dd take 35% more time to complete.

Thank you. That is strange! I had a few questions: what tests are you running to measure the io-threads performance (not particularly AIO)? Is it dd from multiple clients?

Yes, it's a bit strange. What I see is that many threads from the thread pool are active but using very little CPU. I also see an AIO thread for each brick, but its CPU usage is not big either. Wait time is always 0 (I think this is a side effect of AIO activity). However, system load grows very high. I've seen around 50, while on the normal test without AIO it stays around 20-25.

Right now I'm running the tests on a single machine (no real network communication) using an NVMe disk as storage. I use a single mount point. The tests I'm running are these:
- Single dd, 128 GiB, blocks of 1MiB
- 16 parallel dd, 8 GiB per dd, blocks of 1MiB
- fio in sequential write mode, direct I/O, blocks of 128k, 16 threads, 8GiB per file
- fio in sequential read mode, direct I/O, blocks of 128k, 16 threads, 8GiB per file
- fio in random write mode, direct I/O, blocks of 128k, 16 threads, 8GiB per file
- fio in random read mode, direct I/O, blocks of 128k, 16 threads, 8GiB per file
- smallfile create, 16 threads, 256 files per thread, 32 MiB per file (with one brick down, for the following test)
- self-heal of an entire brick (from the previous smallfile test)
- pgbench init phase with scale 100
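
For anyone who wants to reproduce the dd and fio entries in the list above, here is a minimal sketch that only builds the command lines. The mount point, the output file names, the use of /dev/zero as dd input, and the mapping of "16 threads" to fio's numjobs are assumptions; the thread does not give the exact invocations.

```python
import shlex

# Hypothetical mount point of the volume under test (not given in the thread).
MOUNT = "/mnt/glustervol"

# fio access patterns matching the four fio entries in the list.
FIO_MODES = {
    "sequential write": "write",
    "sequential read": "read",
    "random write": "randwrite",
    "random read": "randread",
}


def fio_command(rw, directory=MOUNT):
    """Build argv for one fio run: direct I/O, 128k blocks, 16 jobs, 8 GiB per file."""
    return [
        "fio",
        f"--name={rw}-test",
        f"--directory={directory}",
        f"--rw={rw}",
        "--direct=1",
        "--bs=128k",
        "--numjobs=16",  # assumption: "16 threads" maps to 16 fio jobs
        "--size=8g",     # size is per job, so each job gets its own 8 GiB file
        "--group_reporting",
    ]


def dd_command(index, count_mib, directory=MOUNT):
    """Build argv for one dd writer with 1 MiB blocks (input source is an assumption)."""
    return [
        "dd",
        "if=/dev/zero",
        f"of={directory}/dd-{index}.bin",
        "bs=1M",
        f"count={count_mib}",
    ]


if __name__ == "__main__":
    # Single dd of 128 GiB, then the four fio variants.
    print(shlex.join(dd_command(0, 128 * 1024)))
    for label, rw in FIO_MODES.items():
        print(f"# {label}")
        print(shlex.join(fio_command(rw)))
```

The 16 parallel dd case would call dd_command(i, 8 * 1024) for i in 0..15 and start the processes together.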
I run all these tests for a replica 3 volume and a disperse 4+2 volume.

Xavi

Regards,
Poornima

Xavi

All this is based on the assumption that a large number of parallel reads and writes makes disk performance bad, but a large number of dentry and metadata ops does not. Is that true?

It depends. If metadata is not cached, it's as bad as a read or write, since it requires a disk access (a clear example of this is the bad performance of 'ls' with a cold cache, which is basically metadata reads). In fact, cached data reads are also very fast, and data writes can go to the cache and be flushed later in the background, so I think the important point is whether things are cached or not, rather than whether they are data or metadata. Since we don't have this information from the user side, it's hard to tell what's better. My opinion is that we shouldn't differentiate between data and metadata requests. If metadata requests happen to be faster, then that thread will be able to handle other requests immediately, which seems good enough.

However, there's one thing that I would do: differentiate reads (data or metadata) from writes. Normally writes come from cached information that is flushed to disk at some point, so this normally happens in the background. But reads tend to be in the foreground, meaning that someone (a user or an application) is waiting for them. So I would give preference to reads over writes. To do so effectively, we must not saturate the backend; otherwise, when we need to send a read, it will still have to wait for all pending requests to complete. If disks are not saturated, we can get the answer to the read quite fast and then continue processing the remaining writes.

Anyway, I may be wrong, since all these things depend on too many factors. I haven't done any specific tests about this. It's more like brainstorming.
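
To make the read-preference idea concrete, here is a minimal sketch of a dispatcher that caps the number of requests in flight and always picks a pending read before a pending write. It is not how io-threads is implemented; the class name, the method names, and the max_in_flight parameter are hypothetical, and a real version would also need some way to keep writes from starving.

```python
from collections import deque


class ReadPreferringQueue:
    """Dispatch reads before writes while capping requests in flight."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight  # hypothetical cap that keeps the disk unsaturated
        self.in_flight = 0
        self.reads = deque()
        self.writes = deque()

    def submit(self, kind, request):
        """Queue a request; anything that is not a read is treated as a write."""
        if kind == "read":
            self.reads.append(request)
        else:
            self.writes.append(request)

    def dispatch(self):
        """Return the requests that may be sent to the disk now, reads first."""
        started = []
        while self.in_flight < self.max_in_flight and (self.reads or self.writes):
            queue = self.reads if self.reads else self.writes
            started.append(queue.popleft())
            self.in_flight += 1
        return started

    def complete(self):
        """Record that one in-flight request finished, freeing a slot."""
        if self.in_flight == 0:
            raise RuntimeError("no request in flight")
        self.in_flight -= 1


if __name__ == "__main__":
    q = ReadPreferringQueue(max_in_flight=2)
    for i in range(4):
        q.submit("write", f"w{i}")
    print(q.dispatch())  # two writes start, two stay queued
    q.submit("read", "r0")
    q.complete()         # one write finishes
    print(q.dispatch())  # the read goes ahead of the queued writes
```

Because only two requests are ever outstanding, the late read waits for a single completion instead of for the whole write backlog.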
As soon as I can, I would like to experiment with this and get some empirical data.

Xavi

Thanks,
Poornima

On Fri, Feb 1, 2019, 5:34 PM Emmanuel Dreyfus <manu@xxxxxxxxxx> wrote:
On Thu, Jan 31, 2019 at 10:53:48PM -0800, Vijay Bellur wrote:
> Perhaps we could throttle both aspects - number of I/O requests per disk
While we are at it, it would be nice to detect and report a disk with
lower performance than its peers: that sometimes happens when a disk is
dying, and the last time I was hit by that performance problem, I had a
hard time finding the culprit.
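
One possible way to flag such a disk, sketched under loud assumptions: compare each disk's average request latency with the median of all disks and report the ones that exceed it by some factor. The function name, the factor of 3, and the sample numbers are all hypothetical; nothing like this is described in the thread.

```python
from statistics import median


def slow_disks(latency_ms, factor=3.0):
    """Return the disks whose latency exceeds `factor` times the median of all disks."""
    if len(latency_ms) < 3:
        return []  # too few peers for a meaningful comparison
    typical = median(latency_ms.values())
    return sorted(disk for disk, value in latency_ms.items() if value > factor * typical)


if __name__ == "__main__":
    # Made-up average latencies in milliseconds, one entry per disk.
    sample = {"brick1": 1.2, "brick2": 1.0, "brick3": 9.5, "brick4": 1.1}
    print(slow_disks(sample))
```

The median is used instead of the mean so that a single dying disk does not shift the baseline it is compared against.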
--
Emmanuel Dreyfus
manu@xxxxxxxxxx
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel