Re: performance

Hi Strahil,

thanks again for sticking with me on this.
Hm...  OK. I guess you can try 7.7 whenever it's possible.

Acknowledged.

Perhaps I am not understanding it correctly. I tried these suggestions before and it got worse, not better, so I have been operating under the assumption that maybe these guidelines are not appropriate for newer versions.

Actually, the settings are not changed much, so they should work for you.

Okay, then maybe I am doing something incorrectly, or not understanding some fundamental piece of things that I should be.

Interestingly, mostly because it is not something I have ever experienced before: software interrupts sit between 1 and 5 on each core, but the last core is usually sitting around 20. I have never encountered a high load average where the si number was ever significant. I have googled the crap out of that (as well as gluster performance in general); there are nearly limitless posts about what it is, but I have yet to see one that explains what to do about it.

Is this happening on all nodes?
I had a similar situation caused by a bad NIC (si in top was way high), but the chance of a bad NIC on all servers is very low.
You can still patch OS + Firmware on your next maintenance.

Yes, but not to the same extreme. The other node is currently not serving anything to the internet, so right now its only function is replicated gluster and databases. On the 2nd node there is also one busy core (the first one in this case, as opposed to the last one on the main node), but it sits between 10 and 15 instead of 20 and 25, and the remaining cores sit between 0 and 2 instead of 1 and 5.

I have no evidence of any bad hardware, and these servers were both commissioned only within the last couple of months. But will still poke around on this path.
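
For reference, in case it helps narrow down which softirq type is landing on that one core, a couple of plain procfs reads show the per-CPU breakdown; nothing here is gluster-specific:

# watch -d -n1 'cat /proc/softirqs'
# cat /proc/interrupts
# cat /proc/irq/*/smp_affinity_list

The first shows per-CPU counters for NET_RX, NET_TX, TASKLET and so on, the second shows which hardware IRQs fire on which core, and the third shows the current IRQ-to-CPU affinity.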

"... more number of CPU cycles than needed, increasing the event thread count would enhance the performance of the Red Hat Storage Server," which is why I had it at 8.

Yeah, but you have only 6 cores and they are not dedicated to gluster alone. I think you need to test with lower values.

Okay, I will change these values a few times over the next couple of hours and see what happens.
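
For the record, the commands I am planning to run are along these lines, with 'myvol' standing in for the real volume name and the values only examples to step through, not recommendations:

# gluster volume set myvol client.event-threads 4
# gluster volume set myvol server.event-threads 4
# gluster volume get myvol all | grep event-threads

The last line is just to confirm which values actually took effect.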

Right now the only suggested parameter I haven't played with is performance.io-thread-count, which I currently have at 64.

I think that as you have SSDs only, you might see some results by changing this one.

Okay, I will also modify this incrementally. Do you think it can go higher? I think I got this number from a thread on this list, but I am not really sure what would be a reasonable value for my system.
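
In case it is useful to anyone searching the archives later, the change itself is just the following, with 'myvol' again a placeholder volume name and 32 only an example value to test, not a recommendation:

# gluster volume set myvol performance.io-thread-count 32
# gluster volume get myvol performance.io-thread-count

The second command reads the value back so I can confirm what the volume is actually running with.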


For what it's worth, I am running ext4 as my underlying fs and I have read a few times that XFS might have been a better choice. But that is not a trivial experiment to make at this time with the system in production. It's one thing (and still a bad thing, to be sure) to semi-bork the system for an hour or two while I play with configurations, but it would take a day or so offline to reformat and restore the data.

XFS should bring better performance, but if the issue is not in the FS, it won't make a difference...
What I/O scheduler are you using for the SSDs (you can check via 'cat /sys/block/sdX/queue/scheduler')?

# cat /sys/block/vda/queue/scheduler
[mq-deadline] none
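
If I do end up experimenting with the scheduler on this virtio disk, my understanding is that it can be switched at runtime by writing the name into the same file, e.g.:

# echo none > /sys/block/vda/queue/scheduler
# cat /sys/block/vda/queue/scheduler
mq-deadline [none]

That change is not persistent across reboots, so it would need a udev rule or kernel command line option to make it stick.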

In the past I have tried 2, 4, 8, 16, and 32. Playing with just those, I never noticed that any of them made any difference. Though I might have some different options now than I did then, so I might try these again throughout the day...

Are you talking about server or client event threads (or both)?

It never occurred to me to set them to different values. So far when I set one, I set the other to the same value.


Thanks again for your time, Strahil; if you have any more thoughts I would love to hear them.

Can you check if you use 'noatime' for the bricks? It won't have any effect on the CPU side, but it might help with the I/O.

I checked into this, and I have nodiratime set, but not noatime. From what I can gather, it should provide nearly the same performance benefit while leaving the atime attribute on the files. You never know, I may decide I want those at some point in the future.
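
If I do decide to switch later, it should just be a matter of adjusting the brick mount options and remounting; something like the following, where /data/brick1 is only a stand-in for the real brick mount point:

# mount -o remount,noatime /data/brick1
# mount | grep brick1

plus updating the matching line in /etc/fstab so the option survives a reboot. The second command is just to verify the active mount options.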

I see that your indicator for high load is loadavg, but have you actually checked how many processes are in 'R' or 'D' state?
Some monitoring checks can raise loadavg artificially.

Occasionally a batch of processes will be in the R state, and I see the D state show up from time to time, but mostly everything is S.
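
For completeness, the snapshot I am basing that on is just a quick count of process states, along the lines of:

# ps -eo state= | sort | uniq -c
# ps -eo state,pid,comm | awk '$1 ~ /^[RD]/'

The first gives a count per state (R, S, D, Z, ...), the second lists only the running and uninterruptible-sleep processes.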

Also, are you using software mirroring (either mdadm or striped/mirrored LVs)?

No, a single disk. And I opted not to put gluster on a thin LVM, as I don't see myself using LVM snapshots in this scenario.

So, we just moved into a quieter time of the day, but maybe I just stumbled onto something. I was trying to figure out if/how I could throw more RAM at the problem. The gluster docs say write-behind is not a cache unless flush-behind is on, so that seems to be a way to throw RAM at it? I set performance.write-behind-window-size: 512MB and performance.flush-behind: on, and the whole system calmed down pretty much immediately. It could just be timing, though; I will have to see tomorrow during business hours whether the system stays at a reasonable load.
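
For anyone following along, the actual commands were along these lines, with 'myvol' as a placeholder volume name and 512MB simply the value I chose to try, not a recommendation:

# gluster volume set myvol performance.flush-behind on
# gluster volume set myvol performance.write-behind-window-size 512MB
# gluster volume get myvol all | grep -E 'flush-behind|write-behind'

The last line just reads back the write-behind related options so I can confirm what is in effect.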

I will still test the other options you suggested tonight, though; this is probably too good to be true.

Can't thank you enough for your input, Strahil, your help is truly appreciated!









Best Regards,
Strahil Nikolov

________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users