On 2024-10-18, 02:08, "K Prateek Nayak" <kprateek.nayak@xxxxxxx> wrote:

> Most of our testing used sysbench as the benchmark driver. How does
> mysql+hammerdb work specifically? Do the tasks driving the request are
> located on a separate server or are co-located with the benchmarks
> threads on the same server?

The hammerdb test is a bit more complex than sysbench. It uses two independent physical machines to perform a TPC-C derived test [1], aiming to simulate a real-world database workload. The machines are allocated as an AWS EC2 instance pair in the same cluster placement group [2], to avoid measuring network bottlenecks instead of server performance. The SUT instance runs mysql configured to use 2 worker threads per vCPU (32 total); the load generator instance runs hammerdb configured with 64 virtual users and 24 warehouses [3]. Each test consists of multiple 20-minute rounds, run consecutively on multiple independent instance pairs.

[1] https://www.tpc.org/tpcc/default5.asp
[2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-strategies.html
[3] https://hammerdb.com/docs/ch03s05.html

> Did you see any glaring changes in scheduler statistics with the
> introduction of EEVDF in v6.6? EEVDF commits up till v6.9 were easy to
> revert from my experience but I've not tried it on v6.12-rcX with the
> EEVDF complete series. Is all the regression seen purely
> attributable to EEVDF alone on the more recent kernels?

Yes, the regression is attributable to EEVDF: after seeing indications of a performance degradation somewhere after kernel 6.5, bisect testing narrowed it down to merge commit b41bbb33cf75 ("Merge branch 'sched/eevdf' into sched/core"). Expanding testing to all subsequent stable kernel versions (6.6 through 6.11) showed very similar performance data, confirming that the non-EEVDF changes introduced along the way have no significant impact.
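For reference, the load-generator side of the hammerdb setup described above can be sketched with the HammerDB CLI. The command names follow the HammerDB CLI documentation, but the host name and the exact parameter values here are illustrative placeholders, not the precise test configuration:

```shell
# Sketch only: run on the load generator instance; assumes hammerdbcli is
# installed and the 24-warehouse TPC-C schema was already built on the SUT.
hammerdbcli <<'EOF'
dbset db mysql
diset connection mysql_host sut.example.internal
diset tpcc mysql_count_ware 24
vuset vu 64
loadscript
vucreate
vurun
EOF
```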
Testing kernel 6.12 at various stages, starting with commit 2004cef11ea0 ("Merge tag 'sched-core-2024-09-19'") and continuing with the v6.12-rcX tags as they became available, shows a different performance profile than previous kernels: the degradation is smaller than in 6.6 through 6.11, but the positive impact of disabling PLACE_LAG and RUN_TO_PARITY is also smaller. However, after testing a fractional factorial of combinations of all EEVDF-specific features, the only configuration that yielded better performance than NO_PLACE_LAG+NO_RUN_TO_PARITY was with all 7 features disabled (NO_PLACE_LAG, NO_RUN_TO_PARITY, NO_DELAY_DEQUEUE, NO_DELAY_ZERO, NO_PLACE_DEADLINE_INITIAL, NO_PLACE_REL_DEADLINE, NO_PREEMPT_SHORT). After considering the potential impact on other workloads and the ease of backporting for the two best options, NO_PLACE_LAG+NO_RUN_TO_PARITY seemed like the better choice for 6.12 as well.

Looking at the comparative aperf [4] reports showed no diverging configuration issues and no noticeable differences in the PMU stats, which confirms there are no unrelated system differences affecting the results.

[4] https://github.com/aws/aperf

>> I haven't tested with SCHED_BATCH yet, will update the thread with results
>> as they accumulate

Testing with SCHED_BATCH (and default scheduler settings) resulted in no significant performance change. As an additional data point, using SCHED_FIFO or SCHED_RR further degraded the mysql performance (but improved postgresql).

> Could you also test running with:
> echo NO_WAKEUP_PREEMPTION > /sys/kernel/debug/sched/features

Certainly; I will update the thread when the results are available.

> On a side note, what is the CONFIG_HZ and the
> preemption model on your test kernel (most of my testing was with
> CONFIG+HZ=250, voluntary preemption)

CONFIG_HZ was 250.
Testing with other values did not reveal anything relevant to this regression either: both CFS and EEVDF showed a slight improvement with CONFIG_HZ=100, and no change otherwise. Preemption was the default (voluntary) for all tests.

> The data in the latter link helped root-cause the actual issue with the
> algorithm that the benchmark disliked. Similar information for the
> database benchmarks you are running, can help narrow down the issue.

Thank you for the links! I'll gladly continue gathering data and help diagnose this issue. I am concerned, however, with keeping the default configuration the way it currently is while the investigation continues. Do you happen to know how the reported blogbench performance compares to the pre-EEVDF (v6.5) results?

> From what I can tell, your benchmark has a set of threads that like to
> get cpu time as fast as possible. With EEVDF Complete (I would recommend
> using current tip:sched/urgent branch to test them out) setting a more
> aggressive nice value to these threads should enable them to negate the
> effect of RUN_TO_PARITY thanks to PREEMPT_SHORT.
>
> As for NO_PLACE_LAG, the DELAY_DEQUEUE feature should help task shed off
> any lag it has built up and should very likely start from the zero-lag
> point unless it is a very short sleeper.

I agree with the thread assessment. It seems that the best outcome is achieved when the threads run as fast as possible, with as little overhead as possible. I'll test with EEVDF Complete as well. Note that both DELAY_DEQUEUE and PREEMPT_SHORT were part of the combinations in the test suite on commit 2004cef11ea0, as mentioned above, and flipping them did not produce dramatic performance changes. At that time (and this is no longer true in v6.12-rc2), NO_DELAY_ZERO was also needed along with NO_PLACE_LAG and NO_RUN_TO_PARITY.

> Is there any reason to flip it very early into the boot? Have you seen
> anything go awry with system processes during boot with EEVDF?
I haven't, as this benchmarking specifically measures the steady state of a system. The boot order argument was only in the context of discussing the suitability of rc.local, as compared to sysctl, for persisting scheduler options. It is conceivable that options which differ between steady state and startup could lead to process management outcomes that affect performance (and are harder-to-reproduce scenarios).
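For completeness, the rc.local approach mentioned above amounts to something like the following sketch. It uses the same debugfs interface quoted earlier in the thread; it assumes debugfs is mounted at its default location and requires root:

```shell
#!/bin/sh
# Sketch: persist the scheduler feature flips at boot via /etc/rc.local.
# Writing NO_<FEATURE> to sched/features clears that feature bit.
F=/sys/kernel/debug/sched/features
echo NO_PLACE_LAG > "$F"
echo NO_RUN_TO_PARITY > "$F"
```

Unlike a sysctl, these writes are applied whenever the script runs rather than at a fixed point in the boot sequence, which is where the boot-order discussion came from.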