RE: [Non-DoD Source] Re: Showing my ignorance - kernel workers

I made the BIOS settings identical and rebooted, but got the same results. As an FYI on APIC, "ProcX2Apic": "ForceEnabled" is not something I set explicitly - it is part of the HPC workload profile...

I have an IRQ top script and I don't see anything troubling with the interrupts.    I do set rq_affinity to 2 for every block device I care about in the system (nvme's and md's)...
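
For reference, roughly what that boils down to at boot is the sketch below (minimal Python; the glob patterns for nvme namespaces and md arrays are assumptions, adjust them to however your devices enumerate):

import glob

# Sketch: set rq_affinity=2 on every nvme namespace and md array via sysfs.
# queue/rq_affinity is the standard block-layer attribute; 2 forces the
# completion onto the submitting CPU.
def set_rq_affinity(value=2, patterns=("/sys/block/nvme*n*", "/sys/block/md*")):
    for pattern in patterns:
        for dev in glob.glob(pattern):
            attr = f"{dev}/queue/rq_affinity"
            try:
                with open(attr, "w") as f:
                    f.write(str(value))
                print(f"{attr} -> {value}")
            except OSError as exc:
                print(f"skipped {attr}: {exc}")

if __name__ == "__main__":
    set_rq_affinity()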

I'm open to suggestions; if there aren't any, I'd like to formally ask for the ability to pin the kernel workers to the appropriate NUMA nodes, or possibly even to specific CPUs.
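
For completeness, the only related knob I've found so far is the global sysfs cpumask for unbound workqueues; a hedged sketch of driving it is below. It is coarser than what I'm asking for - per-CPU kworkers ignore it entirely - and the assumption that NUMA node 0 is CPUs 0-63 is specific to this box:

# Sketch: restrict unbound workqueues to a chosen CPU set via
# /sys/devices/virtual/workqueue/cpumask.  Per-CPU kworkers are not affected.
# The CPU range below (node 0 = CPUs 0-63) is an assumption for this box.

def cpus_to_mask(cpus):
    # Build the comma-separated, 32-bit-grouped hex format the kernel expects.
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    words = []
    while bits:
        words.append(f"{bits & 0xffffffff:08x}")
        bits >>= 32
    return ",".join(reversed(words)) or "0"

with open("/sys/devices/virtual/workqueue/cpumask", "w") as f:
    f.write(cpus_to_mask(range(0, 64)))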

-----Original Message-----
From: Finlayson, James M CIV (USA) 
Sent: Wednesday, January 26, 2022 5:44 PM
To: 'Jeff Johnson' <jeff.johnson@xxxxxxxxxxxxxxxxx>; 'linux-raid@xxxxxxxxxxxxxxx' <linux-raid@xxxxxxxxxxxxxxx>
Cc: Finlayson, James M CIV (USA) <james.m.finlayson4.civ@xxxxxxxx>
Subject: Re: [Non-DoD Source] Re: Showing my ignorance - kernel workers

The BIOS settings have drifted a bit because of some guidance from an HPE/AMD engineer. If these are relevant, you can tell me to go away for a day and I'll change them and re-ask, but what we've been messing with is turning off all of the power-saving modes on the Rome to maximize performance. These are the only diffs beyond serial number. I apologize for the drift. I'm usually better than this :)


<           "CStateEfficiencyMode": "Disabled",
---
>           "CStateEfficiencyMode": "Enabled",
48c48
<           "DataFabricCStateEnable": "Disabled",
---
>           "DataFabricCStateEnable": "Auto",
98c98
<           "MinProcIdlePower": "C6",
---
>           "MinProcIdlePower": "NoCStates",
224c224
<           "ThermalConfig": "OptimalCooling",
---
>           "ThermalConfig": "EnhancedCPUCooling",
255,256c255,256
<           "WorkloadProfile": "Custom",
<           "XGMIForceLinkWidth": "x16",
---
>           "WorkloadProfile": "HighPerformanceCompute(HPC)",
>           "XGMIForceLinkWidth": "Auto",

-----Original Message-----
From: Finlayson, James M CIV (USA) 
Sent: Wednesday, January 26, 2022 5:01 PM
To: 'Jeff Johnson' <jeff.johnson@xxxxxxxxxxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
Subject: RE: Re: [Non-DoD Source] Re: Showing my ignorance - kernel workers

I will verify, but I'm pretty sure they are still sitting with the same BIOS - I did an HPE iLO get of the BIOS on one and pushed it to the other once I saw good individual SSD performance with fio. I'm always fearful going out to these lists because there is much more that I don't know than I do, but at least my problems are different from "I'm running all of these desktop drives on a system with non-ECC memory and I just lost all of my movies, can you help me :)"

-----Original Message-----
From: Jeff Johnson <jeff.johnson@xxxxxxxxxxxxxxxxx> 
Sent: Wednesday, January 26, 2022 3:53 PM
To: linux-raid@xxxxxxxxxxxxxxx
Cc: Finlayson, James M CIV (USA) <james.m.finlayson4.civ@xxxxxxxx>
Subject: Re: [Non-DoD Source] Re: Showing my ignorance - kernel workers


It might be worthwhile to check the BIOS settings on the two Rome servers to make sure the settings match, paying particular attention to NUMA and ioapic settings.
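
Something along these lines against the iLO (Redfish) BIOS attribute dumps will show only the deltas (a sketch; the file names are placeholders and it assumes the attributes have been flattened into one JSON object):

import json

# Sketch: print only the BIOS attributes whose values differ between two dumps.
def bios_diff(path_a, path_b):
    with open(path_a) as fa, open(path_b) as fb:
        a, b = json.load(fa), json.load(fb)
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            print(f"{key}: {a.get(key)!r} != {b.get(key)!r}")

bios_diff("server1_bios.json", "server2_bios.json")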

Background: https://developer.amd.com/wp-content/resources/56745_0.80.pdf

--Jeff

On Wed, Jan 26, 2022 at 12:40 PM Finlayson, James M CIV (USA) <james.m.finlayson4.civ@xxxxxxxx> wrote:
>
> Both dual-socket AMD Romes. Identical in every way. NUMA nodes per socket set to 1 in the BIOS. I'm using the exact same 10 drives on each system and they are PCIe Gen4 HPE OEM Samsung drives....
>
> -----Original Message-----
> From: Jani Partanen <jiipee@xxxxxxxxxxx>
> Sent: Wednesday, January 26, 2022 3:32 PM
> To: Finlayson, James M CIV (USA) <james.m.finlayson4.civ@xxxxxxxx>; 
> linux-raid@xxxxxxxxxxxxxxx
> Subject: [Non-DoD Source] Re: Showing my ignorance - kernel workers
>
> Hello, are both systems identical as far as hardware goes? Mainly the mobo.
>
> If not, and they are dual-socket systems, then it may be that one of them is designed to route all PCIe through one socket so that all drive slots can be used with just 1 socket populated, while the other is designed so that only half of the drive slots work when only 1 socket is populated.
> At least I have read something like this previously on this list.
>
> // JiiPee
>
>
> Finlayson, James M CIV (USA) wrote on 26/01/2022 at 22.17:
> > I apologize in advance if you can point me to something I can read about mdraid besides the source code; I'm beyond the bounds of my understanding of Linux. Background: I do a bunch of NUMA-aware computing. I have two systems configured identically, each with a NUMA node 0 focused RAID5 LUN containing NUMA node 0 nvme drives, and an identically configured NUMA node 1 focused RAID5 LUN: 9+1 nvme, 128KB stripe, xfs sitting on top, 64KB O_DIRECT reads from the application.
>
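
If you want to double-check that each array's members really are local to the node it is meant to serve, a quick sketch like the one below will print the NUMA node of every md member by walking up its sysfs path until it finds a numa_node attribute (array discovery via /sys/block/md*/slaves; treat it as illustrative):

import glob, os

# Sketch: report the NUMA node of every member device of every md array.
def device_numa_node(blockdev):
    path = os.path.realpath(f"/sys/block/{blockdev}")
    while path not in ("/", "/sys"):
        numa = os.path.join(path, "numa_node")
        if os.path.exists(numa):
            with open(numa) as f:
                return f.read().strip()
        path = os.path.dirname(path)
    return "unknown"

for md in sorted(glob.glob("/sys/block/md*")):
    print(os.path.basename(md))
    for member in sorted(os.listdir(os.path.join(md, "slaves"))):
        print(f"  {member}: numa_node {device_numa_node(member)}")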


--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.johnson@xxxxxxxxxxxxxxxxx
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite C - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage



