Thanks for your answer, Nellans!

I think the 31k IOPS is due to the RAID card cache. The controller appears to have a 394 MB cache, which is enough to hold the 2 * 128M of data.

# MegaCli64 -AdpAllInfo -aALL | grep -i cache
Cache Flush Interval : 4s
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade : 0 GB
Current Size of FW Cache : 394 MB
Block SSD Write Disk Cache Change: No
Disk Cache Policy : Yes
Cache When BBU Bad : Disabled
Cached IO : No

During my test, fio first created the two 128M files. It should then have cleared the file system cache, but the data would still be in the controller cache. I tried again with 2 jobs and a size of 1280M, and the IOPS was only around 400.

I am poor at understanding many hardware terms, though, so I can't tell you the RAID level or how many disks are in this RAID group. Instead I have pasted the MegaCli output for the physical drives, the logical drive, and the RAID controller below.

Can you tell me what an "enclosure device" is? How many different types of enclosure does a RAID controller usually have? (I thought the hardware was just a RAID controller with many slots, with one disk installed in each slot.)

And from the output below, how many disks do I have, and what type of RAID am I using? Does "RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0" mean RAID 1, and does "raid level : primary-1, secondary-3, raid level qualifier-0" mean RAID 10? Why is the secondary value 0 in my output rather than 3?

Thank you again for your time.
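To make the cache reasoning above concrete, here is a rough sketch of the arithmetic in Python. It assumes that fio's size= applies per job and that the whole 394 MB of firmware cache can hold read data; both are simplifications, and the function names are only for illustration.

```python
MIB = 1024 * 1024

def working_set_bytes(num_jobs: int, size_per_job_mib: int) -> int:
    """Total bytes touched by all jobs, assuming size= is per job."""
    return num_jobs * size_per_job_mib * MIB

def fits_in_cache(num_jobs: int, size_per_job_mib: int, cache_mib: int = 394) -> bool:
    """True if the whole working set could sit in the controller cache.

    cache_mib defaults to the 'Current Size of FW Cache' value reported
    by MegaCli; treating all of it as usable is an assumption.
    """
    return working_set_bytes(num_jobs, size_per_job_mib) <= cache_mib * MIB

if __name__ == "__main__":
    for jobs, size in [(2, 128), (4, 128), (2, 1280)]:
        total_mib = working_set_bytes(jobs, size) // MIB
        print(f"{jobs} jobs x {size}M = {total_mib} MiB, fits: {fits_in_cache(jobs, size)}")
```

Only the 2 x 128M case fits under these assumptions, which would be consistent with it being the only run that reached 31k IOPS.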
[root@lvs2b1c-93cb linuxTool]# MegaCli64 -PDList -aALL

Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Drive's postion: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 11
WWN: 5000C5003A25E5B0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c5003a25e5b1
SAS Address(1): 0x0
Connected Port Number: 3(path0)
Inquiry Data: SEAGATE ST9300603SS FS646SE3T05T
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :28C (82.40 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 1
Drive's postion: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 10
WWN: 5000C5003A438B50
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c5003a438b51
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: SEAGATE ST9300603SS FS646SE3VC6A
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :26C (78.80 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 2
Drive's postion: DiskGroup: 0, Span: 1, Arm: 0
Enclosure position: N/A
Device Id: 9
WWN: 5000C5003A433BEC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c5003a433bed
SAS Address(1): 0x0
Connected Port Number: 1(path0)
Inquiry Data: SEAGATE ST9300603SS FS646SE3VD8G
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :25C (77.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No

Enclosure Device ID: 252
Slot Number: 3
Drive's postion: DiskGroup: 0, Span: 1, Arm: 1
Enclosure position: N/A
Device Id: 8
WWN: 5000C5003A2D1BDC
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Firmware state: Online, Spun Up
Device Firmware Level: FS64
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c5003a2d1bdd
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST9300603SS FS646SE3TTYY
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :25C (77.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No

Exit Code: 0x00

[root@lvs2b1c-93cb linuxTool]# MegaCli64 -LDInfo -Lall -aALL

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 557.75 GB
Mirror Data : 557.75 GB
State : Optimal
Strip Size : 64 KB
Number Of Drives per span:2
Span Depth : 2
Default Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, Write Cache OK if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: Yes
LD's IO profile supports MAX power savings with cached writes: No
Is VD Cached: No

Exit Code: 0x00

[root@lvs2b1c-93cb linuxTool]# MegaCli64 -AdpAllInfo -aALL

Adapter #0

==============================================================================
Versions
================
Product Name : LSI MegaRAID SAS 9260-8i
Serial No : SV11811701
FW Package Build: 12.12.0-0048

Mfg. Data
================
Mfg. Date : 04/26/11
Rework Date : 00/00/00
Revision No : 60A
Battery FRU : N/A

Image Versions in Flash:
================
FW Version : 2.120.63-1242
BIOS Version : 3.22.00_4.11.05.00_0x05020000
Preboot CLI Version: 04.04-017:#%00008
WebBIOS Version : 6.0-34-e_29-Rel
NVDATA Version : 2.09.03-0013
Boot Block Version : 2.02.00.00-0000
BOOT Version : 09.250.01.219

Pending Images in Flash
================
None

PCI Info
================
Controller Id : 0000
Vendor Id : 1000
Device Id : 0079
SubVendorId : 1000
SubDeviceId : 9261

Host Interface : PCIE

ChipRevision : B4

Number of Frontend Port: 0
Device Interface : PCIE

Number of Backend Port: 8
Port : Address
0 5000c5003a2d1bdd
1 5000c5003a433bed
2 5000c5003a438b51
3 5000c5003a25e5b1
4 0000000000000000
5 0000000000000000
6 0000000000000000
7 0000000000000000

HW Configuration
================
SAS Address : 500605b0033062e0
BBU : Absent
Alarm : Present
NVRAM : Present
Serial Debugger : Present
Memory : Present
Flash : Present
Memory Size : 512MB
TPM : Absent
On board Expander: Absent
Upgrade Key : Absent
Temperature sensor for ROC : Absent
Temperature sensor for controller : Absent

Settings
================
Current Time : 19:53:33 1/2, 2014
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 30%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 4
Delay Among Spinup Groups : 2s
Physical Drive Coercion Mode : Disabled
Cluster Mode : Disabled
Alarm : Enabled
Auto Rebuild : Enabled
Battery Warning : Disabled
Ecc Bucket Size : 15
Ecc Bucket Leak Rate : 1440 Minutes
Restore HotSpare on Insertion : Disabled
Expose Enclosure Devices : Enabled
Maintain PD Fail History : Enabled
Host Request Reordering : Enabled
Auto Detect BackPlane Enabled : SGPIO/i2c SEP
Load Balance Mode : Auto
Use FDE Only : No
Security Key Assigned : No
Security Key Failed : No
Security Key Not Backedup : No
Default LD PowerSave Policy : Controller Defined
Maximum number of direct attached drives to spin up in 1 min : 120
Auto Enhanced Import : No
Any Offline VD Cache Preserved : No
Allow Boot with Preserved Cache : No
Disable Online Controller Reset : No
PFK in NVRAM : No
Use disk activity for locate : No
POST delay : 90 seconds

Capabilities
================
RAID Level Supported : RAID0, RAID1, RAID5, RAID6, RAID00, RAID10, RAID50, RAID60, PRL 11, PRL 11 with spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span, PRL11-RLQ0 DDF layout with span
Supported Drives : SAS, SATA

Allowed Mixing:

Mix in Enclosure Allowed
Mix of SAS/SATA of HDD type in VD Allowed

Status
================
ECC Bucket Count : 0

Limitations
================
Max Arms Per VD : 32
Max Spans Per VD : 8
Max Arrays : 128
Max Number of VDs : 64
Max Parallel Commands : 1008
Max SGE Count : 80
Max Data Transfer Size : 8192 sectors
Max Strips PerIO : 42
Max LD per array : 16
Min Strip Size : 8 KB
Max Strip Size : 1.0 MB
Max Configurable CacheCade Size: 0 GB
Current Size of CacheCade : 0 GB
Current Size of FW Cache : 394 MB

Device Present
================
Virtual Drives : 1
Degraded : 0
Offline : 0
Physical Devices : 5
Disks : 4
Critical Disks : 0
Failed Disks : 0

Supported Adapter Operations
================
Rebuild Rate : Yes
CC Rate : Yes
BGI Rate : Yes
Reconstruct Rate : Yes
Patrol Read Rate : Yes
Alarm Control : Yes
Cluster Support : No
BBU : Yes
Spanning : Yes
Dedicated Hot Spare : Yes
Revertible Hot Spares : Yes
Foreign Config Import : Yes
Self Diagnostic : Yes
Allow Mixed Redundancy on Array : No
Global Hot Spares : Yes
Deny SCSI Passthrough : No
Deny SMP Passthrough : No
Deny STP Passthrough : No
Support Security : No
Snapshot Enabled : No
Support the OCE without adding drives : Yes
Support PFK : Yes
Support PI : No
Support Boot Time PFK Change : No
Disable Online PFK Change : No
PFK TrailTime Remaining : 0 days 0 hours
Support Shield State : No
Block SSD Write Disk Cache Change: No

Supported VD Operations
================
Read Policy : Yes
Write Policy : Yes
IO Policy : Yes
Access Policy : Yes
Disk Cache Policy : Yes
Reconstruction : Yes
Deny Locate : No
Deny CC : No
Allow Ctrl Encryption: No
Enable LDBBM : No
Support Breakmirror : No
Power Savings : Yes

Supported PD Operations
================
Force Online : Yes
Force Offline : Yes
Force Rebuild : Yes
Deny Force Failed : No
Deny Force Good/Bad : No
Deny Missing Replace : No
Deny Clear : No
Deny Locate : No
Support Temperature : Yes
Disable Copyback : No
Enable JBOD : No
Enable Copyback on SMART : No
Enable Copyback to SSD on SMART Error : Yes
Enable SSD Patrol Read : No
PR Correct Unconfigured Areas : Yes
Enable Spin Down of UnConfigured Drives : Yes
Disable Spin Down of hot spares : No
Spin Down time : 30
T10 Power State : Yes

Error Counters
================
Memory Correctable Errors : 0
Memory Uncorrectable Errors : 0

Cluster Information
================
Cluster Permitted : No
Cluster Active : No

Default Settings
================
Phy Polarity : 0
Phy PolaritySplit : 0
Background Rate : 30
Strip Size : 64kB
Flush Time : 4 seconds
Write Policy : WB
Read Policy : Adaptive
Cache When BBU Bad : Disabled
Cached IO : No
SMART Mode : Mode 6
Alarm Disable : Yes
Coercion Mode : None
ZCR Config : Unknown
Dirty LED Shows Drive Activity : No
BIOS Continue on Error : No
Spin Down Mode : None
Allowed Device Type : SAS/SATA Mix
Allow Mix in Enclosure : Yes
Allow HDD SAS/SATA Mix in VD : Yes
Allow SSD SAS/SATA Mix in VD : No
Allow HDD/SSD Mix in VD : No
Allow SATA in Cluster : No
Max Chained Enclosures : 16
Disable Ctrl-R : Yes
Enable Web BIOS : Yes
Direct PD Mapping : No
BIOS Enumerate VDs : Yes
Restore Hot Spare on Insertion : No
Expose Enclosure Devices : Yes
Maintain PD Fail History : Yes
Disable Puncturing : No
Zero Based Enclosure Enumeration : No
PreBoot CLI Enabled : Yes
LED Show Drive Activity : Yes
Cluster Disable : Yes
SAS Disable : No
Auto Detect BackPlane Enable : SGPIO/i2c SEP
Use FDE Only : No
Enable Led Header : Yes
Delay during POST : 0
EnableCrashDump : No
Disable Online Controller Reset : No
EnableLDBBM : No
Un-Certified Hard Disk Drives : Allow
Treat Single span R1E as R10 : No
Max LD per array : 16
Power Saving option : Don't Auto spin down Configured Drives
Max power savings option is not allowed for LDs. Only T10 power conditions are to be used.
Default spin down time in minutes: 30
Enable JBOD : No
TTY Log In Flash : No
Auto Enhanced Import : No
BreakMirror RAID Support : No
Disable Join Mirror : No
Enable Shield State : No
Time taken to detect CME : 60s

Exit Code: 0x00

At 2014-01-02 23:40:24,"David Nellans" <david@xxxxxxxxxxx> wrote:
>
>> Problem summary:
>> The IOPS is very unstable since I changed the number of jobs from 2 to 4. even I changed it back, the IOPS performance also can't return back.
>> # cat 1.fio
>> [global]
>> rw=randread
>> size=128m
>>
>> [job1]
>>
>> [job2]
>>
>> when I run fio 1.fio, the iops is around 31k. and then I add the following 2 entries:
>> [job3]
>>
>> [job4]
>>
>> The IOPS dropped to around 1k.
>>
>> Even I remove these 2 jobs, the IOPS still be around 1k.
>>
>> Only if I removed all the jobn.n.0 files, and re-run with 2 jobs setting, the IOPS can be 31k again.
>
>> # bash blkinfo.sh /dev/sda
>> Vendor : LSI
>> Model : MR9260-8i
>> Nr_request : 128
>> rotational : 1
>
>It looks like you're testing against a LSI megaraid SAS controller,
>which presumably has magnetic drives attached. When you add more jobs
>to your config its going to cause the heads on the drives (you don't say
>how many you have) to thrash more as they try and interleave requests
>that are going to land on different portions of the disk. So its not
>unsurprising that you'll see IOPS drop off.
>
>A lot of how and where the IOPS will drop off is going to depend on the
>raid config of the drives you have attached to the controller however.
>Generally speaking 31k IOPS at 128MB I/O's (which will be split into
>something smaller like 1MB typically) is well beyond what you should
>expect 8 HDD's to do unless you're getting lots of hits in the DRAM
>buffer on the raid controller. Enterprise HDD's (even 15k ones)
>generally can only sustain <= 250 random read IOPS, so even with perfect
>interleaving on an 8 drive raid-0, 31k seem suspicious, 1k seems
>perfectly realistic however!
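
The estimate quoted above can be written out as a rough sketch in Python. It assumes the figure of at most 250 random read IOPS per drive from the quoted message and perfect interleaving across drives, so it is an optimistic upper bound and not a measurement; the function names are only for illustration.

```python
def max_random_read_iops(num_drives: int, iops_per_drive: int = 250) -> int:
    """Optimistic ceiling: every drive serves random reads in parallel.

    iops_per_drive defaults to the <= 250 figure from the quoted message.
    """
    return num_drives * iops_per_drive

def exceeds_spindle_limit(observed_iops: int, num_drives: int,
                          iops_per_drive: int = 250) -> bool:
    """True if the observed IOPS is above what the drives alone could deliver."""
    return observed_iops > max_random_read_iops(num_drives, iops_per_drive)

if __name__ == "__main__":
    # 8 is the drive count assumed in the quoted message; 4 is the
    # "Disks : 4" value in the MegaCli output.
    for drives in (8, 4):
        ceiling = max_random_read_iops(drives)
        for observed in (31000, 1000, 400):
            flag = exceeds_spindle_limit(observed, drives)
            print(f"{drives} drives, ceiling {ceiling}: {observed} IOPS above ceiling: {flag}")
```

Under these assumptions only the 31k result is above the ceiling, for either drive count, which points at a cache somewhere in the path.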