> There is no way that fallocate() of 1000x1k files should be causing
> 450MB/s of IO for 5 seconds.
I agree and that's what has me puzzled as well.
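Just to put numbers on it: 1000 files x 1k is only ~1MB of data, and even
counting a full 4k block per file that's ~4MB, while 450MB/s for 5 seconds
is ~2.25GB -- several hundred times more than what's being allocated.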
> However, I still have no idea what you are running this test on - as
> I asked in another email, can you provide some information about
> the system you are seeing this problem on so we can try to work out
> what might be causing this?
Sorry about that. This is an HP box with 192GB of RAM and six 2-core hyperthreaded CPUs, running Ubuntu Precise.
segerm@az1-sw-object-0006:~$ uname -a
Linux az1-sw-object-0006 2.6.38-16-server #68hf1026116v20120926-Ubuntu SMP Wed Sep 26 14:34:13 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
segerm@az1-sw-object-0006:~$ python --version
Python 2.7.1+
segerm@az1-sw-object-0006:~$ xfs_repair -V
xfs_repair version 3.1.4
segerm@az1-sw-object-0006:~$ cat /proc/meminfo
MemTotal: 198191696 kB
MemFree: 166202324 kB
Buffers: 193268 kB
Cached: 21595332 kB
SwapCached: 36 kB
Active: 7992268 kB
Inactive: 14068428 kB
Active(anon): 719408 kB
Inactive(anon): 5244092 kB
Active(file): 7272860 kB
Inactive(file): 8824336 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 126889980 kB
SwapFree: 126863972 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 232156 kB
Mapped: 11384 kB
Shmem: 5731388 kB
Slab: 5755820 kB
SReclaimable: 4792292 kB
SUnreclaim: 963528 kB
KernelStack: 6240 kB
PageTables: 7328 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 225985828 kB
Committed_AS: 8851976 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 725112 kB
VmallocChunk: 34159749436 kB
HardwareCorrupted: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 118648 kB
DirectMap2M: 5089280 kB
DirectMap1G: 196083712 kB
There are over 60 mounts, but here's the one I'm writing to:
segerm@az1-sw-object-0006:~$ mount | grep disk0
/dev/sdc1 on /srv/node/disk0 type xfs (rw,nobarrier)
Not sure what you're looking for here, so here's all of it:
segerm@az1-sw-object-0006:~$ cat /proc/partitions
major minor #blocks name
8 0 976762584 sda
8 1 248976 sda1
8 2 1 sda2
8 5 976510993 sda5
251 0 41943040 dm-0
251 1 8785920 dm-1
251 2 2928640 dm-2
8 16 976762584 sdb
8 17 976760832 sdb1
251 3 126889984 dm-3
251 4 389120 dm-4
251 5 41943040 dm-5
8 32 2930233816 sdc
8 33 2930233344 sdc1
8 48 2930233816 sdd
8 49 2930233344 sdd1
8 64 2930233816 sde
8 65 2930233344 sde1
8 80 2930233816 sdf
8 81 2930233344 sdf1
8 96 2930233816 sdg
8 97 2930233344 sdg1
8 112 2930233816 sdh
8 113 2930233344 sdh1
8 128 2930233816 sdi
8 129 2930233344 sdi1
8 144 2930233816 sdj
8 145 2930233344 sdj1
8 160 2930233816 sdk
8 161 2930233344 sdk1
8 176 2930233816 sdl
8 177 2930233344 sdl1
8 192 2930233816 sdm
8 193 2930233344 sdm1
8 208 2930233816 sdn
8 209 2930233344 sdn1
8 240 2930233816 sdp
8 241 2930233344 sdp1
8 224 2930233816 sdo
8 225 2930233344 sdo1
65 0 2930233816 sdq
65 1 2930233344 sdq1
65 16 2930233816 sdr
65 17 2930233344 sdr1
65 32 2930233816 sds
65 33 2930233344 sds1
65 176 2930233816 sdab
65 177 2930233344 sdab1
66 0 2930233816 sdag
66 1 2930233344 sdag1
65 64 2930233816 sdu
65 65 2930233344 sdu1
65 128 2930233816 sdy
65 129 2930233344 sdy1
65 224 2930233816 sdae
65 225 2930233344 sdae1
65 144 2930233816 sdz
65 145 2930233344 sdz1
65 208 2930233816 sdad
65 209 2930233344 sdad1
65 48 2930233816 sdt
65 49 2930233344 sdt1
65 240 2930233816 sdaf
65 241 2930233344 sdaf1
65 96 2930233816 sdw
65 97 2930233344 sdw1
66 32 2930233816 sdai
66 33 2930233344 sdai1
66 48 2930233816 sdaj
66 49 2930233344 sdaj1
65 112 2930233816 sdx
65 113 2930233344 sdx1
65 160 2930233816 sdaa
65 161 2930233344 sdaa1
65 80 2930233816 sdv
65 81 2930233344 sdv1
65 192 2930233816 sdac
65 193 2930233344 sdac1
66 16 2930233816 sdah
66 17 2930233344 sdah1
66 64 2930233816 sdak
66 65 2930233344 sdak1
66 144 2930233816 sdap
66 145 2930233344 sdap1
66 224 2930233816 sdau
66 225 2930233344 sdau1
67 208 2930233816 sdbj
67 209 2930233344 sdbj1
66 112 2930233816 sdan
66 113 2930233344 sdan1
66 96 2930233816 sdam
66 97 2930233344 sdam1
66 80 2930233816 sdal
66 81 2930233344 sdal1
67 96 2930233816 sdbc
67 97 2930233344 sdbc1
66 128 2930233816 sdao
66 129 2930233344 sdao1
67 128 2930233816 sdbe
67 129 2930233344 sdbe1
66 176 2930233816 sdar
66 177 2930233344 sdar1
67 160 2930233816 sdbg
67 161 2930233344 sdbg1
66 160 2930233816 sdaq
66 161 2930233344 sdaq1
67 48 2930233816 sdaz
67 49 2930233344 sdaz1
66 208 2930233816 sdat
66 209 2930233344 sdat1
67 112 2930233816 sdbd
67 113 2930233344 sdbd1
67 144 2930233816 sdbf
67 145 2930233344 sdbf1
67 64 2930233816 sdba
67 65 2930233344 sdba1
67 16 2930233816 sdax
67 17 2930233344 sdax1
67 32 2930233816 sday
67 33 2930233344 sday1
66 192 2930233816 sdas
66 193 2930233344 sdas1
67 192 2930233816 sdbi
67 193 2930233344 sdbi1
66 240 2930233816 sdav
66 241 2930233344 sdav1
67 80 2930233816 sdbb
67 81 2930233344 sdbb1
67 176 2930233816 sdbh
67 177 2930233344 sdbh1
67 0 2930233816 sdaw
67 1 2930233344 sdaw1
segerm@az1-sw-object-0006:~$ xfs_info /srv/node/disk0
meta-data=/dev/sdc1              isize=1024   agcount=32, agsize=22892416 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732557312, imaxpct=5
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=357696, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Is this enough to describe my environment?
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> I don't know what stats you are getting your information from. XFS
> > This is the same behavior I'm seeing on swift. 10K 1KB files X 4kb minimal
> > block size still comes out to a lot less than the multiple GB of writes
> > being reported. Actually, since this whole thing only takes a few seconds
> > and I know a single disk can't write that fast, maybe it's just a bug in
> > the way the kernel is reporting preallocated blocks and nothing to do
> > with XFS? Or is XFS responsible for the stats?
> is only responsible for what is in /proc/fs/xfs/...
I'm getting my stats from two places: collectl reading /proc/diskstats, and
XFS's own counters in /proc/fs/xfs, though I've only shown you the collectl
numbers. It's easy enough to send you the /proc/fs/xfs stats if you want
them, both as formatted output and as the raw per-second /proc/fs/xfs
contents they're derived from.
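For what it's worth, the diskstats side of that boils down to something like
the following -- a minimal sketch, not collectl itself, and the device name
is just an example:

import time

DEV = "sdc"  # example: the disk behind /srv/node/disk0

def sectors_written(dev):
    # /proc/diskstats: field 3 is the device name and field 10 is the
    # cumulative count of sectors written (512 bytes each)
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                return int(fields[9])
    raise ValueError("device %s not found" % dev)

prev = sectors_written(DEV)
while True:
    time.sleep(1)
    cur = sectors_written(DEV)
    print "%s: %.1f MB/s written" % (DEV, (cur - prev) * 512 / 1e6)
    prev = cur

The per-second delta of that counter, times 512 bytes, is where the MB/s
numbers come from.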
> > If I remove the fallocate call I see the expected amount of disk traffic.
> Have you run strace to confirm that your python script is actually
> calling fallocate(), and it's not being emulated by python or libc?
I haven't. However, when I saw this exact behavior with swift, we did see fallocate being called, and here's a small section of the trace showing a complete 10K PUT operation and the beginning of the next one:
0.067874 cpu=0 pid=41977 fallocate [285] entry fd=15 mode=0x1 offset=0x0 len=10240
0.067980 cpu=0 pid=41977 block_rq_insert dev_t=0x04100030 wr=write flags=SYNC sector=0xaec11a00 len=262144
0.067983 cpu=0 pid=41977 block_rq_issue dev_t=0x04100030 wr=write flags=SYNC sector=0xaec11a00 len=262144
0.068010 cpu=0 pid=41977 sched_switch prio=120 state=SSLEEP next_pid=0 next_prio=120
0.068125 cpu=0 pid=0 block_rq_complete dev_t=0x04100030 wr=write flags=SYNC sector=0xaec11a00 len=262144 qpid=41977 spid=41977 qtm= 0.000004 svtm= 0.000142
0.068175 cpu=0 pid=48168 sched_wakeup target_pid=41977 prio=120 target_cpu=12 success=1
0.068180 cpu=12 pid=0 sched_switch prio=n/a state=n/a next_pid=41977 next_prio=120
0.068185 cpu=12 pid=41977 fallocate [285] ret=0x0 syscallbeg= 0.000311 fd=15 mode=0x1 offset=0x0 len=10240
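In case it's useful, here's roughly what a standalone reproducer could look
like, taking swift and python's own wrappers out of the picture -- a minimal
sketch that goes through ctypes straight to glibc's fallocate() wrapper (the
target directory and file count are placeholders):

import ctypes, ctypes.util, os

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                           ctypes.c_longlong, ctypes.c_longlong]
FALLOC_FL_KEEP_SIZE = 0x01  # matches mode=0x1 in the trace above

for i in range(1000):
    # placeholder path; same 10240-byte preallocation seen in the trace
    fd = os.open("/srv/node/disk0/test/f%d" % i,
                 os.O_CREAT | os.O_WRONLY, 0o644)
    if libc.fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 10240) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    os.close(fd)

Running that under "strace -f -e trace=fallocate python repro.py" should
settle whether the real syscall is being issued or something underneath is
emulating it with writes.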
I hope this helps, but if there's anything more I can provide I'll be happy to do so.
-mark