Re: concurrent direct IO write in xfs

Hello,

On Mon, Jan 23, 2012 at 12:11 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > This is weird. Yes, I'm sure. I use pwrite() to write data to a 4G file,
> > and I check the offset of each write, and they are always smaller than 4G.
> > I instrumented the code with systemtap and it shows me that ip->i_new_size
> > and new_size in xfs_aio_write_newsize_update are both 0.
> > Since in my case there are only overwrites, ip->i_new_size will always be 0
> > (the only place that updates ip->i_new_size is xfs_file_aio_write_checks).
> > For the same reason, new_size returned by xfs_file_aio_write_checks
> > is always 0.
> > Is that what you expected?
>
> No idea. I don't know what the problem you are seeing is yet, or if
> indeed there even is a problem, as I don't really understand what you
> are trying to do or what results you are expecting to see...

Here I was just asking whether i_new_size is always 0 when there are only
overwrites. I think it has nothing to do with the pattern of my workload or
the device I used for the test.

> Indeed, have you run the test on something other than a RAM disk and
> confirmed that the problem exists on a block device that has real IO
> latency? If your IO takes close to zero time, then there isn't any
> IO level concurrency you can extract from single file direct IO; it
> will all just serialise on the extent tree lookups.

It's difficult to test the scalability problem on traditional disks: they
provide very low IOPS (IOs per second), and even two SSDs can't deliver
enough IOPS to expose it.
I don't think all direct IO is serialized on the extent tree lookups. Direct
IO reads can be parallelized pretty well, and they also need extent tree
lookups.

> > > >  0xffffffff812829f4 : __xfs_get_blocks+0x94/0x4a0 [kernel]
> > >
> > > And for direct IO writes, this will be the block mapping lookup so
> > > always hit.
> > >
> > > What this says to me is that you are probably doing lots of very
> > > small concurrent write IOs, but I'm only guessing.  Can you provide
> > > your test case and a description of your test hardware so we can try
> > > to reproduce the problem?
> > >
> > I build XFS on top of a ramdisk. So yes, there are a lot of small
> > concurrent writes per second.
> > I create a 4GB file in XFS (the ramdisk has 5GB of space). My test
> > program overwrites 4GB of data in the file, each time writing a page of
> > data at a random offset. It's always overwriting, never appending. The
> > offset of each write is always aligned to the page size. There is no
> > overlapping between writes.

> Why are you using XFS for this? tmpfs was designed to do this sort
> of stuff as efficiently as possible....

OK, I can try that.

> > So the test case is pretty simple and I think it's easy to reproduce.
> > It'd be great if you could try the test case.
>
> Can you post your test code so I know that what I test is exactly what
> you are running?

I can do that. My test code has become quite complicated, so I need to
simplify it first.
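In the meantime, here is a stripped-down sketch of the kind of test I am
describing (this is not my actual harness; the file name, thread count and
write count below are only placeholders): each thread opens the preallocated
4GB file with O_DIRECT and issues page-sized, page-aligned pwrite()s at
random offsets below 4GB, so every IO is an overwrite and the file size
never changes.

/*
 * diotest.c: simplified sketch, not the real test harness.
 *
 * N threads issue page-sized, page-aligned O_DIRECT overwrites at
 * random offsets within an existing file, so the file size never
 * changes (no appends, no extent allocation).
 *
 * Build: gcc -O2 -pthread -o diotest diotest.c
 * Run:   ./diotest /mnt/xfs/testfile <threads> <writes per thread>
 * The file must already exist and be at least FILE_SIZE bytes long.
 */
#define _GNU_SOURCE		/* for O_DIRECT */
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE	(4ULL << 30)	/* 4GB of preallocated file data */
#define IO_SIZE		4096		/* one page per write */

static const char *path;
static long writes_per_thread;

static void *writer(void *arg)
{
	unsigned int seed = (unsigned int)(long)arg;
	void *buf;
	long i;
	int fd;

	/* O_DIRECT needs an aligned buffer. */
	if (posix_memalign(&buf, IO_SIZE, IO_SIZE))
		return NULL;
	memset(buf, 'x', IO_SIZE);

	fd = open(path, O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		free(buf);
		return NULL;
	}

	for (i = 0; i < writes_per_thread; i++) {
		/* Random page-aligned offset strictly below 4GB: overwrite only. */
		off_t off = (rand_r(&seed) % (FILE_SIZE / IO_SIZE)) * IO_SIZE;

		if (pwrite(fd, buf, IO_SIZE, off) != IO_SIZE) {
			perror("pwrite");
			break;
		}
	}

	close(fd);
	free(buf);
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t *threads;
	long nr_threads, i;

	if (argc != 4) {
		fprintf(stderr, "usage: %s <file> <threads> <writes per thread>\n",
			argv[0]);
		return 1;
	}
	path = argv[1];
	nr_threads = atol(argv[2]);
	writes_per_thread = atol(argv[3]);

	threads = calloc(nr_threads, sizeof(*threads));
	if (!threads)
		return 1;

	for (i = 0; i < nr_threads; i++)
		pthread_create(&threads[i], NULL, writer, (void *)i);
	for (i = 0; i < nr_threads; i++)
		pthread_join(threads[i], NULL);

	free(threads);
	return 0;
}

Swapping pwrite() for pread() (and O_WRONLY for O_RDONLY) gives the
read-side comparison I mentioned above.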

Thanks,
Da

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
