On Sat, Dec 31, 2011 at 2:13 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Fri, Dec 30, 2011 at 08:37:00AM +0530, Amit Sahrawat wrote:
>> On Fri, Dec 30, 2011 at 2:27 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Thu, Dec 29, 2011 at 01:10:49PM +0000, amit.sahrawat83@xxxxxxxxx wrote:
>> >> Hi, I am using a test setup which does writes from multiple
>> >> threads using direct IO. The buffer size used for writing is
>> >> 512KB. After continuously running this for a long duration, I
>> >> observe that the number of extents in each file is getting huge
>> >> (2K..4K..), and that each extent is 512KB (aligned to the write
>> >> buffer size). I wish to have a low number of extents (i.e.,
>> >> reduce fragmentation). In the case of buffered IO, preallocation
>> >> works well along with the mount option 'allocsize'. Is there
>> >> anything that can be done for direct IO? Please advise on
>> >> reducing fragmentation with direct IO.
>> >
>> > Direct IO does not do any implicit preallocation. The filesystem
>> > simply gets out of the way of direct IO as it is assumed you know
>> > what you are doing.
>> This is the supporting line I was looking for.
>> >
>> > i.e. you know how to use the fallocate() or ioctl(XFS_IOC_RESVSP64)
>> > calls to preallocate space or to set up extent size hints to use
>> > larger allocations than the IO being done during syscalls...
>> I tried to make use of preallocating space using
>> ioctl(XFS_IOC_RESVSP64) - but over time this is also not working
>> well with direct I/O.
>
> Without knowing how you are using preallocation, I cannot comment on
> this. Can you describe how your application does IO (size,
> frequency, location in file, etc) and preallocation (same again), as
> well as xfs_bmap -vp <file> output of fragmented files? That way I
> have some idea of what your problem is and so might be able to
> suggest fixes...

Preallocation was done using snippets like this:

	struct xfs_flock64 fl;

	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = (long long) PREALLOC;	/* 1GB */
	printf("Preallocating %lld MB\n", fl.l_len / (1024 * 1024));
	err = ioctl(hFile, XFS_IOC_RESVSP64, &fl);

I verified that the preallocation works by checking the file size
(ls -l), the disk usage (df -kh), and the file extents (xfs_bmap).
xfs_bmap shows an extent of the preallocated length, i.e.,
preallocation was working as expected.
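As a side note, the same reservation can also be expressed through the
portable fallocate(2) interface on recent kernels; a minimal sketch
(untested in my setup - the reserve_space() helper name is just for
illustration), where FALLOC_FL_KEEP_SIZE mirrors XFS_IOC_RESVSP64 in
reserving blocks without changing the visible file size:

	/*
	 * Sketch only: reserve space with fallocate(2) instead of the
	 * XFS-specific ioctl. FALLOC_FL_KEEP_SIZE keeps i_size unchanged,
	 * matching XFS_IOC_RESVSP64 behaviour.
	 */
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>

	static int reserve_space(int fd, off_t len)
	{
		if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) < 0) {
			perror("fallocate");
			return -1;
		}
		return 0;
	}

Either way the reserved range is left as unwritten extents, so reads
of the preallocated area return zeros until real data is written.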
To share the test case - for certain reasons I cannot share the exact
code, but it works like this. There are 5 threads in the test case:

WRITE_SIZE - 512KB
TRUNCSIZE  - 250MB

1st thread - this one does the actual writing amongst all the threads:

	buffer = valloc(WRITE_SIZE);
	fd = open64(file, O_CREAT|O_DIRECT|O_WRONLY|O_TRUNC);

	/* initial write of 5GB of data to the file using the 512KB buffer */
	for (i = 0; i < WRITE_COUNT; i++)
		write(fd, buffer, WRITE_SIZE);
	fsync(fd);

	while (1) {
		if (ncount++ < TRUNCSIZE) {
			write(fd, buffer, WRITE_SIZE);
		} else {
			close(fd);
			fd = open64(file, O_RDWR|O_CREAT);

			gettimeofday();		/* start point */
			sync();		/* at times this sync takes around
					 * 5 sec, even though the test case
					 * does its I/O using O_DIRECT */
			gettimeofday();		/* end point */
			if (sync time greater than 2 sec)
				exit(0);

			gettimeofday();		/* start point */
			ftruncate(fd, TRUNCSIZE);
			gettimeofday();		/* end point */
			if (truncate time greater than 2 sec)
				exit(0);

			fsync(fd);
			close(fd);
			fd = open64(file, O_WRONLY|O_APPEND|O_DIRECT);
			ncount = 0;
		}
		fsync(fd);
	}

2nd thread - writing to a second file in a loop:

	while (1) {
		write(10 bytes);
		fsync();
		usleep(100 * 1000);
	}

3rd thread - reading the file written by the 2nd thread:

	while (1) {
		read(file, buffer, 10);
		lseek(file, 0, 0);
		usleep(10000);
	}

4th thread - just printing the size information for the two files
being written.

5th thread - also reading the file from the 2nd thread.

>
>> Is there any call to set up the extent size
>> as well? Please update, I can try to make use of that too.
>
> `man xfsctl` and search for XFS_IOC_FSSETXATTR.

Thanks Dave, this is exactly what was needed - it is working as of now
(a sketch of the call I used is appended after my signature below).

But there continues to be a problem with the sync time. Even though
there is no dirty data, sync still sometimes takes around 5 sec. This
is very rare - observed only a few times in overnight runs - so it is
also very difficult to debug what the issue is and what the culprit
could be. At one point I captured a trace while the sync was stuck -
please find it below:

(dump_backtrace+0x0/0x11c) from [<c0389520>] (dump_stack+0x20/0x24)
(dump_stack+0x0/0x24) from [<c0067b70>] (__schedule_bug+0x7c/0x8c)
(__schedule_bug+0x0/0x8c) from [<c0389bc0>] (schedule+0x88/0x5fc)
(schedule+0x0/0x5fc) from [<c020a0c8>] (_xfs_log_force+0x238/0x28c)
(_xfs_log_force+0x0/0x28c) from [<c020a320>] (xfs_log_force+0x20/0x40)
(xfs_log_force+0x0/0x40) from [<c02308c4>] (xfs_commit_dummy_trans+0xc8/0xd4)
(xfs_commit_dummy_trans+0x0/0xd4) from [<c0231468>] (xfs_quiesce_data+0x60/0x88)
(xfs_quiesce_data+0x0/0x88) from [<c022e080>] (xfs_fs_sync_fs+0x2c/0xe8)
(xfs_fs_sync_fs+0x0/0xe8) from [<c015cccc>] (__sync_filesystem+0x8c/0xa8)
(__sync_filesystem+0x0/0xa8) from [<c015cd1c>] (sync_one_sb+0x34/0x38)
(sync_one_sb+0x0/0x38) from [<c013b1f0>] (iterate_supers+0x7c/0xc0)
(iterate_supers+0x0/0xc0) from [<c015cbf4>] (sync_filesystems+0x28/0x34)
(sync_filesystems+0x0/0x34) from [<c015cd68>] (sys_sync+0x48/0x78)
(sys_sync+0x0/0x78) from [<c003b4c0>] (ret_fast_syscall+0x0/0x48)

In order to resolve this I applied the patch below:

xfs: dummy transactions should not dirty VFS state
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=1a387d3be2b30c90f20d49a3497a8fc0693a9d18

but I still continue to observe the sync timing issue.

One thing: do we need fsync() at all when writing with O_DIRECT? I
think 'no', since O_DIRECT writes bypass the page cache. Also, should
sync() be taking this long when there is no dirty data?

Please share your opinion.

Thanks & Regards,
Amit Sahrawat
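P.S. For reference, the extent size hint was set along these lines (a
rough sketch, not my exact code - the set_extsize_hint() name and the
16MB value are just from my test, not a recommendation):

	/*
	 * Sketch: set a persistent extent size hint via XFS_IOC_FSSETXATTR.
	 * Read-modify-write the fsxattr so existing flags are preserved.
	 * extsize must be a multiple of the filesystem block size, and the
	 * hint can only be changed while the file has no extents allocated.
	 */
	#include <xfs/xfs.h>		/* struct fsxattr, XFS_IOC_FS*XATTR */
	#include <sys/ioctl.h>
	#include <stdio.h>

	static int set_extsize_hint(int fd, unsigned int extsize)
	{
		struct fsxattr fsxa;

		if (ioctl(fd, XFS_IOC_FSGETXATTR, &fsxa) < 0) {
			perror("XFS_IOC_FSGETXATTR");
			return -1;
		}
		fsxa.fsx_xflags |= XFS_XFLAG_EXTSIZE;
		fsxa.fsx_extsize = extsize;	/* e.g. 16 * 1024 * 1024 */
		if (ioctl(fd, XFS_IOC_FSSETXATTR, &fsxa) < 0) {
			perror("XFS_IOC_FSSETXATTR");
			return -1;
		}
		return 0;
	}

With the hint set, XFS rounds data allocations up to extsize-sized
chunks, so the 512KB direct writes no longer map 1:1 to extents.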
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs