On 06/29/2011 06:08 AM, Theodore Tso wrote:
> On Jun 28, 2011, at 4:20 PM, Mccauliff, Sean D. (ARC-PX)[Lockheed Martin Space OPNS] wrote:
>> Last time I benchmarked cp and tar with their respective sparse file options they were extremely slow, as they (claim to) identify sparseness by scanning for contiguous regions of zeros. This was quite some time ago, so perhaps cp and tar have changed.
> How many of your files are sparse? If the source file is not sparse (which you can check by comparing st_blocks to the st_size value), you can skip calling fiemap in that case.
I already know most of the files are not sparse and can identify them by
the directory name they reside in. So the copy program just does a
straight copy for the non-sparse files without the fiemap trickery.
I've already mentioned that I have about 2M sparse files.
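
For reference, the st_blocks versus st_size comparison suggested above could be sketched roughly as follows. This is only an illustration: the helper names looksSparse and pathLooksSparse are made up for this example and are not part of the copy program, and the check is a heuristic that assumes Linux, where st_blocks is counted in 512-byte units.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper: returns true when the file occupies fewer bytes on
// disk than its logical size, which suggests it contains holes.
// Assumes Linux, where st_blocks is reported in 512-byte units.
static bool looksSparse(const struct stat& st) {
    const long long allocated = static_cast<long long>(st.st_blocks) * 512LL;
    return allocated < static_cast<long long>(st.st_size);
}

// Hypothetical helper: stat the path and apply the check above.
// Returns false if the file cannot be examined.
static bool pathLooksSparse(const char* path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        return false;
    }
    return looksSparse(st);
}
```

A copy program could call pathLooksSparse() first and only fall back to the fiemap path when it returns true. Since st_blocks can also count file system metadata blocks, a file with only small holes may be reported as non-sparse.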
> Also, how many times are you calling fiemap per file? Are you calling once per block, or something silly like that?
Twice. Once to get the number of struct fiemap_extent and another time
with the correct number of struct fiemap_extent.
> (These are all details that should have been in your initial question, by the way... we're not mind readers, you know. Can you just send a copy of the key parts of your Java code?)
Sorry, I didn't mean to bother you. I did try emailing ext3-users so
as to not take up any developer time with my question. Portions of the
source are below. It might also be useful to know that the source and
destination file systems live on a 3par SAN, RAID 1+0 striped across
240 7200 rpm disks. The source file system uses LVM to combine several
3par volumes into a single volume. The destination file system does not
use LVM. There are two FC HBAs, which are load balanced using
multipathd. My original question:
> I'm copying terabytes of data from an ext3 file system to a new ext4
> file system. I'm seeing high CPU usage from the processes
> flush-253:2, kworker-3:0, kworker-2:2, kworker-1:1, and kworker-0:0.
> Does anyone on the list have any idea what these processes do, why
> they are consuming so much CPU time, and if there is something that
> can be done about it? This is using Fedora 15.
Thanks,
Sean
// This is a snippet from extentmap.cpp. I thought I would spare you
// the madness of looking at the JNI portion.
static void initFiemap(struct fiemap* fiemap, __u32 nExtents) {
    if (fiemap == 0) {
        throw FiemapException("Bad fiemap pointer.");
    }
    memset(fiemap, 0, sizeof(struct fiemap));
    // Start mapping at byte offset 0 of the file.
    fiemap->fm_start = 0;
    // Map through the last possible byte of the file.
    fiemap->fm_length = ~0ULL;
    // In the current code this is now FIEMAP_FLAG_SYNC.
    fiemap->fm_flags = 0;
    fiemap->fm_extent_count = nExtents;
    fiemap->fm_mapped_extents = 0;
    memset(fiemap->fm_extents, 0, sizeof(struct fiemap_extent) * nExtents);
}
static struct fiemap *readFiemap(int fd) throw (FiemapException) {
    struct fiemap* extentMap =
        reinterpret_cast<struct fiemap*>(malloc(sizeof(struct fiemap)));
    if (extentMap == 0) {
        throw FiemapException("Failed to allocate fiemap struct.");
    }
    FiemapDeallocator fiemapDeallocator(extentMap);
    initFiemap(extentMap, 0);
    // Find out how many extents there are.
    if (ioctl(fd, FS_IOC_FIEMAP, extentMap) < 0) {
        char errbuf[128];
        strerror_r(errno, errbuf, 127);
        throw FiemapException(errbuf);
    }
    __u32 nExtents = extentMap->fm_mapped_extents;
    __u32 extents_size = sizeof(struct fiemap_extent) * nExtents;
    fiemapDeallocator.noDeallocate();
    // Resize fiemap to allow us to read in the extents.
    extentMap = reinterpret_cast<struct fiemap*>(
        realloc(extentMap, sizeof(struct fiemap) + extents_size));
    if (extentMap == 0) {
        throw FiemapException("Out of memory allocating fiemap.");
    }
    initFiemap(extentMap, nExtents);
    FiemapDeallocator reallocDeallocator(extentMap);
    if (ioctl(fd, FS_IOC_FIEMAP, extentMap) < 0) {
        char errbuf[128];
        strerror_r(errno, errbuf, 127);
        throw FiemapException(errbuf);
    }
    reallocDeallocator.noDeallocate();
    return extentMap;
}
// This is from the Java code, SparseFileUtil.java
public List<SimpleInterval> extents(File file) throws IOException {
    // A SimpleInterval is just a 64-bit start and end pair.
    SimpleInterval[] extents = null;
    try {
        extents = extentsForFile(file.getAbsolutePath());
    } catch (IllegalArgumentException iae) {
        throw new IllegalArgumentException("For file \"" + file + "\".", iae);
    }
    if (extents.length == 0) {
        return Collections.emptyList();
    }
    Arrays.sort(extents, comp);
    List<SimpleInterval> mergedExtents = new ArrayList<SimpleInterval>();
    SimpleInterval current = extents[0];
    // Merge adjacent extents.
    for (int i = 1; i < extents.length; i++) {
        SimpleInterval sortedExtent = extents[i];
        if (current.end() < sortedExtent.start()) {
            mergedExtents.add(current);
            current = sortedExtent;
        } else {
            current = new SimpleInterval(
                Math.min(sortedExtent.start(), current.start()),
                Math.max(current.end(), sortedExtent.end()));
        }
    }
    mergedExtents.add(current);
    return mergedExtents;
}
public void copySparseFile(File src, File dest) throws IOException {
    if (!src.exists()) {
        throw new FileNotFoundException(src.getAbsolutePath());
    }
    if (src.isDirectory()) {
        throw new IllegalArgumentException("Src must be a file.");
    }
    List<SimpleInterval> extents = extents(src);
    if (extents.size() == 1 && extents.get(0).start() == 0) {
        FileUtils.copyFile(src, dest);
        return;
    }
    byte[] buf = new byte[1024 * 1024];
    RandomAccessFile srcRaf = new RandomAccessFile(src, "r");
    try {
        RandomAccessFile destRaf = new RandomAccessFile(dest, "rw");
        try {
            for (SimpleInterval extent : extents) {
                long extentSize = extent.end() - extent.start() + 1;
                srcRaf.seek(extent.start());
                destRaf.seek(extent.start());
                while (extentSize > 0) {
                    int readLen = (int) Math.min(buf.length, extentSize);
                    int nread = srcRaf.read(buf, 0, readLen);
                    if (nread == -1) {
                        break; // File ends before extent ends.
                    }
                    extentSize -= nread;
                    destRaf.write(buf, 0, nread);
                }
            }
        } finally {
            FileUtil.close(destRaf);
        }
    } finally {
        FileUtil.close(srcRaf);
    }
}
private native SimpleInterval[] extentsForFile(String fname) throws IOException;