On Mon, Nov 12, 2012 at 1:50 AM, Andreas Dilger <adilger@xxxxxxxxx> wrote:
> On 2012-11-11, at 4:37, Roger Niva <rogerniva@xxxxxxxxx> wrote:
>>
>> We are trying to pin down a file corruption issue we have on 5
>> production servers and would like some suggestions about how to proceed
>> to find the culprit. It may or may not be ext4-related, but as that is
>> the only clue we have so far, we're trying here first.
>>
>> The production servers are running Slackware 13.37 with a self-compiled
>> kernel (no patches or external modules).
>> We have a script running daily that copies files from one folder to
>> another using cp.
>
> There was a bug in the ext4 FIEMAP ioctl code in the past that interacted
> badly with fileutils when copying files that were just written and still
> in cache. That was around 2.6.26 or so.
>
It is commit 6d9c85eb700bd3ac59e63bb9de463dea1aca084c that went in at
v2.6.39.

However, looking at ext4_fiemap(), it does seem racy. If pages are
written back between ext4_ext_find_extent() and ext4_ext_fiemap_cb(),
fiemap will report holes. This can possibly happen when cp runs
concurrently with the background flusher, which is common for a
long-running production server. If this is true, the bug also exists in
the latest upstream.

--
Thanks,
Tao
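
P.S. If fiemap does wrongly report a hole, a cp that trusts the extent map would skip that range, so the copy should contain zeros where the source has data. One way to check an existing copy for that symptom is a rough sketch like the one below. The function name find_zeroed_ranges and the 4096-byte comparison size are my own choices for illustration, not anything taken from cp or the kernel, and an all-zero mismatch is only a hint, not proof, that this particular race was the cause.

```python
BLOCK = 4096  # assumed comparison granularity; not necessarily the fs block size


def find_zeroed_ranges(src_path, dst_path, block=BLOCK):
    """Return merged (offset, length) ranges where dst is all zeros
    but differs from src at the same offset."""
    ranges = []
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        offset = 0
        while True:
            a = src.read(block)
            b = dst.read(block)
            if not a and not b:
                break  # both files exhausted
            # dst chunk is non-empty, entirely zero bytes, and differs from src
            if b and a != b and not any(b):
                last = ranges[-1] if ranges else None
                if last and last[0] + last[1] == offset:
                    # extend the previous range when chunks are adjacent
                    ranges[-1] = (last[0], last[1] + len(b))
                else:
                    ranges.append((offset, len(b)))
            offset += max(len(a), len(b))
    return ranges
```

Calling find_zeroed_ranges("/path/to/source", "/path/to/copy") on a file pair from the daily job (the paths here are placeholders) returns an empty list when no such range is found. Any offsets it does return could then be compared against the time the file was written and the time cp ran.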