Hello!

I suggest that developers consider the ext4 design from the point of view of these papers:

IRON FILE SYSTEMS - http://www.cs.wisc.edu/wind/Publications/vijayan-thesis06.pdf
IMHO this is a very impressive paper, and developers of near-future file systems can't ignore these problems and solutions.

and "Failure Analysis of SGI XFS File System" - http://www.cs.wisc.edu/~vshree/xfs.pdf

From IRON FILE SYSTEMS:

"Disk drives are widely used as a primary medium for storing information. While commodity file systems trust disks to either work or fail completely, modern disks exhibit complex failure modes such as latent sector faults and block corruptions, where only portions of a disk fail. ...

First, we design new low-level redundancy techniques that a file system can use to handle disk faults. We begin by qualitatively and quantitatively evaluating various redundancy information such as checksum, parity, and replica. Finally, we describe two update strategies: an overwrite and a no-overwrite approach that a file system can use to update its data and parity blocks atomically without NVRAM support. Overall, we show that low-level redundant information can greatly enhance file system robustness while incurring modest time and space overheads.

Second, to remedy the problem of failure handling diffusion, we develop a modified ext3 that unifies all failure handling in a Centralized Failure Handler (CFH). We then showcase the power of centralized failure handling in ext3c, a modified IRON version of ext3 that uses CFH, by demonstrating its support for flexible, consistent, and fine-grained policies. By carefully separating policy from mechanism, ext3c demonstrates how a file system can provide a thorough, comprehensive, and easily understandable failure-handling policy. ...

The importance of building dependable systems cannot be overstated. One of the fundamental requirements in computer systems is to store and retrieve information reliably. ...

The fault model presented by modern disk drives, however, is much more complex. For example, modern drives can exhibit latent sector faults [14, 28, 45, 60, 100], where a block or set of blocks is inaccessible. With a latent sector fault, the fault occurred sometime in the past but is detected only when the sector is accessed for storing or retrieving information [59]. Blocks sometimes become corrupted [16], and worse, this can happen silently without the disk being able to detect it [47, 74, 126]. Finally, disks sometimes exhibit transient performance problems [11, 115].

There are several reasons for these complex disk failure modes. First, a common trend in the drive industry is to pack more bits per square inch (BPS) as the areal densities of disk drives grow at a rapid rate [48]. ... In addition, increased density can also increase the complexity of the logic, that is, the firmware that manages the data [7], which can result in an increased number of bugs. For example, buggy firmware is known to issue misdirected writes [126], where correct data is placed on disk but in the wrong location. Second, the increased use of low-end desktop drives such as IDE/ATA drives worsens the reliability problem. Low cost dominates the design of personal storage drives [7]; therefore, they are less tested and have less machinery to handle disk errors [56]. Finally, the amount of software used in the storage stack has increased. Firmware on a desktop drive contains about 400 thousand lines of code [33]. Moreover, the storage stack consists of several layers of low-level device driver code that has been considered to have more bugs than the rest of the operating system code [38, 113]. As Jim Gray points out in his study of Tandem Availability, "As the other components of the system become increasingly reliable, software necessarily becomes the dominant cause of outages" [44]. ...

Our study focuses on four important and substantially different open-source file systems, ext3 [121], ReiserFS [89], IBM's JFS [19], and XFS [112], and one closed-source file system, Windows NTFS [109]. From our analysis results, we find that the technology used by high-end systems (e.g., checksumming, disk scrubbing, and so on) has not filtered down to the realm of commodity file systems. Across all platforms, we find ad hoc failure handling and a great deal of illogical inconsistency in failure policy, often due to the diffusion of failure handling code through the kernel. Such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies. Moreover, failure handling diffusion makes it difficult to examine any one or few portions of the code and determine how failure handling is supposed to behave. Diffusion also implies that failure handling is inflexible: policies that are spread across so many locations within the code base are hard to change. In addition, we observe that failure handling is quite coarse-grained; it is challenging to implement nuanced policies in the current system. We also discover that most systems implement portions of their failure policy incorrectly; the presence of bugs in the implementations demonstrates the difficulty and complexity of correctly handling certain classes of disk failure. ... We show that none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy. ...

We found a number of bugs and inconsistencies in the ext3 failure policy. First, errors are not always propagated to the user (e.g., truncate and rmdir fail silently). Second, ext3 does not always perform sanity checking; for example, unlink does not check the links count field before modifying it, and therefore a corrupted value can lead to a system crash.
Third, although ext3 has redundant copies of the superblock (R_Redundancy), these copies are never updated after file system creation and hence are not useful. Finally, there are important cases when ext3 violates the journaling semantics, committing or checkpointing invalid transactions."

Thanks for your attention!
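
P.S. Here is a minimal Python sketch of how per-block checksums and a parity block work together, to make the "checksum, parity, and replica" idea from the first excerpt more concrete. It is only an illustration under simplifying assumptions. It is not code from the thesis, ext3c, or ext4. BLOCK_SIZE, checksum(), xor_blocks() and ParityGroup are all hypothetical names. The sketch assumes the checksums and parity are stored somewhere the simulated faults cannot reach, and that at most one block per group is bad. It also mixes the block number into the CRC, which is one way to catch the misdirected writes mentioned above.

```python
from functools import reduce
from operator import xor
import zlib

BLOCK_SIZE = 16  # hypothetical, tiny block size to keep the example readable


def checksum(block_no: int, data: bytes) -> int:
    # CRC32 over (block number || data). Including the block number means a
    # misdirected write (correct data landing in the wrong block) also fails.
    return zlib.crc32(block_no.to_bytes(8, "little") + data)


def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks; this is the parity computation.
    return bytes(reduce(xor, column) for column in zip(*blocks))


class ParityGroup:
    """Hypothetical group of data blocks protected by one XOR parity block.

    Checksums and parity are assumed to live somewhere the simulated
    faults cannot reach; only self.blocks plays the role of the disk.
    """

    def __init__(self, blocks):
        self.blocks = [bytes(b) for b in blocks]
        self.sums = [checksum(i, b) for i, b in enumerate(self.blocks)]
        self.parity = xor_blocks(self.blocks)

    def read(self, i: int) -> bytes:
        data = self.blocks[i]
        if checksum(i, data) == self.sums[i]:
            return data  # checksum matches: trust the block
        # Checksum mismatch (silent corruption or misdirected write):
        # rebuild block i by XOR-ing the parity with every other block.
        others = [b for j, b in enumerate(self.blocks) if j != i]
        rebuilt = xor_blocks(others + [self.parity])
        if checksum(i, rebuilt) != self.sums[i]:
            # Single parity cannot repair more than one bad block per group.
            raise OSError(f"block {i}: corrupted and not recoverable")
        self.blocks[i] = rebuilt  # write the repaired copy back
        return rebuilt


if __name__ == "__main__":
    g = ParityGroup([b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE, b"C" * BLOCK_SIZE])
    g.blocks[1] = b"X" * BLOCK_SIZE  # simulate a silently corrupted block
    print(g.read(1))                 # the original block of b"B" bytes is recovered
```

The two mechanisms play different roles. Parity alone can only say that a group is inconsistent; the per-block checksum is what identifies which block to rebuild. A real file system would also have to update data, checksums, and parity atomically, which is what the overwrite and no-overwrite strategies in the quote address. This sketch ignores that problem.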