On Jun 30, 2009 16:27 -0700, Shaozhi Ye wrote:
> We are planning to evaluate the reliability and integrity of ext4
> against power failures and will post the results when it's done.
> Please find attached design document and let me know if you have any
> suggestions for features to test or existing benchmark tools which
> serve our purpose.

What might be interesting is to enhance fsx to work on multiple nodes.
It would need an additional "sync" operation that would flush the cache
to disk, so that there is a limited amount of rollback after a server
reset. The client would log locally the file operations that are being
done on the server, and after the server is restarted the client would
verify that the data in the file is consistent at least up to the most
recent sync.

Another very interesting filesystem test is "racer.sh" (a cleaned up
version is in the Lustre test set), which does random filesystem
operations (create, write, rename, link, unlink, symlink, mkdir, rmdir).
Currently the operations are completely random, but if there were a
client logging the operations it should be possible to track the state
on the client.

What would be necessary is for the server to export the current
transaction id (tid) and for the client to record this with every
operation. At server recovery time the last committed tid is in the
journal superblock, and the client can then verify that its record of
the filesystem state matches the actual state (after journal rollback).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
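
To make the tid-based check described above a bit more concrete, here is
a minimal sketch in Python of what the client-side verification might
look like. Everything in it is hypothetical: the LoggedOp record, the
expected_paths() and verify() helpers, and the idea that the client
already has the last committed tid and a listing of the recovered
filesystem are assumptions for illustration, not an existing fsx,
racer.sh, or ext4 interface. It only tracks which paths exist, not file
contents.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class LoggedOp:
    """One entry in the (hypothetical) client-side operation log."""
    tid: int                        # tid the server reported for this op
    kind: str                       # "create", "unlink", or "rename"
    path: str
    new_path: Optional[str] = None  # only used by "rename"


def expected_paths(log: List[LoggedOp], committed_tid: int) -> Set[str]:
    """Replay logged ops up to committed_tid; return the paths that should exist."""
    paths: Set[str] = set()
    for op in log:
        if op.tid > committed_tid:
            # Assumed to belong to an uncommitted transaction, so it is
            # not expected to survive the server reset.
            continue
        if op.kind == "create":
            paths.add(op.path)
        elif op.kind == "unlink":
            paths.discard(op.path)
        elif op.kind == "rename":
            if op.path in paths and op.new_path is not None:
                paths.discard(op.path)
                paths.add(op.new_path)
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
    return paths


def verify(log: List[LoggedOp], committed_tid: int,
           actual: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) paths; both empty means the states match."""
    expected = expected_paths(log, committed_tid)
    return expected - actual, actual - expected


if __name__ == "__main__":
    log = [
        LoggedOp(10, "create", "a"),
        LoggedOp(10, "create", "b"),
        LoggedOp(11, "rename", "a", "c"),
        LoggedOp(12, "unlink", "b"),   # after the last commit, rolled back
        LoggedOp(13, "create", "d"),   # after the last commit, rolled back
    ]
    missing, unexpected = verify(log, committed_tid=11, actual={"b", "c"})
    print("missing:", sorted(missing), "unexpected:", sorted(unexpected))
```

A real test would also need to decide how strict to be: this sketch
treats any state other than "exactly the operations up to the committed
tid" as a failure, which assumes the tid recorded by the client is exact
for every operation.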