On Wed, Aug 03, 2005 at 12:37:44PM +0200, Lars Marowsky-Bree wrote: > On 2005-08-03T11:56:18, David Teigland <teigland@xxxxxxxxxx> wrote: > > > > * Why use your own journalling layer and not say ... jbd ? > > Here's an analysis of three approaches to cluster-fs journaling and their > > pros/cons (including using jbd): http://tinyurl.com/7sbqq > > Very instructive read, thanks for the link. While it may be true that for a full log, flushing for a *single* lock may be more expensive in OCFS2, Ken ignores the fact that in our one big flush we've made all locks on journalled resources immediately releasable. According to that description, GFS2 would have to do a seperate transaction flush (including the extra step of writing revoke records) for each lock protecting a journalled resource. Assuming the same number of locks are required to be dropped under both systems then for a number of locks > 1 OCFS2 will actually do less work - the actual metadata blocks would be the same on either end, but JBD only has to write that the journal is now clean to the journal superblock whereas GFS2 has to revoke the blocks for each dropped lock. Of course all of this talk completely avoids the fact that in any case these things are expensive so a cluster file system has to take care to ping locks as little as possible. OCFS2 takes great pains to make as many operations node local (requiring no cluster locks) as possible - data allocation is usually done from a node local pool which is refreshed from the main bitmap. Deallocation happens similarly - we have a truncate log in which we record deleted clusters. Each node has their own inode and metadata chain allocators which another node will only lock for delete (a truncate log style local metadata delete log could easily be added if that ever became a problem). --Mark -- Mark Fasheh Senior Software Developer, Oracle mark.fasheh@xxxxxxxxxx -- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster