On Wed, 2005-08-03 at 11:58 +0000, "Sævaldur Arnar Gunnarsson [Hugsmiðjan]" wrote: > What does "automatic" fencing have to offer that the manual fencing > lacks. Automatic fencing uses hardware to fence a node and reboot it. Manual fencing relies on you to fence the node yourself whenever you realize there is a problem in the cluster, and to power-cycle the faulty node by hand; not very convenient when you are administering the cluster remotely. > If we decide to buy the FC switch right away is it recomended that we > buy one of the ones that have fencing agent available for the > Cluster-Suite ? If you look at the configuration manual for RHCS, there is a list of supported fencing agents. > If can't get our hands on supported FC switchs can we do fencing in > another manner than throught a FC switch ? Manual fencing. Nando On 8/3/05, linux-cluster-request@xxxxxxxxxx <linux-cluster-request@xxxxxxxxxx> wrote: > Send submissions to > linux-cluster@xxxxxxxxxx > > To subscribe or unsubscribe via the World Wide Web, visit > http://www.redhat.com/mailman/listinfo/linux-cluster > or, via email, send a message with subject or body 'help' to > linux-cluster-request@xxxxxxxxxx > > You can reach the person managing the list at > linux-cluster-owner@xxxxxxxxxx > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Linux-cluster digest..." > > > Today's Topics: > > 1. Re: [PATCH 00/14] GFS (Lars Marowsky-Bree) > 2. Fencing agents (Sævaldur Arnar Gunnarsson [Hugsmiðjan]) > 3. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 4. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 5. Re: [PATCH 00/14] GFS (Pekka Enberg) > 6. Re: [PATCH 00/14] GFS (Jan Engelhardt) > 7. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 8. Re: [PATCH 00/14] GFS (Hans Reiser) > 9. Re: [PATCH 00/14] GFS (Jan Engelhardt) > 10. Re: [PATCH 00/14] GFS (Pekka Enberg) > 11. Re: [PATCH 00/14] GFS (Kyle Moffett) > 12. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 13. 
Re: [PATCH 00/14] GFS (Arjan van de Ven) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 3 Aug 2005 12:37:44 +0200 > From: Lars Marowsky-Bree <lmb@xxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx>, Arjan van de Ven > <arjan@xxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <20050803103744.GG11081@xxxxxxxxxxxxxxxx> > Content-Type: text/plain; charset=us-ascii > > On 2005-08-03T11:56:18, David Teigland <teigland@xxxxxxxxxx> wrote: > > > > * Why use your own journalling layer and not say ... jbd ? > > Here's an analysis of three approaches to cluster-fs journaling and their > > pros/cons (including using jbd): http://tinyurl.com/7sbqq > > Very instructive read, thanks for the link. > > > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" > > > > ------------------------------ > > Message: 2 > Date: Wed, 03 Aug 2005 11:58:47 +0000 > From: "Sævaldur Arnar Gunnarsson [Hugsmiðjan]" <addi@xxxxxxxxxxxxx> > Subject: Fencing agents > To: linux-cluster@xxxxxxxxxx > Message-ID: <42F0B177.7050907@xxxxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > I'm implementing shared storage between multiple (2 at the moment) > Blade machines (Dell PowerEdge 1855) running RHEL4 ES connected to an EMC > AX100 through FC. > > The SAN has two FC ports so the need for an FC Switch has not yet come; > however, we will add other Blades in the coming months. > The one thing I haven't got figured out with GFS and the Cluster-Suite > is the whole idea about fencing. 
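[Editorial aside: to make the manual/automatic distinction concrete, here is a hypothetical cluster.conf fragment in the RHCS CCS style; agent choice, node names, and addresses are invented for illustration, not taken from the poster's setup.]

```xml
<!-- Hypothetical sketch: blade1 fenced automatically through an APC
     power switch (fence_apc), blade2 through manual fencing. -->
<clusternodes>
  <clusternode name="blade1">
    <fence>
      <method name="1">
        <device name="apc1" port="1"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="blade2">
    <fence>
      <method name="1">
        <device name="human" nodename="blade2"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.5" login="apc" passwd="apc"/>
  <fencedevice agent="fence_manual" name="human"/>
</fencedevices>
```

With fence_manual the cluster blocks after a failure until an operator confirms the node is down (fence_ack_manual); an automatic agent power-cycles the failed node with no operator in the loop.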
> > We have a working setup using CentOS rebuilds of the Cluster-Suite and > GFS (http://rpm.karan.org/el4/csgfs/) which we are not planning to use > in the final implementation where we plan to use the official GFS > packages from Red Hat. > The fencing agent in that setup is manual fencing. > > Both machines have the file system mounted and there appear to be no > problems. > > What does "automatic" fencing have to offer that the manual fencing lacks. > If we decide to buy the FC switch right away is it recommended that we > buy one of the ones that have fencing agent available for the > Cluster-Suite ? > > If we can't get our hands on supported FC switches can we do fencing in > another manner than through an FC switch ? > > > > > -- > Sævaldur Gunnarsson :: Hugsmiðjan > > > > ------------------------------ > > Message: 3 > Date: Tue, 02 Aug 2005 09:45:24 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1122968724.3247.22.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > On Tue, 2005-08-02 at 15:18 +0800, David Teigland wrote: > > Hi, GFS (Global File System) is a cluster file system that we'd like to > > see added to the kernel. The 14 patches total about 900K so I won't send > > them to the list unless that's requested. Comments and suggestions are > > welcome. Thanks > > > > http://redhat.com/~teigland/gfs2/20050801/gfs2-full.patch > > http://redhat.com/~teigland/gfs2/20050801/broken-out/ > > * The on disk structures are defined in terms of uint32_t and friends, > which are NOT endian neutral. Why are they not le32/be32 and thus > endian-defined? Did you run bitwise-sparse on GFS yet ? > > * None of your on disk structures are packed. Are you sure? 
> > * > +#define gfs2_16_to_cpu be16_to_cpu > +#define gfs2_32_to_cpu be32_to_cpu > +#define gfs2_64_to_cpu be64_to_cpu > > why this pointless abstracting? > > * +static const uint32_t crc_32_tab[] = ..... > > why do you duplicate this? The kernel already has a perfectly good set of generic crc32 tables/functions > > * Why are you using bufferheads extensively in a new filesystem? > > * + if (create) > + down_write(&ip->i_rw_mutex); > + else > + down_read(&ip->i_rw_mutex); > > why do you use a rwsem and not a regular semaphore? You are aware that rwsems are far more expensive than regular ones right? > How skewed is the read/write ratio? > > * Why use your own journalling layer and not say ... jbd ? > > * + while (!kthread_should_stop()) { > + gfs2_scand_internal(sdp); > + > + set_current_state(TASK_INTERRUPTIBLE); > + schedule_timeout(gfs2_tune_get(sdp, gt_scand_secs) * HZ); > + } > > you probably really want to check for signals if you do interruptible sleeps > (multiple places) > > * why not use msleep() and friends instead of schedule_timeout(), you're not using the complex variants anyway > > * +++ b/fs/gfs2/fixed_div64.h 2005-08-01 14:13:08.009808200 +0800 > > ehhhh why? > > * int gfs2_copy2user(struct buffer_head *bh, char **buf, unsigned int offset, > + unsigned int size) > +{ > + int error; > + > + if (bh) > + error = copy_to_user(*buf, bh->b_data + offset, size); > + else > + error = clear_user(*buf, size); > > that looks to be missing a few kmaps.. what's the guarantee that b_data is actually, like in lowmem? > > * [PATCH 08/14] GFS: diaper device > > The diaper device is a block device within gfs that gets transparently > inserted between the real device and the rest of the filesystem. > > hmmmm why not use device mapper or something? Is this really needed? Should it live in drivers/block ? Doesn't > this wrapper just increase the risk for memory deadlocks? > > * [PATCH 06/14] GFS: logging and recovery > > quoting the Ren and Stimpy show is nice.. 
but did the Ren and Stimpy authors agree to license their stuff under the GPL? > > > * do_lock_wait > > that almost screams for using wait_event and related APIs > > > * > +static inline void gfs2_log_lock(struct gfs2_sbd *sdp) > +{ > + spin_lock(&sdp->sd_log_lock); > +} > why the abstraction ? 
> > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > > > ------------------------------ > > Message: 5 > Date: Tue, 2 Aug 2005 13:16:53 +0300 > From: Pekka Enberg <penberg@xxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, Pekka Enberg <penberg@xxxxxxxxxxxxxx>, > linux-cluster@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <84144f0205080203163cab015c@xxxxxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi David, > > On 8/2/05, David Teigland <teigland@xxxxxxxxxx> wrote: > > Hi, GFS (Global File System) is a cluster file system that we'd like to > > see added to the kernel. The 14 patches total about 900K so I won't send > > them to the list unless that's requested. Comments and suggestions are > > welcome. Thanks > > > +#define kmalloc_nofail(size, flags) \ > > + gmalloc_nofail((size), (flags), __FILE__, __LINE__) > > [snip] > > > +void *gmalloc_nofail_real(unsigned int size, int flags, char *file, > > + unsigned int line) > > +{ > > + void *x; > > + for (;;) { > > + x = kmalloc(size, flags); > > + if (x) > > + return x; > > + if (time_after_eq(jiffies, gfs2_malloc_warning + 5 * HZ)) { > > + printk("GFS2: out of memory: %s, %u\n", > > + __FILE__, __LINE__); > > + gfs2_malloc_warning = jiffies; > > + } > > + yield(); > > This does not belong in a filesystem. It also seems like a very bad > idea. What are you trying to do here? If you absolutely must not fail, > use __GFP_NOFAIL instead. 
> > > + } > > +} > > + > > +#if defined(GFS2_MEMORY_SIMPLE) > > + > > +atomic_t gfs2_memory_count; > > + > > +void gfs2_memory_add_i(void *data, char *file, unsigned int line) > > +{ > > + atomic_inc(&gfs2_memory_count); > > +} > > + > > +void gfs2_memory_rm_i(void *data, char *file, unsigned int line) > > +{ > > + if (data) > > + atomic_dec(&gfs2_memory_count); > > +} > > + > > +void *gmalloc(unsigned int size, int flags, char *file, unsigned int line) > > +{ > > + void *data = kmalloc(size, flags); > > + if (data) > > + atomic_inc(&gfs2_memory_count); > > + return data; > > +} > > + > > +void *gmalloc_nofail(unsigned int size, int flags, char *file, > > + unsigned int line) > > +{ > > + atomic_inc(&gfs2_memory_count); > > + return gmalloc_nofail_real(size, flags, file, line); > > +} > > + > > +void gfree(void *data, char *file, unsigned int line) > > +{ > > + if (data) { > > + atomic_dec(&gfs2_memory_count); > > + kfree(data); > > + } > > +} > > -mm has memory leak detection patches and there are others floating > around. Please do not introduce yet another subsystem-specific debug allocator. > > Pekka > > > > ------------------------------ > > Message: 6 > Date: Tue, 2 Aug 2005 16:57:11 +0200 (MEST) > From: Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <Pine.LNX.4.61.0508021655580.4138@xxxxxxxxxxxxxxx> > Content-Type: TEXT/PLAIN; charset=US-ASCII > > > >* Why use your own journalling layer and not say ... jbd ? > > Why does reiser use its own journalling layer and not say ... jbd ? 
> > > > Jan Engelhardt > -- > > > > ------------------------------ > > Message: 7 > Date: Tue, 02 Aug 2005 17:02:52 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1122994972.3247.31.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > On Tue, 2005-08-02 at 16:57 +0200, Jan Engelhardt wrote: > > >* Why use your own journalling layer and not say ... jbd ? > > > > Why does reiser use its own journalling layer and not say ... jbd ? > > because reiser got merged before jbd. Next question. > > Now the question for GFS is still a valid one; there might be reasons to > not use it (which is fair enough) but if there's no real reason then > using jbd sounds a lot better given its maturity (and it is used by 2 > filesystems in -mm already). > > > > > > ------------------------------ > > Message: 8 > Date: Tue, 02 Aug 2005 18:00:02 -0700 > From: Hans Reiser <reiser@xxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Jan Engelhardt > <jengelh@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <42F01712.2030105@xxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1 > > Arjan van de Ven wrote: > > >On Tue, 2005-08-02 at 16:57 +0200, Jan Engelhardt wrote: > > > > > >>>* Why use your own journalling layer and not say ... jbd ? > >>> > >>> > >>Why does reiser use its own journalling layer and not say ... jbd ? > >> > >> > > > >because reiser got merged before jbd. Next question. > > > > > That is the wrong reason. We use our own journaling layer for the > reason that Vivaldi used his own melody. > > I don't know anything about GFS, but expecting a filesystem author to > use a journaling layer he does not want to is a bit arrogant. 
Now, if > you got into details, and said jbd does X, Y and Z, and GFS does the > same X and Y, and does not do Z as well as jbd, that would be a more > serious comment. He might want to look at how reiser4 does wandering > logs instead of using jbd..... but I would never claim that for sure > some other author should be expected to use it..... and something like > changing one's journaling system is not something to do just before a > merge..... > > >Now the question for GFS is still a valid one; there might be reasons to > >not use it (which is fair enough) but if there's no real reason then > >using jdb sounds a lot better given it's maturity (and it is used by 2 > >filesystems in -mm already). > > > > > > > >- > >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > >the body of a message to majordomo@xxxxxxxxxxxxxxx > >More majordomo info at http://vger.kernel.org/majordomo-info.html > >Please read the FAQ at http://www.tux.org/lkml/ > > > > > > > > > > > > ------------------------------ > > Message: 9 > Date: Wed, 3 Aug 2005 08:37:19 +0200 (MEST) > From: Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Kyle Moffett <mrmacman_g4@xxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Hans Reiser > <reiser@xxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Arjan van de Ven > <arjan@xxxxxxxxxxxxx> > Message-ID: <Pine.LNX.4.61.0508030826000.2263@xxxxxxxxxxxxxxx> > Content-Type: TEXT/PLAIN; charset=US-ASCII > > > >> > because reiser got merged before jbd. Next question. > >> > >> That is the wrong reason. We use our own journaling layer for the > >> reason that Vivaldi used his own melody. > >> > >> [...] He might want to look at how reiser4 does wandering > >> logs instead of using jbd..... but I would never claim that for sure > >> some other author should be expected to use it..... and something like > >> changing one's journaling system is not something to do just before a > >> merge..... 
> > > > Do you see my point here? If every person who added new kernel code > > just wrote their own thing without checking to see if it had already > > been done before, then there would be a lot of poorly maintained code > > in the kernel. If a journalling layer already exists, _new_ journaled > > filesystems should either (A) use the layer as is, or (B) fix the layer > > so it has sufficient functionality for them to use, and submit patches. > > Maybe jbd 'sucks' for something 'cool' like reiser*, and modifying jbd to be > 'eleet enough' for reiser* would overwhelm ext. > > Lastly, there is the 'political' thing, when a <your-favorite-jbd-fs>-only > specific change to jbd is rejected by all other jbd-using fs. (Basically the > situation thing that leads to software forks, in any area.) > > > > Jan Engelhardt > -- > > > > ------------------------------ > > Message: 10 > Date: Wed, 3 Aug 2005 09:44:06 +0300 > From: Pekka Enberg <penberg@xxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, Pekka Enberg <penberg@xxxxxxxxxxxxxx>, > linux-cluster@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <84144f0205080223445375c907@xxxxxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi David, > > Some more comments below. 
> > Pekka > > On 8/2/05, David Teigland <teigland@xxxxxxxxxx> wrote: > > +/** > > + * inode_create - create a struct gfs2_inode > > + * @i_gl: The glock covering the inode > > + * @inum: The inode number > > + * @io_gl: the iopen glock to acquire/hold (using holder in new gfs2_inode) > > + * @io_state: the state the iopen glock should be acquired in > > + * @ipp: pointer to put the returned inode in > > + * > > + * Returns: errno > > + */ > > + > > +static int inode_create(struct gfs2_glock *i_gl, struct gfs2_inum *inum, > > + struct gfs2_glock *io_gl, unsigned int io_state, > > + struct gfs2_inode **ipp) > > +{ > > + struct gfs2_sbd *sdp = i_gl->gl_sbd; > > + struct gfs2_inode *ip; > > + int error = 0; > > + > > + RETRY_MALLOC(ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL), ip); > > Why do you want to do this? The callers can handle ENOMEM just fine. > > > +/** > > + * gfs2_random - Generate a random 32-bit number > > + * > > + * Generate a semi-crappy 32-bit pseudo-random number without using > > + * floating point. > > + * > > + * The PRNG is from "Numerical Recipes in C" (second edition), page 284. > > + * > > + * Returns: a 32-bit random number > > + */ > > + > > +uint32_t gfs2_random(void) > > +{ > > + gfs2_random_number = 0x0019660D * gfs2_random_number + 0x3C6EF35F; > > + return gfs2_random_number; > > +} > > Please consider moving this into lib/random.c. This one already appears in > drivers/net/hamradio/dmascc.c. > > > +/** > > + * gfs2_hash - hash an array of data > > + * @data: the data to be hashed > > + * @len: the length of data to be hashed > > + * > > + * Take some data and convert it to a 32-bit hash. 
> > + * > > + * This is the 32-bit FNV-1a hash from: > > + * http://www.isthe.com/chongo/tech/comp/fnv/ > > + * > > + * Returns: the hash > > + */ > > + > > +uint32_t gfs2_hash(const void *data, unsigned int len) > > +{ > > + uint32_t h = 0x811C9DC5; > > + h = hash_more_internal(data, len, h); > > + return h; > > +} > > Is there a reason why you cannot use <linux/hash.h> or <linux/jhash.h>? > > > +void gfs2_sort(void *base, unsigned int num_elem, unsigned int size, > > + int (*compar) (const void *, const void *)) > > +{ > > + register char *pbase = (char *)base; > > + int i, j, k, h; > > + static int cols[16] = {1391376, 463792, 198768, 86961, > > + 33936, 13776, 4592, 1968, > > + 861, 336, 112, 48, > > + 21, 7, 3, 1}; > > + > > + for (k = 0; k < 16; k++) { > > + h = cols[k]; > > + for (i = h; i < num_elem; i++) { > > + j = i; > > + while (j >= h && > > + (*compar)((void *)(pbase + size * (j - h)), > > + (void *)(pbase + size * j)) > 0) { > > + SWAP(pbase + size * j, > > + pbase + size * (j - h), > > + size); > > + j = j - h; > > + } > > + } > > + } > > +} > > Please use sort() from lib/sort.c. > > > +/** > > + * gfs2_io_error_inode_i - Flag an inode I/O error and withdraw > > + * @ip: > > + * @function: > > + * @file: > > + * @line: > > Please drop empty kerneldoc tags. (Appears in various other places as well.) > > > +#define RETRY_MALLOC(do_this, until_this) \ > > +for (;;) { \ > > + { do_this; } \ > > + if (until_this) \ > > + break; \ > > + if (time_after_eq(jiffies, gfs2_malloc_warning + 5 * HZ)) { \ > > + printk("GFS2: out of memory: %s, %u\n", __FILE__, __LINE__); \ > > + gfs2_malloc_warning = jiffies; \ > > + } \ > > + yield(); \ > > +} > > Please drop this. 
> > > +int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip) > > +{ > > + struct gfs2_sbd *sdp = dip->i_sbd; > > + struct posix_acl *acl = NULL; > > + struct gfs2_ea_request er; > > + mode_t mode = ip->i_di.di_mode; > > + int error; > > + > > + if (!sdp->sd_args.ar_posix_acl) > > + return 0; > > + if (S_ISLNK(ip->i_di.di_mode)) > > + return 0; > > + > > + memset(&er, 0, sizeof(struct gfs2_ea_request)); > > + er.er_type = GFS2_EATYPE_SYS; > > + > > + error = acl_get(dip, ACL_DEFAULT, &acl, NULL, > > + &er.er_data, &er.er_data_len); > > + if (error) > > + return error; > > + if (!acl) { > > + mode &= ~current->fs->umask; > > + if (mode != ip->i_di.di_mode) > > + error = munge_mode(ip, mode); > > + return error; > > + } > > + > > + { > > + struct posix_acl *clone = posix_acl_clone(acl, GFP_KERNEL); > > + error = -ENOMEM; > > + if (!clone) > > + goto out; > > + gfs2_memory_add(clone); > > + gfs2_memory_rm(acl); > > + posix_acl_release(acl); > > + acl = clone; > > + } > > Please make this a real function. It is duplicated below. > > > + if (error > 0) { > > + er.er_name = GFS2_POSIX_ACL_ACCESS; > > + er.er_name_len = GFS2_POSIX_ACL_ACCESS_LEN; > > + posix_acl_to_xattr(acl, er.er_data, er.er_data_len); > > + er.er_mode = mode; > > + er.er_flags = GFS2_ERF_MODE; > > + error = gfs2_system_eaops.eo_set(ip, &er); > > + if (error) > > + goto out; > > + } else > > + munge_mode(ip, mode); > > + > > + out: > > + gfs2_memory_rm(acl); > > + posix_acl_release(acl); > > + kfree(er.er_data); > > + > > + return error; > > Whitespace damage. 
> > > +int gfs2_acl_chmod(struct gfs2_inode *ip, struct iattr *attr) > > +{ > > + struct posix_acl *acl = NULL; > > + struct gfs2_ea_location el; > > + char *data; > > + unsigned int len; > > + int error; > > + > > + error = acl_get(ip, ACL_ACCESS, &acl, &el, &data, &len); > > + if (error) > > + return error; > > + if (!acl) > > + return gfs2_setattr_simple(ip, attr); > > + > > + { > > + struct posix_acl *clone = posix_acl_clone(acl, GFP_KERNEL); > > + error = -ENOMEM; > > + if (!clone) > > + goto out; > > + gfs2_memory_add(clone); > > + gfs2_memory_rm(acl); > > + posix_acl_release(acl); > > + acl = clone; > > + } > > Duplicated above. > > > +static int ea_foreach(struct gfs2_inode *ip, ea_call_t ea_call, void *data) > > +{ > > + struct buffer_head *bh; > > + int error; > > + > > + error = gfs2_meta_read(ip->i_gl, ip->i_di.di_eattr, > > + DIO_START | DIO_WAIT, &bh); > > + if (error) > > + return error; > > + > > + if (!(ip->i_di.di_flags & GFS2_DIF_EA_INDIRECT)) > > + error = ea_foreach_i(ip, bh, ea_call, data); > > goto out here so you can drop the else branch below. 
> > > + else { > > + struct buffer_head *eabh; > > + uint64_t *eablk, *end; > > + > > + if (gfs2_metatype_check(ip->i_sbd, bh, GFS2_METATYPE_IN)) { > > + error = -EIO; > > + goto out; > > + } > > + > > + eablk = (uint64_t *)(bh->b_data + > > + sizeof(struct gfs2_meta_header)); > > + end = eablk + ip->i_sbd->sd_inptrs; > > + > > > +static int ea_find_i(struct gfs2_inode *ip, struct buffer_head *bh, > > + struct gfs2_ea_header *ea, struct gfs2_ea_header *prev, > > + void *private) > > +{ > > + struct ea_find *ef = (struct ea_find *)private; > > + struct gfs2_ea_request *er = ef->ef_er; > > + > > + if (ea->ea_type == GFS2_EATYPE_UNUSED) > > + return 0; > > + > > + if (ea->ea_type == er->er_type) { > > + if (ea->ea_name_len == er->er_name_len && > > + !memcmp(GFS2_EA2NAME(ea), er->er_name, ea->ea_name_len)) { > > + struct gfs2_ea_location *el = ef->ef_el; > > + get_bh(bh); > > + el->el_bh = bh; > > + el->el_ea = ea; > > + el->el_prev = prev; > > + return 1; > > + } > > + } > > + > > +#if 0 > > + else if ((ip->i_di.di_flags & GFS2_DIF_EA_PACKED) && > > + er->er_type == GFS2_EATYPE_SYS) > > + return 1; > > +#endif > > Please drop commented out code. > > > +static int ea_list_i(struct gfs2_inode *ip, struct buffer_head *bh, > > + struct gfs2_ea_header *ea, struct gfs2_ea_header *prev, > > + void *private) > > +{ > > + struct ea_list *ei = (struct ea_list *)private; > > Please drop redundant cast. 
> > > +static int ea_set_i(struct gfs2_inode *ip, struct gfs2_ea_request *er, > > + struct gfs2_ea_location *el) > > +{ > > + { > > + struct ea_set es; > > + int error; > > + > > + memset(&es, 0, sizeof(struct ea_set)); > > + es.es_er = er; > > + es.es_el = el; > > + > > + error = ea_foreach(ip, ea_set_simple, &es); > > + if (error > 0) > > + return 0; > > + if (error) > > + return error; > > + } > > + { > > + unsigned int blks = 2; > > + if (!(ip->i_di.di_flags & GFS2_DIF_EA_INDIRECT)) > > + blks++; > > + if (GFS2_EAREQ_SIZE_STUFFED(er) > ip->i_sbd->sd_jbsize) > > + blks += DIV_RU(er->er_data_len, > > + ip->i_sbd->sd_jbsize); > > + > > + return ea_alloc_skeleton(ip, er, blks, ea_set_block, el); > > + } > > Please drop the extra braces. > > > > ------------------------------ > > Message: 11 > Date: Wed, 3 Aug 2005 00:07:38 -0400 > From: Kyle Moffett <mrmacman_g4@xxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Hans Reiser <reiser@xxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Jan Engelhardt > <jengelh@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Arjan van de > Ven <arjan@xxxxxxxxxxxxx> > Message-ID: <4CBCB111-36B9-4F8C-9A3F-A9126ADE1CA2@xxxxxxx> > Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed > > On Aug 2, 2005, at 21:00:02, Hans Reiser wrote: > > Arjan van de Ven wrote: > >> because reiser got merged before jbd. Next question. > > That is the wrong reason. We use our own journaling layer for the > > reason that Vivaldi used his own melody. > > > > I don't know anything about GFS, but expecting a filesystem author to > > use a journaling layer he does not want to is a bit arrogant. Now, if > > you got into details, and said jbd does X, Y and Z, and GFS does the > > same X and Y, and does not do Z as well as jbd, that would be a more > > serious comment. He might want to look at how reiser4 does wandering > > logs instead of using jbd..... 
but I would never claim that for sure > > some other author should be expected to use it..... and something > > like > > changing one's journaling system is not something to do just before a > > merge..... > > I don't want to start another big reiser4 flamewar, but... > > "I don't know anything about Reiser4, but expecting a filesystem author > to use a VFS layer he does not want to is a bit arrogant. Now, if you > got into details, and said the linux VFS does X, Y, and Z, and Reiser4 > does..." > > Do you see my point here? If every person who added new kernel code > just wrote their own thing without checking to see if it had already > been done before, then there would be a lot of poorly maintained code > in the kernel. If a journalling layer already exists, _new_ journaled > filesystems should either (A) use the layer as is, or (B) fix the layer > so it has sufficient functionality for them to use, and submit patches. > That way if somebody later says, "Ah, crap, there's a bug in the kernel > journalling layer", and fixes it, there are not eight other filesystems > with their own open-coded layers that need to be audited for similar > mistakes. > > This is similar to why some kernel developers did not like the Reiser4 > code, because it implemented some private layers that looked kinda like > stuff the VFS should be doing (Again, I don't want to get into that > argument again, I'm just bringing up the similarities to clarify _this_ > particular point, as that one has been beaten to death enough already). > > >> Now the question for GFS is still a valid one; there might be > >> reasons to > >> not use it (which is fair enough) but if there's no real reason then > >> using jdb sounds a lot better given it's maturity (and it is used > >> by 2 > >> filesystems in -mm already). 
> > Personally, I am of the opinion that if GFS cannot use jbd, the > developers > ought to clarify why it isn't usable, and possibly submit fixes to make > it useful, so that others can share the benefits. > > Cheers, > Kyle Moffett > > -- > I lost interest in "blade servers" when I found they didn't throw > knives at > people who weren't supposed to be in your machine room. > -- Anthony de Boer > > > > > ------------------------------ > > Message: 12 > Date: Wed, 03 Aug 2005 11:09:02 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Hans Reiser <reiser@xxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Jan Engelhardt > <jengelh@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1123060142.3363.8.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > > > I don't know anything about GFS, but expecting a filesystem author to > > use a journaling layer he does not want to is a bit arrogant. > > good that I didn't expect that then. > I think it's fair enough to ask people if they can use it. If the answer > is "No because it doesn't fit our model <here>" then that's fine. If the > answer is "eh yeah we could" then I think it's entirely reasonable to > expect people to use common code as opposed to adding new code. > > > > > ------------------------------ > > Message: 13 > Date: Wed, 03 Aug 2005 11:17:09 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1123060630.3363.10.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > On Wed, 2005-08-03 at 11:56 +0800, David Teigland wrote: > > The point is you can define GFS2_ENDIAN_BIG to compile gfs to be BE > > on-disk instead of LE which is another useful way to verify endian > > correctness. > > that sounds wrong to be a compile option. 
If you really want to deal > with dual disk endianness it really ought to be a runtime one (see jffs2 > for example). > > > > > > * Why use your own journalling layer and not say ... jbd ? > > > > Here's an analysis of three approaches to cluster-fs journaling and their > > pros/cons (including using jbd): http://tinyurl.com/7sbqq > > > > > * + while (!kthread_should_stop()) { > > > + gfs2_scand_internal(sdp); > > > + > > > + set_current_state(TASK_INTERRUPTIBLE); > > > + schedule_timeout(gfs2_tune_get(sdp, gt_scand_secs) * HZ); > > > + } > > > > > > you probably really want to check for signals if you do interruptible sleeps > > > > I don't know why we'd be interested in signals here. > > well.. because if you don't your schedule_timeout becomes a nop when you > get one, which makes your loop a busy waiting one. > > > > > > ------------------------------ > > -- > > Linux-cluster@xxxxxxxxxx > http://www.redhat.com/mailman/listinfo/linux-cluster > > End of Linux-cluster Digest, Vol 16, Issue 4 > ******************************************** > -- travellig. -- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster