On Wed, 2005-08-03 at 11:58 +0000, "Sævaldur Arnar Gunnarsson [Hugsmiðjan]" wrote: > What does "automatic" fencing have to offer that the manual fencing > lacks. Automatic fencing uses hardware to fence a node and reboot it. Manual fencing relies on you to fence the node yourself whenever you realize there is a problem in the cluster, and to power-cycle the faulty node by hand; not very convenient when you are administering the cluster remotely. > If we decide to buy the FC switch right away is it recomended that we > buy one of the ones that have fencing agent available for the > Cluster-Suite ? If you look at the configuration manual for RHCS, there is a list of supported fencing agents. > If can't get our hands on supported FC switchs can we do fencing in > another manner than throught a FC switch ? Manual fencing. Nando On 8/3/05, linux-cluster-request@xxxxxxxxxx <linux-cluster-request@xxxxxxxxxx> wrote: > Send submissions to > linux-cluster@xxxxxxxxxx > > To subscribe or unsubscribe via the World Wide Web, visit > http://www.redhat.com/mailman/listinfo/linux-cluster > or, via email, send a message with subject or body 'help' to > linux-cluster-request@xxxxxxxxxx > > You can reach the person managing the list at > linux-cluster-owner@xxxxxxxxxx > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Linux-cluster digest..." > > > Today's Topics: > > 1. Re: [PATCH 00/14] GFS (Lars Marowsky-Bree) > 2. Fencing agents (Sævaldur Arnar Gunnarsson [Hugsmiðjan]) > 3. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 4. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 5. Re: [PATCH 00/14] GFS (Pekka Enberg) > 6. Re: [PATCH 00/14] GFS (Jan Engelhardt) > 7. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 8. Re: [PATCH 00/14] GFS (Hans Reiser) > 9. Re: [PATCH 00/14] GFS (Jan Engelhardt) > 10. Re: [PATCH 00/14] GFS (Pekka Enberg) > 11. Re: [PATCH 00/14] GFS (Kyle Moffett) > 12. Re: [PATCH 00/14] GFS (Arjan van de Ven) > 13. 
Re: [PATCH 00/14] GFS (Arjan van de Ven) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 3 Aug 2005 12:37:44 +0200 > From: Lars Marowsky-Bree <lmb@xxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx>, Arjan van de Ven > <arjan@xxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <20050803103744.GG11081@xxxxxxxxxxxxxxxx> > Content-Type: text/plain; charset=us-ascii > > On 2005-08-03T11:56:18, David Teigland <teigland@xxxxxxxxxx> wrote: > > > > * Why use your own journalling layer and not say ... jbd ? > > Here's an analysis of three approaches to cluster-fs journaling and their > > pros/cons (including using jbd): http://tinyurl.com/7sbqq > > Very instructive read, thanks for the link. > > > > -- > High Availability & Clustering > SUSE Labs, Research and Development > SUSE LINUX Products GmbH - A Novell Business -- Charles Darwin > "Ignorance more frequently begets confidence than does knowledge" > > > > ------------------------------ > > Message: 2 > Date: Wed, 03 Aug 2005 11:58:47 +0000 > From: "Sævaldur Arnar Gunnarsson [Hugsmiðjan]" <addi@xxxxxxxxxxxxx> > Subject: Fencing agents > To: linux-cluster@xxxxxxxxxx > Message-ID: <42F0B177.7050907@xxxxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > I'm implementing shared storage between multiple (2 at the moment) > Blade machines (Dell PowerEdge 1855) running RHEL4 ES connected to an EMC > AX100 through FC. > > The SAN has two FC ports so the need for an FC Switch has not yet come; > however, we will add other Blades in the coming months. > The one thing I haven't got figured out with GFS and the Cluster-Suite > is the whole idea about fencing. 
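[Editorial aside: to make the manual/automatic distinction concrete, here is a hypothetical cluster.conf fragment in the RHCS CCS style; agent choice, node names, and addresses are invented for illustration, not taken from the poster's setup.]

```xml
<!-- Hypothetical sketch: blade1 fenced automatically through an APC
     power switch (fence_apc), blade2 through manual fencing. -->
<clusternodes>
  <clusternode name="blade1">
    <fence>
      <method name="1">
        <device name="apc1" port="1"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="blade2">
    <fence>
      <method name="1">
        <device name="human" nodename="blade2"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_apc" name="apc1" ipaddr="10.0.0.5" login="apc" passwd="apc"/>
  <fencedevice agent="fence_manual" name="human"/>
</fencedevices>
```

With fence_manual the cluster blocks after a failure until an operator confirms the node is down (fence_ack_manual); an automatic agent power-cycles the failed node with no operator in the loop.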
> > We have a working setup using CentOS rebuilds of the Cluster-Suite and > GFS (http://rpm.karan.org/el4/csgfs/) which we are not planning to use > in the final implementation where we plan to use the official GFS > packages from Red Hat. > The fencing agent in that setup is manual fencing. > > Both machines have the file system mounted and there appear to be no > problems. > > What does "automatic" fencing have to offer that the manual fencing lacks. > If we decide to buy the FC switch right away is it recommended that we > buy one of the ones that have fencing agent available for the > Cluster-Suite ? > > If we can't get our hands on supported FC switches can we do fencing in > another manner than through an FC switch ? > > > > > -- > Sævaldur Gunnarsson :: Hugsmiðjan > > > > ------------------------------ > > Message: 3 > Date: Tue, 02 Aug 2005 09:45:24 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1122968724.3247.22.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > On Tue, 2005-08-02 at 15:18 +0800, David Teigland wrote: > > Hi, GFS (Global File System) is a cluster file system that we'd like to > > see added to the kernel. The 14 patches total about 900K so I won't send > > them to the list unless that's requested. Comments and suggestions are > > welcome. Thanks > > > > http://redhat.com/~teigland/gfs2/20050801/gfs2-full.patch > > http://redhat.com/~teigland/gfs2/20050801/broken-out/ > > * The on disk structures are defined in terms of uint32_t and friends, > which are NOT endian neutral. Why are they not le32/be32 and thus > endian-defined? Did you run bitwise-sparse on GFS yet ? > > * None of your on disk structures are packed. Are you sure? 
> > * > +#define gfs2_16_to_cpu be16_to_cpu > +#define gfs2_32_to_cpu be32_to_cpu > +#define gfs2_64_to_cpu be64_to_cpu > > why this pointless abstracting? > > * +static const uint32_t crc_32_tab[] = ..... > > why do you duplicate this? The kernel already has a perfectly good set of generic crc32 tables/functions > > * Why are you using bufferheads extensively in a new filesystem? > > * + if (create) > + down_write(&ip->i_rw_mutex); > + else > + down_read(&ip->i_rw_mutex); > > why do you use a rwsem and not a regular semaphore? You are aware that rwsems are far more expensive than regular ones right? > How skewed is the read/write ratio? > > * Why use your own journalling layer and not say ... jbd ? > > * + while (!kthread_should_stop()) { > + gfs2_scand_internal(sdp); > + > + set_current_state(TASK_INTERRUPTIBLE); > + schedule_timeout(gfs2_tune_get(sdp, gt_scand_secs) * HZ); > + } > > you probably really want to check for signals if you do interruptible sleeps > (multiple places) > > * why not use msleep() and friends instead of schedule_timeout(), you're not using the complex variants anyway > > * +++ b/fs/gfs2/fixed_div64.h 2005-08-01 14:13:08.009808200 +0800 > > ehhhh why? > > * int gfs2_copy2user(struct buffer_head *bh, char **buf, unsigned int offset, > + unsigned int size) > +{ > + int error; > + > + if (bh) > + error = copy_to_user(*buf, bh->b_data + offset, size); > + else > + error = clear_user(*buf, size); > > that looks to be missing a few kmaps.. what's the guarantee that b_data is actually, like in lowmem? > > * [PATCH 08/14] GFS: diaper device > > The diaper device is a block device within gfs that gets transparently > inserted between the real device and the rest of the filesystem. > > hmmmm why not use device mapper or something? Is this really needed? Should it live in drivers/block ? Doesn't > this wrapper just increase the risk for memory deadlocks? > > * [PATCH 06/14] GFS: logging and recovery > > quoting the Ren and Stimpy show is nice.. 
but did the Ren and Stimpy authors agree to license their stuff under the GPL? > > > * do_lock_wait > > that almost screams for using wait_event and related APIs > > > * > +static inline void gfs2_log_lock(struct gfs2_sbd *sdp) > +{ > + spin_lock(&sdp->sd_log_lock); > +} > why the abstraction ? 
> > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > > > > ------------------------------ > > Message: 5 > Date: Tue, 2 Aug 2005 13:16:53 +0300 > From: Pekka Enberg <penberg@xxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, Pekka Enberg <penberg@xxxxxxxxxxxxxx>, > linux-cluster@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <84144f0205080203163cab015c@xxxxxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi David, > > On 8/2/05, David Teigland <teigland@xxxxxxxxxx> wrote: > > Hi, GFS (Global File System) is a cluster file system that we'd like to > > see added to the kernel. The 14 patches total about 900K so I won't send > > them to the list unless that's requested. Comments and suggestions are > > welcome. Thanks > > > +#define kmalloc_nofail(size, flags) \ > > + gmalloc_nofail((size), (flags), __FILE__, __LINE__) > > [snip] > > > +void *gmalloc_nofail_real(unsigned int size, int flags, char *file, > > + unsigned int line) > > +{ > > + void *x; > > + for (;;) { > > + x = kmalloc(size, flags); > > + if (x) > > + return x; > > + if (time_after_eq(jiffies, gfs2_malloc_warning + 5 * HZ)) { > > + printk("GFS2: out of memory: %s, %u\n", > > + __FILE__, __LINE__); > > + gfs2_malloc_warning = jiffies; > > + } > > + yield(); > > This does not belong in a filesystem. It also seems like a very bad > idea. What are you trying to do here? If you absolutely must not fail, > use __GFP_NOFAIL instead. 
> > > + } > > +} > > + > > +#if defined(GFS2_MEMORY_SIMPLE) > > + > > +atomic_t gfs2_memory_count; > > + > > +void gfs2_memory_add_i(void *data, char *file, unsigned int line) > > +{ > > + atomic_inc(&gfs2_memory_count); > > +} > > + > > +void gfs2_memory_rm_i(void *data, char *file, unsigned int line) > > +{ > > + if (data) > > + atomic_dec(&gfs2_memory_count); > > +} > > + > > +void *gmalloc(unsigned int size, int flags, char *file, unsigned int line) > > +{ > > + void *data = kmalloc(size, flags); > > + if (data) > > + atomic_inc(&gfs2_memory_count); > > + return data; > > +} > > + > > +void *gmalloc_nofail(unsigned int size, int flags, char *file, > > + unsigned int line) > > +{ > > + atomic_inc(&gfs2_memory_count); > > + return gmalloc_nofail_real(size, flags, file, line); > > +} > > + > > +void gfree(void *data, char *file, unsigned int line) > > +{ > > + if (data) { > > + atomic_dec(&gfs2_memory_count); > > + kfree(data); > > + } > > +} > > -mm has memory leak detection patches and there are others floating > around. Please do not introduce yet another subsystem-specific debug allocator. > > Pekka > > > > ------------------------------ > > Message: 6 > Date: Tue, 2 Aug 2005 16:57:11 +0200 (MEST) > From: Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <Pine.LNX.4.61.0508021655580.4138@xxxxxxxxxxxxxxx> > Content-Type: TEXT/PLAIN; charset=US-ASCII > > > >* Why use your own journalling layer and not say ... jbd ? > > Why does reiser use its own journalling layer and not say ... jbd ? 
> > > > Jan Engelhardt > -- > > > > ------------------------------ > > Message: 7 > Date: Tue, 02 Aug 2005 17:02:52 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1122994972.3247.31.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > On Tue, 2005-08-02 at 16:57 +0200, Jan Engelhardt wrote: > > >* Why use your own journalling layer and not say ... jbd ? > > > > Why does reiser use its own journalling layer and not say ... jbd ? > > because reiser got merged before jbd. Next question. > > Now the question for GFS is still a valid one; there might be reasons to > not use it (which is fair enough) but if there's no real reason then > using jbd sounds a lot better given its maturity (and it is used by 2 > filesystems in -mm already). > > > > > > ------------------------------ > > Message: 8 > Date: Tue, 02 Aug 2005 18:00:02 -0700 > From: Hans Reiser <reiser@xxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Jan Engelhardt > <jengelh@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <42F01712.2030105@xxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1 > > Arjan van de Ven wrote: > > >On Tue, 2005-08-02 at 16:57 +0200, Jan Engelhardt wrote: > > > > > >>>* Why use your own journalling layer and not say ... jbd ? > >>> > >>> > >>Why does reiser use its own journalling layer and not say ... jbd ? > >> > >> > > > >because reiser got merged before jbd. Next question. > > > > > That is the wrong reason. We use our own journaling layer for the > reason that Vivaldi used his own melody. > > I don't know anything about GFS, but expecting a filesystem author to > use a journaling layer he does not want to is a bit arrogant. 
Now, if > you got into details, and said jbd does X, Y and Z, and GFS does the > same X and Y, and does not do Z as well as jbd, that would be a more > serious comment. He might want to look at how reiser4 does wandering > logs instead of using jbd..... but I would never claim that for sure > some other author should be expected to use it..... and something like > changing one's journaling system is not something to do just before a > merge..... > > >Now the question for GFS is still a valid one; there might be reasons to > >not use it (which is fair enough) but if there's no real reason then > >using jdb sounds a lot better given it's maturity (and it is used by 2 > >filesystems in -mm already). > > > > > > > >- > >To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > >the body of a message to majordomo@xxxxxxxxxxxxxxx > >More majordomo info at http://vger.kernel.org/majordomo-info.html > >Please read the FAQ at http://www.tux.org/lkml/ > > > > > > > > > > > > ------------------------------ > > Message: 9 > Date: Wed, 3 Aug 2005 08:37:19 +0200 (MEST) > From: Jan Engelhardt <jengelh@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Kyle Moffett <mrmacman_g4@xxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Hans Reiser > <reiser@xxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Arjan van de Ven > <arjan@xxxxxxxxxxxxx> > Message-ID: <Pine.LNX.4.61.0508030826000.2263@xxxxxxxxxxxxxxx> > Content-Type: TEXT/PLAIN; charset=US-ASCII > > > >> > because reiser got merged before jbd. Next question. > >> > >> That is the wrong reason. We use our own journaling layer for the > >> reason that Vivaldi used his own melody. > >> > >> [...] He might want to look at how reiser4 does wandering > >> logs instead of using jbd..... but I would never claim that for sure > >> some other author should be expected to use it..... and something like > >> changing one's journaling system is not something to do just before a > >> merge..... 
> > > > Do you see my point here? If every person who added new kernel code > > just wrote their own thing without checking to see if it had already > > been done before, then there would be a lot of poorly maintained code > > in the kernel. If a journalling layer already exists, _new_ journaled > > filesystems should either (A) use the layer as is, or (B) fix the layer > > so it has sufficient functionality for them to use, and submit patches. > > Maybe jbd 'sucks' for something 'cool' like reiser*, and modifying jbd to be > 'eleet enough' for reiser* would overwhelm ext. > > Lastly, there is the 'political' thing, when a <your-favorite-jbd-fs>-only > specific change to jbd is rejected by all other jbd-using fs. (Basically the > situation thing that leads to software forks, in any area.) > > > > Jan Engelhardt > -- > > > > ------------------------------ > > Message: 10 > Date: Wed, 3 Aug 2005 09:44:06 +0300 > From: Pekka Enberg <penberg@xxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, Pekka Enberg <penberg@xxxxxxxxxxxxxx>, > linux-cluster@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <84144f0205080223445375c907@xxxxxxxxxxxxxx> > Content-Type: text/plain; charset=ISO-8859-1 > > Hi David, > > Some more comments below. 
> > Pekka > > On 8/2/05, David Teigland <teigland@xxxxxxxxxx> wrote: > > +/** > > + * inode_create - create a struct gfs2_inode > > + * @i_gl: The glock covering the inode > > + * @inum: The inode number > > + * @io_gl: the iopen glock to acquire/hold (using holder in new gfs2_inode) > > + * @io_state: the state the iopen glock should be acquired in > > + * @ipp: pointer to put the returned inode in > > + * > > + * Returns: errno > > + */ > > + > > +static int inode_create(struct gfs2_glock *i_gl, struct gfs2_inum *inum, > > + struct gfs2_glock *io_gl, unsigned int io_state, > > + struct gfs2_inode **ipp) > > +{ > > + struct gfs2_sbd *sdp = i_gl->gl_sbd; > > + struct gfs2_inode *ip; > > + int error = 0; > > + > > + RETRY_MALLOC(ip = kmem_cache_alloc(gfs2_inode_cachep, GFP_KERNEL), ip); > > Why do you want to do this? The callers can handle ENOMEM just fine. > > > +/** > > + * gfs2_random - Generate a random 32-bit number > > + * > > + * Generate a semi-crappy 32-bit pseudo-random number without using > > + * floating point. > > + * > > + * The PRNG is from "Numerical Recipes in C" (second edition), page 284. > > + * > > + * Returns: a 32-bit random number > > + */ > > + > > +uint32_t gfs2_random(void) > > +{ > > + gfs2_random_number = 0x0019660D * gfs2_random_number + 0x3C6EF35F; > > + return gfs2_random_number; > > +} > > Please consider moving this into lib/random.c. This one already appears in > drivers/net/hamradio/dmascc.c. > > > +/** > > + * gfs2_hash - hash an array of data > > + * @data: the data to be hashed > > + * @len: the length of data to be hashed > > + * > > + * Take some data and convert it to a 32-bit hash. 
> > + * > > + * This is the 32-bit FNV-1a hash from: > > + * http://www.isthe.com/chongo/tech/comp/fnv/ > > + * > > + * Returns: the hash > > + */ > > + > > +uint32_t gfs2_hash(const void *data, unsigned int len) > > +{ > > + uint32_t h = 0x811C9DC5; > > + h = hash_more_internal(data, len, h); > > + return h; > > +} > > Is there a reason why you cannot use <linux/hash.h> or <linux/jhash.h>? > > > +void gfs2_sort(void *base, unsigned int num_elem, unsigned int size, > > + int (*compar) (const void *, const void *)) > > +{ > > + register char *pbase = (char *)base; > > + int i, j, k, h; > > + static int cols[16] = {1391376, 463792, 198768, 86961, > > + 33936, 13776, 4592, 1968, > > + 861, 336, 112, 48, > > + 21, 7, 3, 1}; > > + > > + for (k = 0; k < 16; k++) { > > + h = cols[k]; > > + for (i = h; i < num_elem; i++) { > > + j = i; > > + while (j >= h && > > + (*compar)((void *)(pbase + size * (j - h)), > > + (void *)(pbase + size * j)) > 0) { > > + SWAP(pbase + size * j, > > + pbase + size * (j - h), > > + size); > > + j = j - h; > > + } > > + } > > + } > > +} > > Please use sort() from lib/sort.c. > > > +/** > > + * gfs2_io_error_inode_i - Flag an inode I/O error and withdraw > > + * @ip: > > + * @function: > > + * @file: > > + * @line: > > Please drop empty kerneldoc tags. (Appears in various other places as well.) > > > +#define RETRY_MALLOC(do_this, until_this) \ > > +for (;;) { \ > > + { do_this; } \ > > + if (until_this) \ > > + break; \ > > + if (time_after_eq(jiffies, gfs2_malloc_warning + 5 * HZ)) { \ > > + printk("GFS2: out of memory: %s, %u\n", __FILE__, __LINE__); \ > > + gfs2_malloc_warning = jiffies; \ > > + } \ > > + yield(); \ > > +} > > Please drop this. 
> > > +int gfs2_acl_create(struct gfs2_inode *dip, struct gfs2_inode *ip) > > +{ > > + struct gfs2_sbd *sdp = dip->i_sbd; > > + struct posix_acl *acl = NULL; > > + struct gfs2_ea_request er; > > + mode_t mode = ip->i_di.di_mode; > > + int error; > > + > > + if (!sdp->sd_args.ar_posix_acl) > > + return 0; > > + if (S_ISLNK(ip->i_di.di_mode)) > > + return 0; > > + > > + memset(&er, 0, sizeof(struct gfs2_ea_request)); > > + er.er_type = GFS2_EATYPE_SYS; > > + > > + error = acl_get(dip, ACL_DEFAULT, &acl, NULL, > > + &er.er_data, &er.er_data_len); > > + if (error) > > + return error; > > + if (!acl) { > > + mode &= ~current->fs->umask; > > + if (mode != ip->i_di.di_mode) > > + error = munge_mode(ip, mode); > > + return error; > > + } > > + > > + { > > + struct posix_acl *clone = posix_acl_clone(acl, GFP_KERNEL); > > + error = -ENOMEM; > > + if (!clone) > > + goto out; > > + gfs2_memory_add(clone); > > + gfs2_memory_rm(acl); > > + posix_acl_release(acl); > > + acl = clone; > > + } > > Please make this a real function. It is duplicated below. > > > + if (error > 0) { > > + er.er_name = GFS2_POSIX_ACL_ACCESS; > > + er.er_name_len = GFS2_POSIX_ACL_ACCESS_LEN; > > + posix_acl_to_xattr(acl, er.er_data, er.er_data_len); > > + er.er_mode = mode; > > + er.er_flags = GFS2_ERF_MODE; > > + error = gfs2_system_eaops.eo_set(ip, &er); > > + if (error) > > + goto out; > > + } else > > + munge_mode(ip, mode); > > + > > + out: > > + gfs2_memory_rm(acl); > > + posix_acl_release(acl); > > + kfree(er.er_data); > > + > > + return error; > > Whitespace damage. 
> > > +int gfs2_acl_chmod(struct gfs2_inode *ip, struct iattr *attr) > > +{ > > + struct posix_acl *acl = NULL; > > + struct gfs2_ea_location el; > > + char *data; > > + unsigned int len; > > + int error; > > + > > + error = acl_get(ip, ACL_ACCESS, &acl, &el, &data, &len); > > + if (error) > > + return error; > > + if (!acl) > > + return gfs2_setattr_simple(ip, attr); > > + > > + { > > + struct posix_acl *clone = posix_acl_clone(acl, GFP_KERNEL); > > + error = -ENOMEM; > > + if (!clone) > > + goto out; > > + gfs2_memory_add(clone); > > + gfs2_memory_rm(acl); > > + posix_acl_release(acl); > > + acl = clone; > > + } > > Duplicated above. > > > +static int ea_foreach(struct gfs2_inode *ip, ea_call_t ea_call, void *data) > > +{ > > + struct buffer_head *bh; > > + int error; > > + > > + error = gfs2_meta_read(ip->i_gl, ip->i_di.di_eattr, > > + DIO_START | DIO_WAIT, &bh); > > + if (error) > > + return error; > > + > > + if (!(ip->i_di.di_flags & GFS2_DIF_EA_INDIRECT)) > > + error = ea_foreach_i(ip, bh, ea_call, data); > > goto out here so you can drop the else branch below. 
> > > + else { > > + struct buffer_head *eabh; > > + uint64_t *eablk, *end; > > + > > + if (gfs2_metatype_check(ip->i_sbd, bh, GFS2_METATYPE_IN)) { > > + error = -EIO; > > + goto out; > > + } > > + > > + eablk = (uint64_t *)(bh->b_data + > > + sizeof(struct gfs2_meta_header)); > > + end = eablk + ip->i_sbd->sd_inptrs; > > + > > > +static int ea_find_i(struct gfs2_inode *ip, struct buffer_head *bh, > > + struct gfs2_ea_header *ea, struct gfs2_ea_header *prev, > > + void *private) > > +{ > > + struct ea_find *ef = (struct ea_find *)private; > > + struct gfs2_ea_request *er = ef->ef_er; > > + > > + if (ea->ea_type == GFS2_EATYPE_UNUSED) > > + return 0; > > + > > + if (ea->ea_type == er->er_type) { > > + if (ea->ea_name_len == er->er_name_len && > > + !memcmp(GFS2_EA2NAME(ea), er->er_name, ea->ea_name_len)) { > > + struct gfs2_ea_location *el = ef->ef_el; > > + get_bh(bh); > > + el->el_bh = bh; > > + el->el_ea = ea; > > + el->el_prev = prev; > > + return 1; > > + } > > + } > > + > > +#if 0 > > + else if ((ip->i_di.di_flags & GFS2_DIF_EA_PACKED) && > > + er->er_type == GFS2_EATYPE_SYS) > > + return 1; > > +#endif > > Please drop commented out code. > > > +static int ea_list_i(struct gfs2_inode *ip, struct buffer_head *bh, > > + struct gfs2_ea_header *ea, struct gfs2_ea_header *prev, > > + void *private) > > +{ > > + struct ea_list *ei = (struct ea_list *)private; > > Please drop redundant cast. 
> > > +static int ea_set_i(struct gfs2_inode *ip, struct gfs2_ea_request *er, > > + struct gfs2_ea_location *el) > > +{ > > + { > > + struct ea_set es; > > + int error; > > + > > + memset(&es, 0, sizeof(struct ea_set)); > > + es.es_er = er; > > + es.es_el = el; > > + > > + error = ea_foreach(ip, ea_set_simple, &es); > > + if (error > 0) > > + return 0; > > + if (error) > > + return error; > > + } > > + { > > + unsigned int blks = 2; > > + if (!(ip->i_di.di_flags & GFS2_DIF_EA_INDIRECT)) > > + blks++; > > + if (GFS2_EAREQ_SIZE_STUFFED(er) > ip->i_sbd->sd_jbsize) > > + blks += DIV_RU(er->er_data_len, > > + ip->i_sbd->sd_jbsize); > > + > > + return ea_alloc_skeleton(ip, er, blks, ea_set_block, el); > > + } > > Please drop the extra braces. > > > > ------------------------------ > > Message: 11 > Date: Wed, 3 Aug 2005 00:07:38 -0400 > From: Kyle Moffett <mrmacman_g4@xxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Hans Reiser <reiser@xxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Jan Engelhardt > <jengelh@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, Arjan van de > Ven <arjan@xxxxxxxxxxxxx> > Message-ID: <4CBCB111-36B9-4F8C-9A3F-A9126ADE1CA2@xxxxxxx> > Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed > > On Aug 2, 2005, at 21:00:02, Hans Reiser wrote: > > Arjan van de Ven wrote: > >> because reiser got merged before jbd. Next question. > > That is the wrong reason. We use our own journaling layer for the > > reason that Vivaldi used his own melody. > > > > I don't know anything about GFS, but expecting a filesystem author to > > use a journaling layer he does not want to is a bit arrogant. Now, if > > you got into details, and said jbd does X, Y and Z, and GFS does the > > same X and Y, and does not do Z as well as jbd, that would be a more > > serious comment. He might want to look at how reiser4 does wandering > > logs instead of using jbd..... 
but I would never claim that for sure > > some other author should be expected to use it..... and something > > like > > changing one's journaling system is not something to do just before a > > merge..... > > I don't want to start another big reiser4 flamewar, but... > > "I don't know anything about Reiser4, but expecting a filesystem author > to use a VFS layer he does not want to is a bit arrogant. Now, if you > got into details, and said the linux VFS does X, Y, and Z, and Reiser4 > does..." > > Do you see my point here? If every person who added new kernel code > just wrote their own thing without checking to see if it had already > been done before, then there would be a lot of poorly maintained code > in the kernel. If a journalling layer already exists, _new_ journaled > filesystems should either (A) use the layer as is, or (B) fix the layer > so it has sufficient functionality for them to use, and submit patches. > That way if somebody later says, "Ah, crap, there's a bug in the kernel > journalling layer", and fixes it, there are not eight other filesystems > with their own open-coded layers that need to be audited for similar > mistakes. > > This is similar to why some kernel developers did not like the Reiser4 > code, because it implemented some private layers that looked kinda like > stuff the VFS should be doing (Again, I don't want to get into that > argument again, I'm just bringing up the similarities to clarify _this_ > particular point, as that one has been beaten to death enough already). > > >> Now the question for GFS is still a valid one; there might be > >> reasons to > >> not use it (which is fair enough) but if there's no real reason then > >> using jdb sounds a lot better given it's maturity (and it is used > >> by 2 > >> filesystems in -mm already). 
> > Personally, I am of the opinion that if GFS cannot use jbd, the > developers > ought to clarify why it isn't usable, and possibly submit fixes to make > it useful, so that others can share the benefits. > > Cheers, > Kyle Moffett > > -- > I lost interest in "blade servers" when I found they didn't throw > knives at > people who weren't supposed to be in your machine room. > -- Anthony de Boer > > > > > ------------------------------ > > Message: 12 > Date: Wed, 03 Aug 2005 11:09:02 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: Hans Reiser <reiser@xxxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, Jan Engelhardt > <jengelh@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1123060142.3363.8.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > > > I don't know anything about GFS, but expecting a filesystem author to > > use a journaling layer he does not want to is a bit arrogant. > > good that I didn't expect that then. > I think it's fair enough to ask people if they can use it. If the answer > is "No because it doesn't fit our model <here>" then that's fine. If the > answer is "eh yeah we could" then I think it's entirely reasonable to > expect people to use common code as opposed to adding new code. > > > > > ------------------------------ > > Message: 13 > Date: Wed, 03 Aug 2005 11:17:09 +0200 > From: Arjan van de Ven <arjan@xxxxxxxxxxxxx> > Subject: Re: [PATCH 00/14] GFS > To: David Teigland <teigland@xxxxxxxxxx> > Cc: akpm@xxxxxxxx, linux-cluster@xxxxxxxxxx, > linux-kernel@xxxxxxxxxxxxxxx > Message-ID: <1123060630.3363.10.camel@xxxxxxxxxxxxxxxxxxxxx> > Content-Type: text/plain > > On Wed, 2005-08-03 at 11:56 +0800, David Teigland wrote: > > The point is you can define GFS2_ENDIAN_BIG to compile gfs to be BE > > on-disk instead of LE which is another useful way to verify endian > > correctness. > > that sounds wrong to be a compile option. 
If you really want to deal > with dual disk endianness it really ought to be a runtime one (see jffs2 > for example). > > > > > > * Why use your own journalling layer and not say ... jbd ? > > > > Here's an analysis of three approaches to cluster-fs journaling and their > > pros/cons (including using jbd): http://tinyurl.com/7sbqq > > > > > * + while (!kthread_should_stop()) { > > > + gfs2_scand_internal(sdp); > > > + > > > + set_current_state(TASK_INTERRUPTIBLE); > > > + schedule_timeout(gfs2_tune_get(sdp, gt_scand_secs) * HZ); > > > + } > > > > > > you probably really want to check for signals if you do interruptible sleeps > > > > I don't know why we'd be interested in signals here. > > well.. because if you don't your schedule_timeout becomes a nop when you > get one, which makes your loop a busy waiting one. > > > > > > ------------------------------ > > -- > > Linux-cluster@xxxxxxxxxx > http://www.redhat.com/mailman/listinfo/linux-cluster > > End of Linux-cluster Digest, Vol 16, Issue 4 > ******************************************** > -- travellig. -- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster