Were these errors not coming over TCP? This entry_count = 3 case comes
up only when the file is present on more than 2 storage nodes (not
AFR'd volumes, of course). Anyway, let us look into it.

About ib-verbs, I am still trying to figure out what the issue may be.
Which version are you using now, btw?

Regards,
Amar

On Mon, Mar 17, 2008 at 10:07 AM, Mickey Mazarick
<mic@xxxxxxxxxxxxxxxxxx> wrote:
> They are separate; I meant to imply that there is a Storage-01ns ->
> Storage-02ns -> Storage-03ns namespace AFR.
> The only thing I'm not doing is double-mirroring the afr volumes
> (i.e. there is no Storage-01 -> Storage-02 afr unified to a
> Storage-01 afr -> Storage-02).
> I never really understood the reason for doing this in the examples,
> but assumed it would help throughput.
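> If I read those examples right, the double-mirrored layout would be
> roughly the following -- two exported bricks per server, with the AFR
> pairs crossed before unify. All volume/brick names here are made up
> just to show the shape:
>
> volume mirror-a
>   type cluster/afr
>   # brick1 on server1 mirrored with brick2 on server2
>   subvolumes server1-brick1 server2-brick2
> end-volume
>
> volume mirror-b
>   type cluster/afr
>   # brick1 on server2 mirrored with brick2 on server1
>   subvolumes server2-brick1 server1-brick2
> end-volume
>
> volume unified
>   type cluster/unify
>   option namespace some-ns   # namespace volume, defined elsewhere
>   option scheduler rr
>   subvolumes mirror-a mirror-b
> end-volume
>
> We skipped that and kept a single brick per server instead.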
> My spec is included below.
>
>
> #### gluster-system.vol ####
> volume main1
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST201
>   option remote-subvolume system
> end-volume
>
> volume main2
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST202
>   option remote-subvolume system
> end-volume
>
> volume main3
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST203
>   option remote-subvolume system
> end-volume
>
> volume main4
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST204
>   option remote-subvolume system
> end-volume
>
> volume main5
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST205
>   option remote-subvolume system
> end-volume
>
> volume main6
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST206
>   option remote-subvolume system
> end-volume
>
> volume main1-2
>   type cluster/afr
>   subvolumes main1 main2
>   # option replicate *:2
> end-volume
>
> volume main3-4
>   type cluster/afr
>   subvolumes main3 main4
>   # option replicate *:2
> end-volume
>
> volume main5-6
>   type cluster/afr
>   subvolumes main5 main6
>   # option replicate *:2
> end-volume
>
> volume main-ns-1
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST201
>   option remote-subvolume system-ns
> end-volume
>
> volume main-ns-2
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST202
>   option remote-subvolume system-ns
> end-volume
>
> volume main-ns-3
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host RTPST203
>   option remote-subvolume system-ns
> end-volume
>
> volume main-ns
>   type cluster/afr
>   subvolumes main-ns-1 main-ns-2 main-ns-3
>   # option replicate *:3
> end-volume
>
> volume main
>   type cluster/unify
>   option namespace main-ns
>   subvolumes main1-2 main3-4 main5-6
>   option scheduler alu   # use the ALU scheduler
>   # option alu.limits.min-free-disk 10GB   # Don't create files on a volume with less than 10GB free disk space
>   # option alu.limits.max-open-files 10000 # Don't create files on a volume with more than 10000 files open
>   # When deciding where to place a file, first look at the disk-usage,
>   # then at read-usage, write-usage, open files, and finally the
>   # disk-speed-usage.
>   option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
>   # option alu.disk-usage.entry-threshold 2GB  # Kick in if the discrepancy in disk-usage between volumes is 2GB
>   # option alu.disk-usage.exit-threshold 60MB  # Don't stop until you've written at least 60MB to the least-used volume
>   # option alu.open-files-usage.entry-threshold 1024  # Kick in if the discrepancy in open files is 1024
>   # option alu.open-files-usage.exit-threshold 32     # Don't stop until you've written at least 32 files to the least-used volume
>   # option alu.read-usage.entry-threshold 20%  # Kick in when the read-usage discrepancy is 20%
>   # option alu.read-usage.exit-threshold 4%    # Don't stop until the discrepancy has been reduced by 4%
>   # option alu.write-usage.entry-threshold 20% # Kick in when the write-usage discrepancy is 20%
>   # option alu.write-usage.exit-threshold 4%   # Don't stop until the discrepancy has been reduced by 4%
>   option alu.stat-refresh.interval 60sec  # Refresh the statistics used for decision-making every 60 seconds
>   # option alu.stat-refresh.num-file-create 10  # Refresh the statistics used for decision-making after creating 10 files
> end-volume
>
> volume writebehind
>   type performance/write-behind
>   subvolumes main
> end-volume
>
> volume readahead
>   type performance/read-ahead
>   subvolumes writebehind
> end-volume
>
> volume io-cache
>   type performance/io-cache
>   subvolumes readahead
> end-volume
>
> ### If you are not concerned about performance of interactive commands
> ### like "ls -l", you wouldn't need this translator.
> #volume statprefetch
> #  type performance/stat-prefetch
> #  option cache-seconds 2   # cache expires in 2 seconds
> #  subvolumes readahead     # add "stat-prefetch" feature to "readahead" volume
> #end-volume
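> For reference, the matching server-side spec on each storage node is
> roughly the following (reconstructed from memory, so treat the
> directory paths as approximate; RTPST204-206 simply omit the
> system-ns part). The namespace brick is a separate export directory,
> not a re-use of the storage brick:
>
> volume system
>   type storage/posix
>   option directory /mnt/gluster/system      # data brick
> end-volume
>
> volume system-ns
>   type storage/posix
>   option directory /mnt/gluster/system-ns   # separate namespace brick
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type ib-verbs/server
>   subvolumes system system-ns
>   option auth.ip.system.allow *
>   option auth.ip.system-ns.allow *
> end-volume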
> Basavanagowda Kanur wrote:
> > Mickey,
> > You cannot re-use the namespace as a storage volume.
> > Make sure you have separate namespace volumes, distinct from the
> > storage volumes, for glusterfs to work properly.
> >
> > --
> > Gowda
> >
> > On Mon, Mar 17, 2008 at 10:10 PM, Mickey Mazarick
> > <mic@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > I'm getting a lot of errors on an AFR/unify setup with 6 storage
> > bricks using ib-verbs, and I just want some help understanding what
> > is critical. For some reason this setup is very unstable, and we
> > want to know how to make it as robust as the architecture suggests
> > it should be.
> >
> > The problem is that when we copy any files we get hundreds of the
> > following three errors in the client:
> >
> > 2008-03-17 12:31:00 E [fuse-bridge.c:699:fuse_fd_cbk] glusterfs-fuse:
> > 38: /tftpboot/node_root/lib/modules/2.6.24.1/modules.symbols => -1 (5)
> > 2008-03-17 12:31:00 E [unify.c:850:unify_open] main:
> > /tftpboot/node_root/lib/modules/2.6.24.1/kernel/arch/x86/kernel/cpuid.ko:
> > entry_count is 3
> > 2008-03-17 12:31:00 E [unify.c:853:unify_open] main:
> > /tftpboot/node_root/lib/modules/2.6.24.1/kernel/arch/x86/kernel/cpuid.ko:
> > found on main-ns
> >
> > Files still copy with these errors, but very slowly.
> > Additionally, we are unable to lose even one storage brick without
> > the cluster freezing.
> >
> > We have the pretty common afr/unify setup with 6 storage bricks.
> >
> > namespace:
> > Storage_01 <-AFR-> Storage_02 <-AFR-> Storage_03
> >
> > storage:
> > Storage_01 <-AFR-> Storage_02
> > Storage_03 <-AFR-> Storage_04
> > Storage_05 <-AFR-> Storage_06
> >
> > All this is running on TLA ver 703 with the latest patched fuse
> > module.
> >
> > Any suggestions would be appreciated!
> > Thanks!
> > -Mickey Mazarick
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel

--
Amar Tumballi
Gluster/GlusterFS Hacker
[bulde on #gluster/irc.gnu.org]
http://www.zresearch.com - Commoditizing Supercomputing and Superstorage!