Re: GlusterFS was removed from Fedora EPEL

Hi Keithley,

Please find the bug in the attached log file. I experienced this bug on both 3.4.0 and 3.4.1. There is no problem with GlusterFS 3.2.7.

I use IOR 3.0.1 for the test. The connection is IPoIB. OFED comes from Mellanox (1.5.3). The OS is CentOS 6.4.

About GlusterFS on RHEL 6.5: I wonder why there are no glusterfs-server and glusterfs-geo-replication packages.


On Tue, Dec 3, 2013 at 1:03 AM, Kaleb S. KEITHLEY <kkeithle@xxxxxxxxxx> wrote:
On 12/02/2013 10:52 AM, Nguyen Viet Cuong wrote:
Hi,

Actually, I have had very bad experiences with GlusterFS 3.3.x and 3.4.x
under very high load (for example, more than 64 processes writing in
parallel for more than 10 minutes).

Have you filed a bug?


GlusterFS 3.2.7 from EPEL is really stable and
we use it for production.

Unfortunately, there is no official build of GlusterFS 3.2.x in
Gluster's repo.

You can get the glusterfs RPMs that were built in the Fedora Koji build system at

    https://koji.fedoraproject.org/koji/packageinfo?packageID=5443

and in particular the 3.2.7 el6 RPMs are at

    https://koji.fedoraproject.org/koji/buildinfo?buildID=323952


--

Kaleb



--
Nguyen Viet Cuong
IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Thu Nov 28 20:10:06 2013
Command line used: /opt/IOR/bin/IOR -a POSIX -F -m -t 1M -b 1G -s 2 -i 5 -k -r -R -w -W -e -o /mnt/IOR
Machine: Linux cn01.local

Test 0 started: Thu Nov 28 20:10:06 2013
Summary:
	api                = POSIX
	test filename      = /mnt/IOR
	access             = file-per-process
	ordering in a file = sequential offsets
	ordering inter file= no tasks offsets
	clients            = 128 (8 per node)
	repetitions        = 5
	xfersize           = 1 MiB
	blocksize          = 1 GiB
	aggregate filesize = 256 GiB

access    bw(MiB/s)  block(KiB) xfer(KiB)  open(s)    wr/rd(s)   close(s)   total(s)   iter
------    ---------  ---------- ---------  --------   --------   --------   --------   ----
write     584.86     1048576    1024.00    43.68      448.19     11.28      448.22     0   
read      885.50     1048576    1024.00    0.383256   295.97     0.016442   296.04     0   
write     573.92     1048576    1024.00    60.53      456.69     11.56      456.76     1   
read      956.56     1048576    1024.00    0.321138   274.03     0.031678   274.05     1   
write     590.64     1048576    1024.00    65.95      443.77     11.66      443.83     2   
read      1254.45    1048576    1024.00    0.383224   208.95     0.049748   208.97     2   
write     585.15     1048576    1024.00    67.88      447.94     12.07      447.99     3   
read      1298.00    1048576    1024.00    0.364708   201.94     0.067993   201.96     3   
WARNING: Task 43, partial write(), 131072 of 1048576 bytes at offset 389021696
WARNING: Task 107, partial write(), 524288 of 1048576 bytes at offset 176160768
WARNING: Task 11, partial write(), 393216 of 1048576 bytes at offset 39845888
ior ERROR: write() failed, errno 107, Transport endpoint is not connected (aiori-POSIX.c:236)
ior ERROR: write() failed, errno 107, Transport endpoint is not connected (aiori-POSIX.c:236)
ior ERROR: write() failed, errno 107, Transport endpoint is not connected (aiori-POSIX.c:236)
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 43 in communicator MPI_COMM_WORLD 
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
--------------------------------------------------------------------------
WARNING: A process refused to die!

Host: cn04.local
PID:  2808

This process may still be running and/or consuming resources.

--------------------------------------------------------------------------
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
[fs.local:17668] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[fs.local:17668] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[fs.local:17668] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
--------------------------------------------------------------------------
mpirun has exited due to process rank 107 with PID 2725 on
node lustre04.local exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
[fs.local:17668] 1 more process has sent help message help-odls-default.txt / odls-default:could-not-kill
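
For readers checking the numbers in the IOR log above, here is a minimal Python sketch of how the summary and result columns appear to relate. It assumes that the aggregate file size is clients x blocksize x segment count (128 x 1 GiB x 2, from `-b 1G -s 2`) and that bw(MiB/s) is that size divided by total(s). Both assumptions are inferred from the figures in the log, not taken from the IOR sources, and the function names are made up for illustration.

```python
# Sanity check of the IOR figures quoted in the log above.
# Assumption (inferred from the log, not from IOR's source code):
#   aggregate size = clients * blocksize * segments
#   bw(MiB/s)      = aggregate size / total(s)

MIB_PER_GIB = 1024


def aggregate_size_mib(clients, block_gib, segments):
    """Total data moved in one pass, in MiB (hypothetical helper)."""
    return clients * block_gib * segments * MIB_PER_GIB


def bandwidth_mib_s(size_mib, total_seconds):
    """Aggregate bandwidth in MiB/s for one pass (hypothetical helper)."""
    return size_mib / total_seconds


# 128 clients, 1 GiB blocks, 2 segments, as in the summary and command line.
size = aggregate_size_mib(clients=128, block_gib=1, segments=2)

# (access, total(s), reported bw) copied from iteration 0 of the log.
rows = [
    ("write", 448.22, 584.86),
    ("read", 296.04, 885.50),
]

for access, total_s, reported in rows:
    computed = bandwidth_mib_s(size, total_s)
    print(f"{access:5s} computed={computed:8.2f} MiB/s reported={reported:8.2f} MiB/s")
```

The computed values agree with the reported ones to within rounding of the total(s) column. This suggests the 43 to 68 s open times on the write passes are already included in the write bandwidth figures.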