Re: ceph-disk activate fails (after 33 osd drives)

John,

> 2016-02-12 12:53:43.340526 7f149bc71940 -1 journal FileJournal::_open: unable to setup io_context (0) Success

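That io_context error usually means the kernel-wide limit on asynchronous
I/O requests has been reached. Try increasing aio-max-nr: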

echo 131072 > /proc/sys/fs/aio-max-nr
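
To confirm that the limit is what you are hitting, compare fs.aio-nr (currently reserved) with fs.aio-max-nr before and after the change. The snippet below is only a rough sketch: the helper name aio_headroom and the sysctl.d file name are my own placeholders, and 131072 is simply the value suggested above, not a tuned recommendation.

```shell
# Hypothetical helper: remaining AIO request slots.
# $1 = current fs.aio-nr, $2 = fs.aio-max-nr
aio_headroom() {
    echo $(( $2 - $1 ))
}

# Print current usage if the kernel exposes the counters (Linux only).
if [ -r /proc/sys/fs/aio-nr ] && [ -r /proc/sys/fs/aio-max-nr ]; then
    used=$(cat /proc/sys/fs/aio-nr)
    limit=$(cat /proc/sys/fs/aio-max-nr)
    echo "aio in use: $used of $limit (headroom: $(aio_headroom "$used" "$limit"))"
fi

# The echo into /proc does not survive a reboot. To persist it (as root):
#   echo 'fs.aio-max-nr = 131072' > /etc/sysctl.d/90-ceph-aio.conf   # file name is an assumption
#   sysctl --system
```

If the headroom is close to zero on a node that is stuck at 33 OSDs, that would be consistent with the journal failing to set up its io_context.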

Best regards,
      Alexey


On Fri, Feb 12, 2016 at 4:51 PM, John Hogenmiller (yt) <john@xxxxxxxxxxx> wrote:
>
>
> I have 7 servers, each containing 60 x 6TB drives in jbod mode. When I first
> started, I only activated a couple drives on 3 nodes as Ceph OSDs.
> Yesterday, I went to expand to the remaining nodes as well as prepare and
> activate all the drives.
>
> ceph-disk prepare worked just fine. However, ceph-disk activate-all managed
> to only activate 33 drives and failed on the rest. This is consistent across all 7
> nodes (existing and newly installed). At the end of the day, I have 33 Ceph
> OSDs activated per server and can't activate any more. I did have to bump up
> the pg_num and pgp_num on the pool in order to accommodate the drives that
> did activate. I don't know if having a low pg number during the mass influx
> of OSDs caused an issue or not within the pool. I don't think so because I
> can only set the pg_num to a maximum value determined by the number of known
> OSDs. But maybe you have to expand gradually: add some OSDs, increase PGs,
> add more OSDs, increase PGs again. I certainly have not seen anything to
> suggest a magic "33/node limit", and I've seen references to servers with up
> to 72 Ceph OSDs on them.
>
> I then attempted to activate individual Ceph OSDs and got the same set of
> errors. I even wiped a drive, re-ran `ceph-disk prepare` and `ceph-disk
> activate` to have it fail in the same way.
>
> status:
> ```
> root@ljb01:/home/ceph/rain-cluster# ceph status
>     cluster 4ebe7995-6a33-42be-bd4d-20f51d02ae45
>      health HEALTH_OK
>      monmap e5: 5 mons at
> {hail02-r01-06=172.29.4.153:6789/0,hail02-r01-08=172.29.4.155:6789/0,rain02-r01-01=172.29.4.148:6789/0,rain02-r01-03=172.29.4.150:6789/0,rain02-r01-04=172.29.4.151:6789/0}
>             election epoch 12, quorum 0,1,2,3,4
> rain02-r01-01,rain02-r01-03,rain02-r01-04,hail02-r01-06,hail02-r01-08
>      osdmap e1116: 420 osds: 232 up, 232 in
>             flags sortbitwise
>       pgmap v397198: 10872 pgs, 14 pools, 101 MB data, 8456 objects
>             38666 MB used, 1264 TB / 1264 TB avail
>                10872 active+clean
> ```
>
>
>
> Here is what I get when I run ceph-disk prepare on a blank drive:
>
> ```
> root@rain02-r01-01:/etc/ceph# ceph-disk  prepare  /dev/sdbh1
> The operation has completed successfully.
> The operation has completed successfully.
> meta-data=/dev/sdbh1             isize=2048   agcount=6, agsize=268435455 blks
>          =                       sectsz=512   attr=2, projid32bit=0
> data     =                       bsize=4096   blocks=1463819665, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> The operation has completed successfully.
>
> root@rain02-r01-01:/etc/ceph# parted /dev/sdh print
> Model: ATA HUS726060ALA640 (scsi)
> Disk /dev/sdh: 6001GB
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start   End     Size    File system  Name          Flags
>  2      1049kB  5369MB  5368MB               ceph journal
>  1      5370MB  6001GB  5996GB  xfs          ceph data
> ```
>
> And finally the errors from attempting to activate the drive.
>
> ```
> root@rain02-r01-01:/etc/ceph# ceph-disk activate /dev/sdbh1
> got monmap epoch 5
> 2016-02-12 12:53:43.340526 7f149bc71940 -1 journal FileJournal::_open:
> unable to setup io_context (0) Success
> 2016-02-12 12:53:43.340748 7f1493f83700 -1 journal io_submit to 0~4096 got
> (22) Invalid argument
> 2016-02-12 12:53:43.341186 7f149bc71940 -1
> filestore(/var/lib/ceph/tmp/mnt.KRphD_) could not find
> -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
> os/FileJournal.cc: In function 'int FileJournal::write_aio_bl(off64_t&,
> ceph::bufferlist&, uint64_t)' thread 7f1493f83700 time 2016-02-12
> 12:53:43.341355
> os/FileJournal.cc: 1469: FAILED assert(0 == "io_submit got unexpected
> error")
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0x7f149b767f2b]
>  2: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> long)+0x5ad) [0x7f149b5fe27d]
>  3: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
>  4: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
>  5: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
>  6: (()+0x8182) [0x7f1499d87182]
>  7: (clone()+0x6d) [0x7f14980ce47d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
> 2016-02-12 12:53:43.345434 7f1493f83700 -1 os/FileJournal.cc: In function
> 'int FileJournal::write_aio_bl(off64_t&, ceph::bufferlist&, uint64_t)'
> thread 7f1493f83700 time 2016-02-12 12:53:43.341355
> os/FileJournal.cc: 1469: FAILED assert(0 == "io_submit got unexpected
> error")
>
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0x7f149b767f2b]
>  2: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> long)+0x5ad) [0x7f149b5fe27d]
>  3: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
>  4: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
>  5: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
>  6: (()+0x8182) [0x7f1499d87182]
>  7: (clone()+0x6d) [0x7f14980ce47d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
>     -4> 2016-02-12 12:53:43.340526 7f149bc71940 -1 journal
> FileJournal::_open: unable to setup io_context (0) Success
>     -3> 2016-02-12 12:53:43.340748 7f1493f83700 -1 journal io_submit to
> 0~4096 got (22) Invalid argument
>     -1> 2016-02-12 12:53:43.341186 7f149bc71940 -1
> filestore(/var/lib/ceph/tmp/mnt.KRphD_) could not find
> -1/23c2fcde/osd_superblock/0 in index: (2) No such file or directory
>      0> 2016-02-12 12:53:43.345434 7f1493f83700 -1 os/FileJournal.cc: In
> function 'int FileJournal::write_aio_bl(off64_t&, ceph::bufferlist&,
> uint64_t)' thread 7f1493f83700 time 2016-02-12 12:53:43.341355
> os/FileJournal.cc: 1469: FAILED assert(0 == "io_submit got unexpected
> error")
>
>
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x8b) [0x7f149b767f2b]
>  2: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> long)+0x5ad) [0x7f149b5fe27d]
>  3: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
>  4: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
>  5: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
>  6: (()+0x8182) [0x7f1499d87182]
>  7: (clone()+0x6d) [0x7f14980ce47d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> terminate called after throwing an instance of 'ceph::FailedAssertion'
> *** Caught signal (Aborted) **
>  in thread 7f1493f83700
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (()+0x7d02ca) [0x7f149b67b2ca]
>  2: (()+0x10340) [0x7f1499d8f340]
>  3: (gsignal()+0x39) [0x7f149800acc9]
>  4: (abort()+0x148) [0x7f149800e0d8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1498915535]
>  6: (()+0x5e6d6) [0x7f14989136d6]
>  7: (()+0x5e703) [0x7f1498913703]
>  8: (()+0x5e922) [0x7f1498913922]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x278) [0x7f149b768118]
>  10: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> long)+0x5ad) [0x7f149b5fe27d]
>  11: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
>  12: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
>  13: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
>  14: (()+0x8182) [0x7f1499d87182]
>  15: (clone()+0x6d) [0x7f14980ce47d]
> 2016-02-12 12:53:43.348498 7f1493f83700 -1 *** Caught signal (Aborted) **
>  in thread 7f1493f83700
>
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (()+0x7d02ca) [0x7f149b67b2ca]
>  2: (()+0x10340) [0x7f1499d8f340]
>  3: (gsignal()+0x39) [0x7f149800acc9]
>  4: (abort()+0x148) [0x7f149800e0d8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1498915535]
>  6: (()+0x5e6d6) [0x7f14989136d6]
>  7: (()+0x5e703) [0x7f1498913703]
>  8: (()+0x5e922) [0x7f1498913922]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x278) [0x7f149b768118]
>  10: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> long)+0x5ad) [0x7f149b5fe27d]
>  11: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
>  12: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
>  13: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
>  14: (()+0x8182) [0x7f1499d87182]
>  15: (clone()+0x6d) [0x7f14980ce47d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
>      0> 2016-02-12 12:53:43.348498 7f1493f83700 -1 *** Caught signal
> (Aborted) **
>  in thread 7f1493f83700
>
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (()+0x7d02ca) [0x7f149b67b2ca]
>  2: (()+0x10340) [0x7f1499d8f340]
>  3: (gsignal()+0x39) [0x7f149800acc9]
>  4: (abort()+0x148) [0x7f149800e0d8]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1498915535]
>  6: (()+0x5e6d6) [0x7f14989136d6]
>  7: (()+0x5e703) [0x7f1498913703]
>  8: (()+0x5e922) [0x7f1498913922]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x278) [0x7f149b768118]
>  10: (FileJournal::write_aio_bl(long&, ceph::buffer::list&, unsigned
> long)+0x5ad) [0x7f149b5fe27d]
>  11: (FileJournal::do_aio_write(ceph::buffer::list&)+0x263) [0x7f149b602e63]
>  12: (FileJournal::write_thread_entry()+0x4e4) [0x7f149b607394]
>  13: (FileJournal::Writer::entry()+0xd) [0x7f149b44bddd]
>  14: (()+0x8182) [0x7f1499d87182]
>  15: (clone()+0x6d) [0x7f14980ce47d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> ERROR:ceph-disk:Failed to activate
> Traceback (most recent call last):
>   File "/usr/sbin/ceph-disk", line 3576, in <module>
>     main(sys.argv[1:])
>   File "/usr/sbin/ceph-disk", line 3532, in main
>     main_catch(args.func, args)
>   File "/usr/sbin/ceph-disk", line 3554, in main_catch
>     func(args)
>   File "/usr/sbin/ceph-disk", line 2424, in main_activate
>     dmcrypt_key_dir=args.dmcrypt_key_dir,
>   File "/usr/sbin/ceph-disk", line 2197, in mount_activate
>     (osd_id, cluster) = activate(path, activate_key_template, init)
>   File "/usr/sbin/ceph-disk", line 2360, in activate
>     keyring=keyring,
>   File "/usr/sbin/ceph-disk", line 1950, in mkfs
>     '--setgroup', get_ceph_user(),
>   File "/usr/sbin/ceph-disk", line 349, in command_check_call
>     return subprocess.check_call(arguments)
>   File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command
> '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i',
> '165', '--monmap', '/var/lib/ceph/tmp/mnt.KRphD_/activate.monmap',
> '--osd-data', '/var/lib/ceph/tmp/mnt.KRphD_', '--osd-journal',
> '/var/lib/ceph/tmp/mnt.K
>
> root@rain02-r01-01:/etc/ceph# ls -l /var/lib/ceph/tmp/
> total 0
> -rw-r--r-- 1 root root 0 Feb 12 12:58 ceph-disk.activate.lock
> -rw-r--r-- 1 root root 0 Feb 12 12:58 ceph-disk.prepare.lock
> ```
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>