Re: mkcephfs failing on v0.48 "argonaut"

Hi Paul,

On Wed, 4 Jul 2012, Paul Pettigrew wrote:
> Firstly, well done guys on achieving this version milestone. I 
> successfully upgraded to the 0.48 format uneventfully on a live (test) 
> system.
> 
> The same system was then going through "rebuild" testing, to confirm 
> that also worked fine.
> 
> 
> Unfortunately, the mkcephfs command is failing:
> 
> root@dsanb1-coy:~# mkcephfs -c /etc/ceph/ceph.conf --allhosts --mkbtrfs -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v
> temp dir is /tmp/mkcephfs.GaRCZ9i06a
> preparing monmap in /tmp/mkcephfs.GaRCZ9i06a/monmap
> /usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.GaRCZ9i06a/monmap
> /usr/bin/monmaptool: monmap file /tmp/mkcephfs.GaRCZ9i06a/monmap
> /usr/bin/monmaptool: generated fsid c7202495-468c-4678-b678-115c3ee33402
> epoch 0
> fsid c7202495-468c-4678-b678-115c3ee33402
> last_changed 2012-07-04 15:02:31.732275
> created 2012-07-04 15:02:31.732275
> 0: 10.32.0.10:6789/0 mon.alpha
> 1: 10.32.0.11:6789/0 mon.charlie
> 2: 10.32.0.25:6789/0 mon.bravo
> /usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.GaRCZ9i06a/monmap (3 monitors)
> /usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 "user"
> === osd.0 ===
> --- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.GaRCZ9i06a --prepare-osdfs osd.0
> umount: /srv/osd.0: not mounted
> umount: /dev/disk/by-wwn/wwn-0x50014ee601246234: not mounted
> 
> WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
> WARNING! - see http://btrfs.wiki.kernel.org before using
> 
> fs created label (null) on /dev/disk/by-wwn/wwn-0x50014ee601246234
>         nodesize 4096 leafsize 4096 sectorsize 4096 size 1.82TB
> Btrfs Btrfs v0.19
> Scanning for Btrfs filesystems
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>        missing codepage or helper program, or other error
>        In some cases useful info is found in syslog - try
>        dmesg | tail  or so
> 
> failed: '/sbin/mkcephfs -d /tmp/mkcephfs.GaRCZ9i06a --prepare-osdfs osd.0'

Hmm.  Can you try running with -v?  That will tell us exactly which 
command it is running, and hopefully we can work backwards from there.

> dmesg/syslog is spitting out at the time of this failure:
> 
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.751945] device fsid 7de0d192-b710-4629-a201-849df1d9db17 devid 1 transid 27109 /dev/sdp
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.751987] device fsid 08fc3479-2fa2-4388-8b61-83e2a742a13e devid 1 transid 28699 /dev/sdo
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.752023] device fsid 8b4a7c43-1a05-4dcb-bbed-de2a5c933996 devid 1 transid 24346 /dev/sdn
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.752068] device fsid ba5fb1ca-c642-49b1-8a41-7f56f8e59fbd devid 1 transid 27274 /dev/sdm
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.761453] device fsid 7fe8c5cf-bf8c-4276-90f2-c3f57f5275fb devid 1 transid 28724 /dev/sdi
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.761518] device fsid 93fa3631-1202-4d42-8908-e5ef4d3e600d devid 1 transid 25201 /dev/sdh
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.761579] device fsid b9a1b5e4-3e5e-4381-a29a-33470f4b870f devid 1 transid 23375 /dev/sdg
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.761635] device fsid 280ea990-23f8-4c43-9e56-140c82340fdc devid 1 transid 25559 /dev/sdf
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.761693] device fsid 2f724cde-6de5-4262-b195-1ba3eea2256e devid 1 transid 176 /dev/sde
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.761732] device fsid a66f890f-8b08-4393-aab0-f222637ca5a4 devid 1 transid 7 /dev/sdd
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.761769] device fsid 6c181a94-697c-4e0c-af0d-05eb04d3626c devid 1 transid 7 /dev/sdc
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.775931] device fsid 6c181a94-697c-4e0c-af0d-05eb04d3626c devid 1 transid 7 /dev/sdc
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.779716] btrfs bad fsid on block 20971520
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.791594] btrfs bad fsid on block 20971520
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.803608] btrfs bad fsid on block 20971520
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.815541] btrfs bad fsid on block 20971520
> Jul  4 15:02:31 dsanb1-coy kernel: [ 2306.815878] btrfs bad fsid on block 20971520
> Jul  4 15:02:32 dsanb1-coy kernel: [ 2306.823554] btrfs bad fsid on block 20971520
> Jul  4 15:02:32 dsanb1-coy kernel: [ 2306.823797] btrfs bad fsid on block 20971520
> Jul  4 15:02:32 dsanb1-coy kernel: [ 2306.823887] btrfs: failed to read chunk root on sdc
> Jul  4 15:02:32 dsanb1-coy kernel: [ 2306.825622] btrfs: open_ctree failed

Long shot, but is the kernel on that machine recent?

> Also fails if not forcing to use btrfs, eg:
> 
> root@dsanb1-coy:~# mkcephfs -c /etc/ceph/ceph.conf --allhosts -k /etc/ceph/keyring --crushmapsrc crushfile.txt -v
> temp dir is /tmp/mkcephfs.ZOh6tBPAH0
> preparing monmap in /tmp/mkcephfs.ZOh6tBPAH0/monmap
> /usr/bin/monmaptool --create --clobber --add alpha 10.32.0.10:6789 --add bravo 10.32.0.25:6789 --add charlie 10.32.0.11:6789 --print /tmp/mkcephfs.ZOh6tBPAH0/monmap
> /usr/bin/monmaptool: monmap file /tmp/mkcephfs.ZOh6tBPAH0/monmap
> /usr/bin/monmaptool: generated fsid adb8d65c-a823-4dc2-9415-22b0d7252699
> epoch 0
> fsid adb8d65c-a823-4dc2-9415-22b0d7252699
> last_changed 2012-07-04 15:04:17.423368
> created 2012-07-04 15:04:17.423368
> 0: 10.32.0.10:6789/0 mon.alpha
> 1: 10.32.0.11:6789/0 mon.charlie
> 2: 10.32.0.25:6789/0 mon.bravo
> /usr/bin/monmaptool: writing epoch 0 to /tmp/mkcephfs.ZOh6tBPAH0/monmap (3 monitors)
> /usr/bin/ceph-conf -c /etc/ceph/ceph.conf -n osd.0 "user"
> === osd.0 ===
> --- dsanb1-coy# /sbin/mkcephfs -d /tmp/mkcephfs.ZOh6tBPAH0 --init-daemon osd.0
> 2012-07-04 15:04:17.789064 7fc7fadca780 -1 filestore(/srv/osd.0) limited size xattrs -- enable filestore_xattr_use_omap
> 2012-07-04 15:04:17.789120 7fc7fadca780 -1 OSD::mkfs: couldn't mount FileStore: error -95
> 2012-07-04 15:04:17.789161 7fc7fadca780 -1  ** ERROR: error creating empty object store in /srv/osd.0: (95) Operation not supported
> failed: '/sbin/mkcephfs -d /tmp/mkcephfs.ZOh6tBPAH0 --init-daemon osd.0'
> 
> 
> Confirming all this was working previously, and the crushmap, config 
> file, etc are all proven to be OK (get same failure when not specifying 
> a custom crushmap also). Also note that whilst the above is failing on 
> osd.0 creation, I have swapped disk references and still get the same 
> failure on different HDD's when they are hooked in as osd.0
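
For the second (non-btrfs) failure, the OSD log line itself names a setting,
filestore_xattr_use_omap. A rough sketch of checking whether it is enabled
follows. It assumes the option lives in the [osd] section as
"filestore xattr use omap = true" and that the data directory sits on a
filesystem with limited xattr size; the sample file is a stand-in for
/etc/ceph/ceph.conf and is not taken from the report above.

```shell
#!/bin/sh
# Sample config written to a temp file so the check can run anywhere;
# on a real node, point conf at /etc/ceph/ceph.conf instead (assumption).
conf=$(mktemp)
cat > "$conf" <<'EOF'
[osd]
    filestore xattr use omap = true
EOF

# Ceph accepts spaces or underscores in option names, so match both forms.
pattern='^[[:space:]]*filestore[ _]xattr[ _]use[ _]omap[[:space:]]*=[[:space:]]*true'
if grep -Eq "$pattern" "$conf"; then
    omap_enabled=yes
else
    omap_enabled=no
fi

echo "filestore_xattr_use_omap enabled: $omap_enabled"
rm -f "$conf"
```

If the check reports "no", adding the option and re-running mkcephfs may be
worth a try; this is a guess based only on the log message.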

The only thing that changed from v0.47 is the below.  Can you try 
replacing 'btrfs device scan || btrfsctl -a' with 'btrfs device scan ; 
btrfsctl -a'?  Maybe the btrfs tool isn't being pedantic about return 
codes...
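
To illustrate the difference between the two forms, here is a minimal sketch
using stand-in shell functions (scan_ok and legacy are hypothetical and only
print a word; they are not the real btrfs tools). With '||' the second command
runs only when the first returns non-zero, so a scan that exits 0 without
really registering the devices would skip 'btrfsctl -a' entirely; with ';'
both always run.

```shell
#!/bin/sh
# Stand-ins for the real tools (hypothetical; they only print and return 0).
scan_ok() { echo "scan"; return 0; }    # plays the role of 'btrfs device scan'
legacy()  { echo "legacy"; return 0; }  # plays the role of 'btrfsctl -a'

# '||' runs the right-hand command only if the left-hand one fails.
or_result=$(scan_ok || legacy)

# ';' runs both commands unconditionally, one after the other.
seq_result=$(scan_ok ; legacy)

echo "with '||': $or_result"
echo "with ';' : $seq_result"
```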

sage


commit a414fd51c7c5ae5dbe9e3af7db6f17741a58c1a7
Author: Sage Weil <sage.weil@xxxxxxxxxxxxx>
Date:   Sat Feb 11 13:43:23 2012 -0800

    init-ceph, mkcephfs: try 'btrfs device scan' before 'btrfsctl -a'
    
    Fixes: #2023
    Reported-by: Wido den Hollander <wido@xxxxxxxxx>
    Signed-off-by: Sage Weil <sage.weil@xxxxxxxxxxxxx>

diff --git a/src/mkcephfs.in b/src/mkcephfs.in
index 83fb932..17b6014 100644
--- a/src/mkcephfs.in
+++ b/src/mkcephfs.in
@@ -332,7 +332,7 @@ if [ -n "$prepareosdfs" ]; then
 
     modprobe btrfs || true
     mkfs.btrfs $btrfs_devs
-    btrfsctl -a
+    btrfs device scan || btrfsctl -a
     mount -t btrfs $btrfs_opt $first_dev $btrfs_path
     chown $osd_user $btrfs_path
     chmod +w $btrfs_path

