Corruption by missing blocks

I've run into an issue where, after copying a file to my CephFS cluster,
the md5sums no longer match.  I believe I've tracked it down to some
parts of the file (RADOS objects) that are missing:

$ obj_name=$(cephfs "title1.mkv" show_location -l 0 | grep object_name \
    | sed -e "s/.*:\W*\([0-9a-f]*\)\.[0-9a-f]*/\1/")
$ echo "Object name: $obj_name"
Object name: 10000001120
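
As I understand it, that prefix is just the file's inode number in hex, so
it can be cross-checked straight from stat (this should print the same
10000001120):

$ printf 'Inode (hex): %x\n' $(stat -c %i "title1.mkv")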

$ file_size=$(stat "title1.mkv" | grep Size | awk '{ print $2 }')
$ printf "File size: %d MiB (%d Bytes)\n" $(($file_size/1048576)) $file_size
File size: 20074 MiB (21049178117 Bytes)

$ blocks=$((file_size/4194304+1))
$ printf "Blocks: %d\n" $blocks
Blocks: 5019
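
The +1 only works because the file size isn't an exact multiple of the
4 MiB object size; the general ceiling division would be:

$ blocks=$(( (file_size + 4194303) / 4194304 ))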

$ for b in `seq 0 $(($blocks-1))`; do \
    rados -p data stat ${obj_name}.`printf '%8.8x\n' $b` | grep "error"; done
 error stat-ing data/10000001120.00001076: No such file or directory
 error stat-ing data/10000001120.000011c7: No such file or directory
 error stat-ing data/10000001120.0000129c: No such file or directory
 error stat-ing data/10000001120.000012f4: No such file or directory
 error stat-ing data/10000001120.00001307: No such file or directory


Any ideas on where I should look to investigate why these blocks were
never written?
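
In case it helps, the missing objects can be mapped to their placement
groups and OSDs with something like this (suffixes taken from the errors
above):

$ for o in 00001076 000011c7 0000129c 000012f4 00001307; do \
    ceph osd map data ${obj_name}.$o; done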

Here's the current state of the cluster:

ceph -s
   health HEALTH_OK
   monmap e1: 1 mons at {a=172.24.88.50:6789/0}, election epoch 1, quorum 0 a
   osdmap e22059: 24 osds: 24 up, 24 in
    pgmap v1783615: 1920 pgs: 1917 active+clean, 3 active+clean+scrubbing+deep; 4667 GB data, 9381 GB used, 4210 GB / 13592 GB avail
   mdsmap e437: 1/1/1 up {0=a=up:active}
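
If it's useful, I can also count how many of the file's objects actually
made it into the pool (listing the whole pool is slow at this size, but it
works), e.g.:

$ rados -p data ls | grep -c "^${obj_name}\."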

Here's my current crushmap:

# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20
device 21 osd.21
device 22 osd.22
device 23 osd.23

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool

# buckets
host b1 {
        id -2           # do not change unnecessarily
        # weight 2.980
        alg straw
        hash 0  # rjenkins1
        item osd.0 weight 0.500
        item osd.1 weight 0.500
        item osd.2 weight 0.500
        item osd.3 weight 0.500
        item osd.4 weight 0.500
        item osd.20 weight 0.480
}
host b2 {
        id -4           # do not change unnecessarily
        # weight 4.680
        alg straw
        hash 0  # rjenkins1
        item osd.5 weight 0.500
        item osd.6 weight 0.500
        item osd.7 weight 2.200
        item osd.8 weight 0.500
        item osd.9 weight 0.500
        item osd.21 weight 0.480
}
host b3 {
        id -5           # do not change unnecessarily
        # weight 3.480
        alg straw
        hash 0  # rjenkins1
        item osd.10 weight 0.500
        item osd.11 weight 0.500
        item osd.12 weight 1.000
        item osd.13 weight 0.500
        item osd.14 weight 0.500
        item osd.22 weight 0.480
}
host b4 {
        id -6           # do not change unnecessarily
        # weight 3.480
        alg straw
        hash 0  # rjenkins1
        item osd.15 weight 0.500
        item osd.16 weight 1.000
        item osd.17 weight 0.500
        item osd.18 weight 0.500
        item osd.19 weight 0.500
        item osd.23 weight 0.480
}
pool default {
        id -1           # do not change unnecessarily
        # weight 14.620
        alg straw
        hash 0  # rjenkins1
        item b1 weight 2.980
        item b2 weight 4.680
        item b3 weight 3.480
        item b4 weight 3.480
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 2
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 2
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map
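
(For anyone who wants to poke at the compiled map directly, it can be
pulled and decompiled with the usual steps; paths here are just examples:)

$ ceph osd getcrushmap -o /tmp/crushmap.bin
$ crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt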


Thanks,
Bryan