Ok, weird problem(s), if you want to call it that. I run a 10-OSD Ceph cluster on 4 hosts with SSDs (Intel DC3700) as journals. I have a lot of mixed workloads running, the Linux guests seem to get corrupted in a weird way, and the performance kind of sucks.

First off: all hosts are running OpenStack with KVM + libvirt to connect and boot the RBD volumes.

ceph -v: ceph version 0.94.6

——————
Problem 1: Corruption
——————

Whenever I run fsck.ext4 -nvf /dev/vda1 on one of the guests I get this:

e2fsck 1.42.9 (4-Feb-2014)
Warning!  /dev/vda1 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 1647 has zero dtime.  Fix? no
Inodes that were part of a corrupted orphan linked list found.  Fix? no
Inode 133469 was part of the orphaned inode list.  IGNORED.
Inode 133485 was part of the orphaned inode list.  IGNORED.
Inode 133490 was part of the orphaned inode list.  IGNORED.
Inode 133492 was part of the orphaned inode list.  IGNORED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (8866035, counted=8865735).  Fix? no
Inode bitmap differences:  -1647 -133469 -133485 -133490 -133492  Fix? no
Free inodes count wrong (2508840, counted=2509091).  Fix? no

cloudimg-rootfs: ********** WARNING: Filesystem still has errors **********

  112600 inodes used (4.30%, out of 2621440)
      70 non-contiguous files (0.1%)
      77 non-contiguous directories (0.1%)
         # of inodes with ind/dind/tind blocks: 0/0/0
         Extent depth histogram: 104372/41
 1619469 blocks used (15.44%, out of 10485504)
       0 bad blocks
       2 large files
   89034 regular files
   14945 directories
      55 character device files
      25 block device files
       1 fifo
      16 links
    8265 symbolic links (7832 fast symbolic links)
      10 sockets
------------
  112351 files

But when I map the same image directly on a host with rbd map and fsck the mapped device, I get:

fsck.ext4 -nfv /dev/rbd0p1
e2fsck 1.42.11 (09-Jul-2014)
cloudimg-rootfs: clean, 112600/2621440 files, 1619469/10485504 blocks

So which one do I trust? I have had corrupted files on some of the images, but I attributed that to the migration from qcow2 to raw -> Ceph. Any help is really appreciated. (A snapshot-based re-check I am considering is sketched at the end of this mail.)

————
Problem 2: Performance
————

I would assume that even with the Intel DC SSDs as journals I would get decent performance out of this system, but currently I max out at about 200 MB/s write, while reads fill the full 10 Gbit/s. There are 10 SATA drives behind the SSDs: two SSDs journal 3 SATA drives each, and two SSDs journal 2 SATA drives each.

fio is also giving terrible results: it cranks the IO up to about 5000, then dwindles down. It looks almost like it is waiting to flush the SSDs out, or the IO.

The only changes I made to the base config are rbd cache = true and the following:

ceph tell osd.* injectargs '--filestore_wbthrottle_enable=false'
ceph tell osd.* injectargs '--filestore_queue_max_bytes=1048576000'
ceph tell osd.* injectargs '--filestore_queue_committing_max_ops=5000'
ceph tell osd.* injectargs '--filestore_queue_committing_max_bytes=1048576000'
ceph tell osd.* injectargs '--filestore_queue_max_ops=200'
ceph tell osd.* injectargs '--journal_max_write_entries=1000'
ceph tell osd.* injectargs '--journal_queue_max_ops=3000'

That is the only way I reached 200-250 MB/s; otherwise it is more like 115 MB/s, again waiting for a flush after each wave. Can anyone give me a fairly decent idea of how to tune this properly?
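In case the exact settings matter: the injectargs above only apply at runtime, so this is the ceph.conf equivalent I would use to keep the same values across OSD restarts (just a sketch with the same numbers as above; rbd cache sits under [client] on the hypervisors):

[client]
    rbd cache = true

[osd]
    filestore wbthrottle enable = false
    filestore queue max ops = 200
    filestore queue max bytes = 1048576000
    filestore queue committing max ops = 5000
    filestore queue committing max bytes = 1048576000
    journal max write entries = 1000
    journal queue max ops = 3000

The OSDs need a restart for the file to take effect; injectargs is just what I used while experimenting.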
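And for Problem 1: since fsck.ext4 -n against a mounted, running root filesystem can report things like orphaned inodes and wrong free counts simply because the filesystem is in use, I am considering re-checking against an RBD snapshot instead of the live device, roughly like this (pool/image names are placeholders for my actual OpenStack volumes, and the /dev/rbdX number depends on what the kernel assigns):

# ideally with the guest frozen (fsfreeze) or shut down, so the snapshot is consistent
rbd snap create volumes/volume-XXXX@fsck-check
rbd map --read-only volumes/volume-XXXX@fsck-check    # shows up as e.g. /dev/rbd1
fsck.ext4 -nfv /dev/rbd1p1
rbd unmap /dev/rbd1
rbd snap rm volumes/volume-XXXX@fsck-check

Does that sound like a sane way to verify whether the images are actually corrupted?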
Also, could this modification have something to do with the corruption?

Thanks again for any help :)

//Florian