Re: ceph bluestore RAM over used - luminous


 



----- Original Message -----
> From: "Benoit GEORGELIN" <benoit.georgelin@xxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Saturday, May 13, 2017 19:57:41
> Subject: ceph bluestore RAM over used - luminous

> Hi dear members of the list,
> 
> I'm discovering Ceph and doing some testing.
> I came across some strange behavior in the amount of RAM used by the OSD processes.
> 
> Configuration:
> ceph version 12.0.2
> 3 OSD nodes, 2 OSDs per node: 6 OSDs and 6 disks in total
> 4 vCPUs
> 6 GB of RAM
> 64 PGs
> Ubuntu 16.04
> 
> From the documentation, 500 MB to 1 GB per OSD should be enough, but in my
> case the OSDs consume a lot of RAM and I don't really understand why.
> Sometimes I can see the RAM being released, dropping from 4.5 GB to 1.5 GB,
> but most of the time it grows until the node starts swapping and the OSDs
> crash :/
> The more OSDs, the quicker it crashes, of course. I'm using an RBD image
> (100 GB) mounted on a client with rbd map.
> With only 2 OSDs per node, I can crash it in fewer than a few iterations of:
> 
> 
> _TESTPATH="/rbd/ceph-sda1"
> _BS="300M"
> _COUNT="1"
> _OFLAG="direct"
> 
> echo "####------------ $(hostname -f) ------------###"
> echo "###CMD: dd if=/dev/zero of=${_TESTPATH}/testperfN bs=${_BS} count=${_COUNT} oflag=${_OFLAG}"
> echo "###TestPath: ${_TESTPATH}"
> echo ""
> # Write and delete ten 300 MB files with O_DIRECT, keeping only dd's
> # throughput summary lines ("copi" matches "copied" in a French locale).
> for NUM in $(seq 1 10); do
>     dd if=/dev/zero of="${_TESTPATH}/testperf${NUM}" bs="${_BS}" count="${_COUNT}" oflag="${_OFLAG}" \
>         && rm "${_TESTPATH}/testperf${NUM}"
> done 2>&1 | grep copi | sort -n
> 
> Which amounts to running the following 10 times in a row:
> dd if=/dev/zero of=/rbd/ceph-sda1/testperf bs=300M count=1 oflag=direct
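
While the loop runs, it helps to watch the resident memory of each OSD on the
storage nodes so the growth can be correlated with the dd iterations. A
minimal sketch using standard procps tools (not part of the original test):

# Print the PID, RSS (in KiB) and command line of every ceph-osd
# process every 5 seconds.
while true; do
    ps -C ceph-osd -o pid,rss,cmd
    sleep 5
done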
> 
> 
> Here is my ceph.conf:
> 
> 
> ### begin
> [global]
> fsid = 2d892cb4-7992-485c-b4e0-2242fa508461
> mon_initial_members = int-ceph-mon1a-fr
> mon_host = 10.101.240.137
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> 
> public network = 10.101.240.0/24
> cluster network = 10.101.0.0/24
> enable experimental unrecoverable data corrupting features = bluestore rocksdb
> 
> #bluestore_debug_omit_block_device_write = true
> 
> [client]
> rbd cache = true
> rbd cache size = 67108864 # (64MB)
> rbd cache max dirty = 50331648 # (48MB)
> rbd cache target dirty = 33554432 # (32MB)
> rbd cache max dirty age = 2
> rbd cache writethrough until flush = true
> 
> [osd]
> #Choose reasonable numbers for number of replicas and placement groups.
> osd pool default size = 3 # write each object 3 times
> osd pool default min size = 2 # allow writes with only 2 copies in a degraded state
> osd pool default pg num = 64
> osd pool default pgp num = 64
> 
> 
> debug osd = 0
> debug bluestore = 0
> debug bluefs = 0
> debug rocksdb = 0
> debug bdev = 0
> bluestore = true
> osd objectstore = bluestore
> #bluestore fsck on mount = true
> bluestore block create = true
> bluestore block db size = 67108864
> bluestore block db create = true
> bluestore block wal size = 134217728
> bluestore block wal create = true
> 
> #osd journal size = 10000 # default is to use the whole device if not set
> 
> [osd.0]
>        host = int-ceph-osd1a-fr
>        public addr = 10.101.240.140
>        cluster addr = 10.101.0.140
>        osd data = /var/lib/ceph/osd/ceph-0/
> 
> [osd.1]
>        host = int-ceph-osd1a-fr
>        public addr = 10.101.240.140
>        cluster addr = 10.101.0.140
>        osd data = /var/lib/ceph/osd/ceph-1/
> 
> 
> [osd.2]
>        host = int-ceph-osd1b-fr
>        public addr = 10.101.240.141
>        cluster addr = 10.101.0.141
>        osd data = /var/lib/ceph/osd/ceph-2/
> 
> [osd.3]
>        host = int-ceph-osd1b-fr
>        public addr = 10.101.240.141
>        cluster addr = 10.101.0.141
>        osd data = /var/lib/ceph/osd/ceph-3/
> 
> [osd.4]
>        host = int-ceph-osd1c-fr
>        public addr = 10.101.240.142
>        cluster addr = 10.101.0.142
>        osd data = /var/lib/ceph/osd/ceph-4/
> 
> [osd.5]
>        host = int-ceph-osd1c-fr
>        public addr = 10.101.240.142
>        cluster addr = 10.101.0.142
>        osd data = /var/lib/ceph/osd/ceph-5/
> 
> #### END
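
A note on the config above: BlueStore keeps its cache inside the OSD process
instead of relying on the kernel page cache, so each OSD's RSS is roughly its
BlueStore cache plus per-PG and per-object overhead. With 6 GB of RAM and 2
OSDs per node, it may be worth capping that cache explicitly. A minimal
sketch, assuming the bluestore_cache_size option is available in this 12.0.2
dev build (value in bytes):

[osd]
# Limit the in-process BlueStore cache to 512 MB per OSD to leave
# headroom on a 6 GB node running 2 OSDs.
bluestore cache size = 536870912

Whether the running build recognizes the option can be checked with
"ceph daemon osd.0 config show | grep bluestore_cache".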
> 
> 
> Can you tell me if you see anything wrong here?
> Is Ceph supposed to free the RAM more quickly than it does on my systems?
> From what I see, most of the time it does not free the RAM and ends up
> crashing my OSDs.
> 
> Thanks a lot for your time and your help.
> 
> Regards,
> 
> 
> 
> Benoît G,
> 


These are the errors I get on the OSD:


     0> 2017-05-14 23:55:01.910834 7f0338157700 -1 /build/ceph-12.0.2/src/os/bluestore/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f0338157700 time 2017-05-14 23:55:01.907393
/build/ceph-12.0.2/src/os/bluestore/KernelDevice.cc: 364: FAILED assert(r >= 0)

 ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55b37a657072]
 2: (KernelDevice::_aio_thread()+0x1301) [0x55b37a5dcc61]
 3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x55b37a5df45d]
 4: (()+0x76ba) [0x7f034262e6ba]
 5: (clone()+0x6d) [0x7f03416a582d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
[...]
2017-05-14 23:55:01.963223 7f0338157700 -1 *** Caught signal (Aborted) **
 in thread 7f0338157700 thread_name:bstore_aio

 ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
 1: (()+0xcab9b2) [0x55b37a5f39b2]
 2: (()+0x11390) [0x7f0342638390]
 3: (gsignal()+0x38) [0x7f03415d4428]
 4: (abort()+0x16a) [0x7f03415d602a]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x55b37a6571fe]
 6: (KernelDevice::_aio_thread()+0x1301) [0x55b37a5dcc61]
 7: (KernelDevice::AioCompletionThread::entry()+0xd) [0x55b37a5df45d]
 8: (()+0x76ba) [0x7f034262e6ba]
 9: (clone()+0x6d) [0x7f03416a582d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
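
The "FAILED assert(r >= 0)" in KernelDevice::_aio_thread means an aio request
completed with an error code, which would be consistent with the node running
out of memory and swapping. Before the assert fires, the per-OSD memory
breakdown can be inspected through the admin socket, and with tcmalloc builds
the allocator can be asked to return freed pages to the OS. A sketch, assuming
these commands are available in this build:

# Break down the OSD's memory usage by mempool (run on the OSD node):
ceph daemon osd.0 dump_mempools

# tcmalloc heap statistics, then release of freed-but-retained pages:
ceph tell osd.0 heap stats
ceph tell osd.0 heap release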


Any ideas about this RAM usage problem? I guess the errors appear because the
RAM is totally consumed. After that error I'm not able to recover anything, so
if you consider this a bug, I'll open an issue.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



