Hi Mark / Jayaram,

After running the cluster last night, I noticed lots of "Out of memory" errors in /var/log/messages, many of which correlate with dead OSDs. If this is the cause, it might be another instance of the high memory use issues reported on Kraken.

e.g. my script logs:

Thu 8 Jun 08:26:37 BST 2017 restart OSD 1

and /var/log/messages states...

Jun 8 08:26:35 ceph1 kernel: Out of memory: Kill process 7899 (ceph-osd) score 113 or sacrifice child
Jun 8 08:26:35 ceph1 kernel: Killed process 7899 (ceph-osd) total-vm:8569516kB, anon-rss:7518836kB, file-rss:0kB, shmem-rss:0kB
Jun 8 08:26:36 ceph1 systemd: ceph-osd@1.service: main process exited, code=killed, status=9/KILL
Jun 8 08:26:36 ceph1 systemd: Unit ceph-osd@1.service entered failed state.

The OSD nodes have 64GB RAM - presumably enough for 10 OSDs doing 4+1 EC? Then again, the killed OSD above had ~7.5GB of anon-rss, and ten OSDs at that size would exceed 64GB.

I've added "bluestore_cache_size = 104857600" to ceph.conf (the snippet is sketched below, after the script) and am retesting. I'll see whether the OSD problems recur and report back.

As for loading the cluster: I run an rsync job on each node, pulling data from an NFS-mounted Isilon (an illustrative invocation is also shown below). A single node pulls ~200MB/s; with all 7 nodes running, ceph -w reports between 700 and 1500MB/s of writes.

As requested, here is my "restart_OSD_and_log-this.sh" script:

************************************************************************
#!/bin/bash
# Catch single failed OSDs, log the event and restart them.
while : ; do
    # ID of any OSD that "ceph osd tree" reports as down
    # (column 3 is the name, e.g. "osd.1" -> "1")
    OSD=`ceph osd tree 2> /dev/null | grep down | \
         awk '{ print $3 }' | awk -F "." '{ print $2 }'`
    if [ "$OSD" != "" ] ; then
        DATE=`date`
        echo $DATE " restart OSD " $OSD >> /root/osd_restart_log
        echo "OSD" $OSD "is down, restarting.."
        # find the host carrying this OSD, then restart the OSD over ssh
        OSDHOST=`ceph osd find $OSD | grep host | awk -F '"' '{print $4}'`
        ssh $OSDHOST systemctl restart ceph-osd@$OSD
        sleep 30
    else
        # \033[K (capital K) erases to the end of the line
        echo -ne "\r\033[K"
        echo -ne "all OSDs OK"
    fi
    sleep 1
done
************************************************************************
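
For reference, the ceph.conf change is just the one line; roughly the following on each OSD node (shown here under [osd], which is simply where such an option would normally sit; 104857600 bytes is 100MB):

[osd]
# cap the BlueStore cache at ~100MB per OSD while testing memory use
bluestore_cache_size = 104857600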
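
And in case anyone wants to reproduce the write load, the per-node rsync job is nothing exotic; it boils down to something like this, with placeholder paths rather than the real mount points:

# pull data from the NFS-mounted Isilon into the cluster
# (both paths below are placeholders)
rsync -a /mnt/isilon/archive/ /mnt/cephfs/archive/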
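
While the retest runs, something along these lines should show whether the smaller cache actually keeps the OSD resident size bounded (I'm not certain the dump_mempools admin socket command is available in this dev build):

# resident memory (RSS, in kB) of every ceph-osd process on a node
ps -o pid,rss,cmd -C ceph-osd

# per-OSD memory pool breakdown via the admin socket, if supported
ceph daemon osd.1 dump_mempools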

thanks again,

Jake

On 08/06/17 12:08, nokia ceph wrote:
> Hello Mark,
>
> Raised tracker for the issue -- http://tracker.ceph.com/issues/20222
>
> Jake can you share the restart_OSD_and_log-this.sh script
>
> Thanks
> Jayaram
>
> On Wed, Jun 7, 2017 at 9:40 PM, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Mark & List,
>
> Unfortunately, even when using yesterdays master version of ceph,
> I'm still seeing OSDs go down, same error as before:
>
> OSD log shows lots of entries like this:
>
> (osd38)
> 2017-06-07 16:48:46.070564 7f90b58c3700  1 heartbeat_map is_healthy
> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
>
> (osd3)
> 2017-06-07 17:01:25.391075 7f62de6c3700  1 heartbeat_map is_healthy
> 'tp_osd_tp thread tp_osd_tp' had timed out after 60
> 2017-06-07 17:01:26.276881 7f62dbe86700 -1 osd.3 6165 heartbeat_check:
> no reply from 10.1.0.86:6811 osd.2 since
> back 2017-06-07 17:00:19.640002
> front 2017-06-07 17:01:21.950160 (cutoff 2017-06-07 17:01:06.276881)
>
> [root@ceph4 ceph]# ceph -v
> ceph version 12.0.2-2399-ge38ca14
> (e38ca14914340d65ea8001c7bd6e0ff769f3eb2e) luminous (dev)
>
> I'll continue running the cluster with my "restart_OSD_and_log-this.sh"
> workaround...
>
> thanks again for your help,
>
> Jake
>
> On 06/06/17 15:52, Jake Grimmett wrote:
> > Hi Mark,
> >
> > OK, I'll upgrade to the current master and retest...
> >
> > best,
> >
> > Jake
> >
> > On 06/06/17 15:46, Mark Nelson wrote:
> >> Hi Jake,
> >>
> >> I just happened to notice this was on 12.0.3. Would it be possible to
> >> test this out with current master and see if it still is a problem?
> >>
> >> Mark
> >>
> >> On 06/06/2017 09:10 AM, Mark Nelson wrote:
> >>> Hi Jake,
> >>>
> >>> Thanks much. I'm guessing at this point this is probably a bug. Would
> >>> you (or nokiauser) mind creating a bug in the tracker with a short
> >>> description of what's going on and the collectl sample showing this is
> >>> not IOs backing up on the disk?
> >>>
> >>> If you want to try it, we have a gdb based wallclock profiler that might
> >>> be interesting to run while it's in the process of timing out. It tries
> >>> to grab 2000 samples from the osd process which typically takes about 10
> >>> minutes or so. You'll need to either change the number of samples to be
> >>> lower in the python code (maybe like 50-100), or change the timeout to
> >>> be something longer.
> >>>
> >>> You can find the code here:
> >>>
> >>> https://github.com/markhpc/gdbprof
> >>>
> >>> and invoke it like:
> >>>
> >>> sudo gdb -ex 'set pagination off' -ex 'attach 27962' -ex 'source
> >>> ./gdbprof.py' -ex 'profile begin' -ex 'quit'
> >>>
> >>> where 27962 in this case is the PID of the ceph-osd process. You'll
> >>> need gdb with the python bindings and the ceph debug symbols for it to
> >>> work.
> >>>
> >>> This might tell us over time if the tp_osd_tp processes are just sitting
> >>> on pg::locks.
> >>>
> >>> Mark
> >>>
> >>> On 06/06/2017 05:34 AM, Jake Grimmett wrote:
> >>>> Hi Mark,
> >>>>
> >>>> Thanks again for looking into this problem.
> >>>>
> >>>> I ran the cluster overnight, with a script checking for dead OSDs every
> >>>> second, and restarting them.
> >>>>
> >>>> 40 OSD failures occurred in 12 hours, some OSDs failed multiple times,
> >>>> (there are 50 OSDs in the EC tier).
> >>>>
> >>>> Unfortunately, the output of collectl doesn't appear to show any
> >>>> increase in disk queue depth and service times before the OSDs die.
> >>>>
> >>>> I've put a couple of examples of collectl output for the disks
> >>>> associated with the OSDs here:
> >>>>
> >>>> https://hastebin.com/icuvotemot.scala
> >>>>
> >>>> please let me know if you need more info...
> >>>>
> >>>> best regards,
> >>>>
> >>>> Jake

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com