Update: I noticed that there was a PG that remained in the scrubbing state from the first day I found the issue until I rebooted the node and the problem disappeared. Can this cause the behaviour I described before?

> On 09 Nov 2017, at 15:55, Matteo Dacrema <mdacrema@xxxxxxxx> wrote:
>
> Hi all,
>
> I’ve experienced a strange issue with my cluster.
> The cluster is composed of 10 HDD nodes with 20 HDDs + 4 journals each, plus 4 SSD nodes with 5 SSDs each.
> All the nodes sit behind 3 monitors and are split across 2 different crush maps.
> The whole cluster is on 10.2.7.
>
> About 20 days ago I started to notice that long backups hang with "task jbd2/vdc1-8:555 blocked for more than 120 seconds" on the HDD crush map.
> A few days ago another VM started to show high iowait without doing any IOPS, also on the HDD crush map.
>
> Today about a hundred VMs weren’t able to read from or write to many volumes, all of them on the HDD crush map. Ceph health was OK and no significant log entries were found.
> Not all the VMs experienced this problem, and in the meantime the IOPS on the journals and HDDs were very low, even though I was able to do significant IOPS on the working VMs.
>
> After two hours of debugging I decided to reboot one of the OSD nodes and the cluster started to respond again. Now the OSD node is back in the cluster and the problem has disappeared.
>
> Can someone help me understand what happened?
> I see strange entries in the log files like:
>
> accept replacing existing (lossy) channel (new one lossy=1)
> fault with nothing to send, going to standby
> leveldb manual compact
>
> I can share any logs that may help identify the issue.
>
> Thank you.
> Regards,
>
> Matteo
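
In case it helps anyone who wants to check for the same thing, below is roughly how I'm watching for PGs stuck in scrubbing. It's only a minimal sketch in Python that shells out to the ceph CLI; it assumes the CLI is on PATH and that the JSON from "ceph pg dump" on this Jewel (10.2.x) cluster exposes the pg_stats / pgid / state / last_scrub_stamp fields, which may be named differently on other releases.

    #!/usr/bin/env python
    # Minimal sketch: list the PGs that are currently scrubbing (or deep-scrubbing)
    # together with their last scrub timestamp, by parsing "ceph pg dump".
    # Assumes the ceph CLI is on PATH and that the JSON output carries the
    # pg_stats / pgid / state / last_scrub_stamp fields (Jewel here;
    # other releases may differ).
    import json
    import subprocess

    def scrubbing_pgs():
        raw = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
        stats = json.loads(raw.decode("utf-8")).get("pg_stats", [])
        return [pg for pg in stats if "scrub" in pg.get("state", "")]

    if __name__ == "__main__":
        for pg in scrubbing_pgs():
            print("{0}  {1}  last_scrub={2}".format(
                pg.get("pgid"), pg.get("state"), pg.get("last_scrub_stamp")))

If a scrub really does turn out to be wedged, temporarily setting the noscrub / nodeep-scrub flags (ceph osd set noscrub; ceph osd set nodeep-scrub) is probably a less disruptive test than rebooting the whole node, though I haven't verified that on this cluster yet.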