Hi Ceph Users,

I've been having some issues with Bluestore on Luminous (12.2.8 and 12.2.10) as well as Mimic (13.2.4), and I hope you can give me some insight into what may be causing them.

For context: I am tasked with evaluating Bluestore as a replacement for the soon-to-be phased-out Filestore OSDs. I was given a Dell R740xd (Intel(R) Xeon(R) Gold 6130, 92 GB DDR4) with 12x 8 TB HGST (Ultrastar HE10) SATA disks for testing. Mon, Mgr and OSDs all run on this one server. I used the default configuration values unless stated otherwise. All OSDs were created with dmcrypt enabled.

First problem (Luminous 12.2.10): After having issues with memory consumption on 12.2.8, where OSDs exceeded 8 GB each with 1 GB of cache configured, I wanted to see whether the new memory target feature would help, and it did. However, after leaving the system idle over a weekend, I noticed in our Grafana that some OSDs kept allocating more and more memory and then suddenly dropped back to the configured 2 GB memory target. Unfortunately I only saw this in the recorded data, so I couldn't see what caused it while it was happening, and I couldn't reproduce it. I've attached a screenshot of this issue (https://pasteboard.co/I2cs3me.png). The metrics in this graph are gathered by a small Python script from ceph daemon osd.X dump_mempools.

Second problem (Mimic 13.2.4): After upgrading to Mimic I noticed that the "cluster" isn't able to finish snaptrim in the 24 hours it has before the next one is due. It seems that one OSD takes literally all day to delete the snapshot data, which is odd since all of the disks are new and the iostat numbers aren't out of the ordinary, so the disk isn't saturated or otherwise overloaded. Changing parameters didn't help. Maybe I am missing some feature that throttles the snaptrim process, since there is a script in place to simulate our live environment's load on the system.
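Since only one OSD appears to be slow, one thing worth ruling out is that its snaptrim-related settings differ from the others. Below is a minimal sketch of such a check via the admin socket. The option names are the ones I know from Luminous/Mimic and should be treated as an assumption (verify them with ceph daemon osd.N config show); the helper names are made up for illustration.

```python
import json
import subprocess

# Snaptrim-related options as known from Luminous/Mimic. This list is an
# assumption; check it against `ceph daemon osd.N config show`.
SNAPTRIM_OPTIONS = (
    "osd_snap_trim_sleep",
    "osd_pg_max_concurrent_snap_trims",
    "osd_max_trimming_pgs",
    "osd_snap_trim_priority",
)


def parse_config_get(raw):
    """Parse the one-entry JSON object printed by `config get`."""
    data = json.loads(raw)
    if len(data) != 1:
        raise ValueError("unexpected config get output: %r" % (data,))
    name, value = next(iter(data.items()))
    return name, value


def snaptrim_settings(osd_id, options=SNAPTRIM_OPTIONS):
    """Query one local OSD's admin socket for each option in `options`."""
    settings = {}
    for option in options:
        raw = subprocess.check_output(
            ["ceph", "daemon", "osd.%d" % osd_id, "config", "get", option]
        )
        name, value = parse_config_get(raw.decode("utf-8"))
        settings[name] = value
    return settings


def differing_options(settings_by_osd):
    """Return {option: {osd_id: value}} for options not identical on all OSDs."""
    options = set()
    for settings in settings_by_osd.values():
        options.update(settings)
    result = {}
    for option in sorted(options):
        values = {osd: s.get(option) for osd, s in settings_by_osd.items()}
        if len(set(values.values())) > 1:
            result[option] = values
    return result
```

On the test host this could be called as differing_options({i: snaptrim_settings(i) for i in range(12)}), assuming the OSD ids are 0 to 11 (hypothetical). An empty result means all OSDs report the same values for these options.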
I've attached a screenshot of the PG states (https://pasteboard.co/I2crFnl.png). The bigger issue is that during snaptrim we get some blocked requests, especially when all PGs are in snaptrim or snaptrim_wait, which seems to me like something that shouldn't happen.

Migration problem: To see how we might migrate our live environment to Bluestore, I ran Filestore and Bluestore OSDs mixed on 12.2.8 and ran into some rather unpleasant outage scenarios under load. In particular, OSDs were being marked down when both Bluestore and Filestore OSDs were written to: Filestore takes a bit longer to write its data, and so the Bluestore OSD wrongly marks it as down. Are there any recommendations on how to migrate without such issues?

I hope I got my points across; if anything is unclear, feel free to ask.

Thanks in advance.

Regards,
Johannes Liebl

Johannes Liebl
Junior Linux System Administrator
Storage Services
1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 | 76135 Karlsruhe | Germany
Phone: +49 721 91374-8492 | E-Mail: johannes.liebl@xxxxxxxx | Web: www.1und1.de
Amtsgericht Montabaur, HRB 5452
Managing Directors: Thomas Ludwig, Jan Oetjen, Sascha Vollmer
Member of United Internet

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient of this e-mail, you are hereby notified that saving, distribution or use of the content of this e-mail in any way is prohibited. If you have received this e-mail in error, please notify the sender and delete the e-mail.
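For reference, the memory graph for the first problem is built from ceph daemon osd.X dump_mempools. The collection script itself is not reproduced here; the following is only a minimal sketch of how such a collector could look. It assumes the JSON layout as I understand it: pools listed at the top level next to "total" on Luminous, and nested under mempool.by_pool on Mimic. The function names are hypothetical.

```python
import json
import subprocess


def dump_mempools(osd_id):
    """Run `ceph daemon osd.<id> dump_mempools` locally and parse the JSON."""
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id, "dump_mempools"]
    )
    return json.loads(out.decode("utf-8"))


def pool_bytes(dump):
    """Return {pool_name: bytes} from a dump_mempools result.

    Assumption: newer releases nest the pools under mempool.by_pool,
    older ones list them at the top level next to "total".
    """
    if "mempool" in dump:
        pools = dict(dump["mempool"].get("by_pool", {}))
        pools["total"] = dump["mempool"].get("total", {})
    else:
        pools = dump
    return {
        name: stats.get("bytes", 0)
        for name, stats in pools.items()
        if isinstance(stats, dict)
    }


def collect(osd_ids):
    """Return {osd_id: {pool_name: bytes}} for the given local OSDs."""
    return {osd_id: pool_bytes(dump_mempools(osd_id)) for osd_id in osd_ids}
```

Calling collect(range(12)) periodically (assuming OSD ids 0 to 11, which is hypothetical) and shipping the per-pool byte counts to the metrics backend would give one series per pool and OSD, which makes it easier to see which pool grows before the drop.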
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com