>>>>> "Oleg" == Oleg Cherkasov <o1e9@member.fsf.org> writes: Oleg> On 21. okt. 2017 04:55, Mike Snitzer wrote: >> On Thu, Oct 19 2017 at 5:59pm -0400, >> Oleg Cherkasov <o1e9@member.fsf.org> wrote: >> >>> On 19. okt. 2017 21:09, John Stoffel wrote: >>>> >> >> So aside from SAR outout: you don't have any system logs? Or a vmcore >> of the system (assuming it crashed?) -- in it you could access the >> kernel log (via 'log' command in crash utility. Oleg> Unfortunately no logs. I have tried to see if I may recover dmesg Oleg> however no luck. All logs but the latest dmesg boot are zeroed. Of Oleg> course there are messages, secure and others however I do not see any Oleg> valuable information there. Oleg> System did not crash, OOM were going wind however I did manage to Oleg> Ctrl-Alt-Del from the main console via iLO so eventually it rebooted Oleg> with clean disk umount. Bummers. Maybe you can setup a syslog server to use to log verbose kernel logs elsewhere, including the OOM messages? >> >> More specifics on the workload would be useful. Also, more details on >> the LVM cache configuration (block size? writethrough or writeback? >> etc). Oleg> No extra params but specifying mode writethrough initially. Oleg> Hardware RAID1 on cache disk is 64k and on main array hardware Oleg> RAID5 128k. Oleg> I had followed precisely documentation from RHEL doc site so lvcreate, Oleg> lvconvert to update type and then lvconvert to add cache. Oleg> I have decided to try writeback after and shifted cachemode to it with Oleg> lvcache. >> I'll be looking very closely for any sign of memory leaks (both with >> code inspection and testing while kemmleak is enabled). >> >> But the more info you can provide on the workload the better. Oleg> According to SAR there are no records about 20min before I reboot, so I Oleg> suspect SAR daemon failed a victim of OOM. Maybe if you could take a snapshot of all the processes on the system before you run the test, and then also run 'vmstat 1' to a log file while running the test? As a wierd thought... maybe it's because you have a 1gb meta data LV that's causing problems? Maybe you need to just accept the default size? It might also be instructive to make the cache be just half the SSD in size and see if that helps. It *might* be that as other people have mentioned, that your SSD's performance drops off a cliff when it's mostly full. So reducing the cache size, even to only 80% of the size of the disk, might give it enough spare empty blocks to stay performant? John _______________________________________________ linux-lvm mailing list linux-lvm@redhat.com https://www.redhat.com/mailman/listinfo/linux-lvm read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/