On 07/07/2009 19:38, Mickey Mazarick wrote:
> Since I'm running my setup as a storage farm, it just doesn't matter to me if there's a memory leak or if a server daemon crashes; I have cron jobs that restart it and I barely take notice.
Ouch, ouch, ouch. That sounds like a monumental bodge. If somebody working for me implemented that kind of a "solution" for a frequently occurring problem in a production environment, they'd be finding themselves looking for a new job pretty quickly. Most likely along with the architect who trialed the solution before putting it into production without finding the problems that require such a solution. The solution to crashing processes is fixing the bug that causes them to crash, not a wrapper that gets them restarted.
> True, regression testing would get rid of the memory leak you hate, but if they have to start from the ground up I would rather encourage the dev team to add hotadd upgrade and hotadd features. These things would keep my cluster going even if there were catastrophic problems.
The _LAST_ thing Gluster needs at the moment is more features. Lack of stability loses you customers much faster than extra features gain them.
> What I'm saying is that a good top-down testing system is something we can discuss here, spec out, and perhaps create independently of the development team. I think what most people want is a more stable product, and I think a top-down approach will get it there faster than trying to implement a given UT system from the bottom up. It will definitely answer the question "should I upgrade to this release?"
IMO, a top-down approach merely glosses over the more fundamental problems. You cannot engineer quality from the top down. You design from the top down, but quality comes from the bottom up.
Gordan