> I mean if it's possible for a test case to just fail when hitting some
> big performance regression.
>
> E.g. one operation should finish in 30s, but when it takes over 300s,
> it's definitely a big regression.
>
> But considering how many different hardware/VM the test may be run on,
> I'm not really confident if this is possible.

It is a lot of work, but one setup-agnostic way of doing this is the
following:

- Run the test with an older version that is known to be correct/stable,
  and measure the time
- Run the test with the latest version, and measure the time
- If the newer version is 2x (or some configurable threshold) slower
  than the old version, warn the developer

Since both the old and new versions are run on the same setup, we don't
have to worry about hardware or setup differences.
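
As a rough illustration, a minimal sketch in Python of that comparison
(assuming each version of the test can be launched as an external
command; the "./run-test" invocations below are hypothetical
placeholders for however the two versions are actually built and run):

    import subprocess
    import time

    THRESHOLD = 2.0  # configurable slowdown factor

    def measure(cmd):
        # Run the test command and return its wall-clock time in seconds.
        start = time.monotonic()
        subprocess.run(cmd, check=True)
        return time.monotonic() - start

    # Hypothetical invocations; substitute however the known-good and
    # latest versions are actually run on this setup.
    old_time = measure(["./run-test", "--version=known-good"])
    new_time = measure(["./run-test", "--version=latest"])

    if new_time > THRESHOLD * old_time:
        print(f"WARNING: possible regression: {old_time:.1f}s -> "
              f"{new_time:.1f}s ({new_time / old_time:.1f}x slower)")

Averaging each measurement over a few runs would make the comparison
less sensitive to noise on a loaded machine.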