Do we really trust our tests? If we have a test suite where most tests run fine, but some fail now and then for predictable or unpredictable reasons, should we still keep those tests in the suite?
Here is a scenario. We have a test suite of almost 5000 tests: some integration tests, some functional tests, and a lot of unit tests. They are organized into a set of, let's say, 25 assemblies in total. Many teams (let's say 10) are working on the code base. Some of the tests were written to depend on data in the database, and sometimes they fail, because the database is shared among all 10 teams and the CI build server. Sometimes one test or a few will fail, breaking the build. Most of the time, simply re-running the CI build results in success the second time. So do we trust these tests? Do we believe in their results?
When using TDD, we first test the test itself, making sure it fails when we expect it to, and passes when we expect it to. Tests must set up and tear down anything and everything (including data in a database for integration tests) in order to be reliable. If we have legacy tests that weren't written this way, are they valid? They certainly don't meet the standard of TDD in my view.
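As a rough illustration of a test that owns its data, here is a minimal sketch in Python. It assumes the code under test can be pointed at a private in-memory SQLite database instead of the shared one; the `customers` table and the `count_active_customers` function are hypothetical names made up for this example.

```python
import sqlite3
import unittest


def count_active_customers(conn):
    """Hypothetical code under test: counts rows flagged as active."""
    row = conn.execute(
        "SELECT COUNT(*) FROM customers WHERE active = 1"
    ).fetchone()
    return row[0]


class CountActiveCustomersTest(unittest.TestCase):
    def setUp(self):
        # Every test gets its own private in-memory database, so no other
        # team or CI job can change the data underneath it.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customers (name TEXT, active INTEGER)")
        self.conn.executemany(
            "INSERT INTO customers (name, active) VALUES (?, ?)",
            [("alice", 1), ("bob", 0), ("carol", 1)],
        )

    def tearDown(self):
        # Closing the connection discards the in-memory database entirely,
        # so nothing leaks into the next test.
        self.conn.close()

    def test_counts_only_active_customers(self):
        self.assertEqual(count_active_customers(self.conn), 2)
```

Saved in a test module, this can be run with `python -m unittest`. The point is not SQLite itself but that the test creates exactly the rows it asserts on and throws them away afterwards, so its result never depends on what anyone else did to a shared database.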
Let's look at the reason we write tests at all: to mitigate risk.
That's it, in a nutshell.
If a test runs successfully sometimes but not others, does it really mitigate risk? Does it mitigate anything at all if we don't trust its result? I think not. Are we better off leaving these tests in the suite, deleting them from the codebase, or fixing them so that they are reliable? If they mitigate risk when they work, and they cover a portion of the code that is valuable enough to verify, then they have to be fixed. Otherwise, they should just be removed from the test suite.
We depend on our test suites to tell us the true state of our working software. If they can't do this, then I would argue they are of no value to us in the big picture. Ensure that all tests are valuable, valid, and trustworthy. Make sure that things like the time of day, data in a database (or the lack of it), or other transient factors can't influence the outcome of a test. Only when we can rely on our test coverage to tell us the real state of the code can we deliver reliable, working software.
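Time of day is a good example of a transient factor that is easy to take out of the equation. The sketch below, again in Python, assumes the code under test accepts the current time as a parameter instead of always reading the system clock; `is_within_business_hours` and its 09:00 to 17:00 rule are hypothetical, chosen only to show the idea.

```python
from datetime import datetime, time


def is_within_business_hours(now=None):
    """Hypothetical code under test: True from 09:00 up to (not including) 17:00."""
    if now is None:
        # Production callers omit the argument and get the real clock.
        now = datetime.now()
    return time(9, 0) <= now.time() < time(17, 0)


def test_mid_morning_is_within_business_hours():
    # The test supplies a fixed moment, so it gives the same answer
    # whether the build runs at noon or at 3 a.m.
    assert is_within_business_hours(datetime(2024, 1, 15, 10, 30))


def test_closing_time_is_outside_business_hours():
    assert not is_within_business_hours(datetime(2024, 1, 15, 17, 0))
```

A test that called `is_within_business_hours()` with no argument would pass or fail depending on when the CI server happened to run it, which is exactly the kind of result we can't trust.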