I recently came across a unit test that looked like this (note that I have reconstructed this code to protect the guilty):
I have cut the list short, but there were around 75 such test data items. Crikey!
I can understand why that particular set of samples was used: it was dumped from the production system and picked to be representative of real data.
But this test code was in a unit test. And the behaviour being tested (in this case it was some statistical analysis) didn’t strictly require that many samples. Three samples would have sufficed.
So why is having the additional test data a problem? In general, it’s a productivity issue. A test with lots of data takes longer to write, and it takes longer to work out what the correct answer should be. It is harder to maintain. It is harder to debug.
It even makes the test class harder to browse: in this case, I needed to scroll past five separate sets of such data before finding the set I was looking for! (No, I don’t like regions, either, before you ask…)
I think the test data for a unit test should be the minimum required to fulfill the test: no more, no less.
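To make that concrete, here is a minimal sketch of what a three-sample version of such a test might look like. It is not the code from the original project: the function `summarise`, the use of Python, and the choice of mean and standard deviation as the "statistical analysis" are all my assumptions for illustration.

```python
import statistics
import unittest


def summarise(samples):
    """Hypothetical function under test (an assumption, not the original code).

    Returns the mean and the sample standard deviation of the samples.
    """
    return statistics.mean(samples), statistics.stdev(samples)


class SummariseTests(unittest.TestCase):
    def test_mean_and_stdev_of_three_samples(self):
        # Three values chosen so the expected results can be checked by hand:
        #   mean            = (2 + 4 + 6) / 3 = 4
        #   sample variance = (4 + 0 + 4) / 2 = 4, so stdev = 2
        samples = [2.0, 4.0, 6.0]

        mean, stdev = summarise(samples)

        self.assertAlmostEqual(mean, 4.0)
        self.assertAlmostEqual(stdev, 2.0)
```

With only three samples, the expected values fit in a comment and can be verified without a calculator, and the whole test fits on one screen. It can be run with `python -m unittest`.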