Unit Test Your Data Pipeline, You Will Thank Yourself Later

By Norman Niemer - Chief Data Scientist at a large asset manager where he delivers data-driven investment insights.

One common mistake data scientists make, especially beginners, is not writing unit tests. Some argue that unit testing does not apply because a model's correct output cannot be known ahead of time, so there is nothing to test against. However, most data science projects start with data transformation. While you cannot test model output, you can at least test that the inputs are correct. A few simple, well-chosen tests take little time to write and will save you much more time later, especially on large projects or with big data.
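To make this concrete, here is a minimal sketch of what testing the inputs could look like. Everything in it is a hypothetical illustration rather than code from a real pipeline: the `clean_prices` transformation, the `check_inputs` validation, and the `ticker`/`price` fields are assumed names chosen for the example. It uses plain Python and bare `assert` statements so that it runs without any third-party library.

```python
def clean_prices(rows):
    """Hypothetical transformation: drop rows with a missing price,
    normalize the ticker, and cast the price to float."""
    cleaned = []
    for row in rows:
        price = row.get("price")
        if price is None or price == "":
            continue  # skip rows without a usable price
        cleaned.append(
            {"ticker": row["ticker"].strip().upper(), "price": float(price)}
        )
    return cleaned


def check_inputs(rows):
    """Hypothetical validation: raise ValueError if the cleaned rows
    break basic assumptions the model relies on."""
    tickers = [r["ticker"] for r in rows]
    if len(tickers) != len(set(tickers)):
        raise ValueError("duplicate tickers in input")
    if any(r["price"] <= 0 for r in rows):
        raise ValueError("non-positive price in input")


def test_clean_prices_drops_missing_and_normalizes():
    raw = [
        {"ticker": " aapl ", "price": "101.5"},
        {"ticker": "msft", "price": ""},
        {"ticker": "ibm", "price": None},
    ]
    assert clean_prices(raw) == [{"ticker": "AAPL", "price": 101.5}]


def test_check_inputs_rejects_duplicates():
    rows = [
        {"ticker": "AAPL", "price": 1.0},
        {"ticker": "AAPL", "price": 2.0},
    ]
    try:
        check_inputs(rows)
    except ValueError:
        pass  # expected: duplicates must be rejected
    else:
        raise AssertionError("expected ValueError for duplicate tickers")
```

Test functions written this way can be collected by a runner such as pytest, or simply called directly. The point is not the specific checks but that each assumption about the input data (no missing prices, unique keys, positive values) is written down once and verified every time the pipeline changes.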
