5 Lessons Learned From Running Automated Tests on a Data Warehouse
Written by Markus Ehrenmüller-Jensen, Data Engineer
“Testing? A database?” you might ask. “Yes! We do!” we answer. There is a lot of code directly inside a database (functions and stored procedures) and in the implementation of the ETL, all of which can break or have unintended impacts. Furthermore, definitions of or access rights to database objects (e.g. a table) might change over time, with consequences similar to changing code. And of course, there might be issues with the data itself.
Data is often called the new oil, and it is certainly a valuable asset for organizations. Capturing data (in the data warehouse’s data sources) and transforming it (during ETL) should keep its quality as high as possible. We want to be sure that we only ingest high-quality data into the data warehouse, and that we do not “harm” the data during the various steps necessary to squeeze out the information that will later help end users gain insights.
The following five lessons we learned will get you started with testing databases as well.
1. Don’t Make the Start Too Tough
All beginnings can be tough. If you are not used to writing tests for a database or a data warehouse, choose parts that are easy to start with. In most cases these will be simple unit tests of functions and procedures. Like this:
DECLARE @success bit = 1;
IF (SELECT COUNT(*) FROM udf_SplitText(N'Hello World!', ' ')) <> 2
    SET @success = 0;
EXEC tSQLt.AssertEquals 1, @success;
A test like this can easily be written without any special framework. As you can see in the very last line of the example above, we decided to use tSQLt for our data warehouse, an open source database unit testing framework for Microsoft SQL Server.
The more you get used to testing as a work item and to your chosen test framework, the more complex tests you can and should write. Later, you should start implementing (new) tests for existing code, based on the following criteria:
- the feature is business critical
- the feature is rather complex
2. Tests Should Generate Chatty Reports
Each test should include a short description as a comment, so that every data engineer who wants to have a look at a failed test knows what the test is about without having to decipher the code. Like this:
EXEC tSQLt.Fail
    'Table ', @database, '.', @TableSchema, '.', @TableName,
    ' has not been updated today. Please check today''s ETL log in table DataloadHistory.';
Even better is when the alert of a failed test gives a clear description of the problem that was discovered and the steps to be taken to solve it. I clearly advocate “chatty tests,” which tell the whole truth about the problem, so we can immediately see all available details instead of spending extra effort collecting them. In contrast, when all tests run through without any issues, I don’t want to hear from the test framework at all.
3. Tests Are A Crucial Part Of Writing Code
In the long run, writing tests should become a natural part of writing code. To be more specific: Writing tests should be the first step before you write any line of code, so that new features always come with tests in place. Make test specification part of the user story and add “all tests are written” to your “definition of done.” By including the test definition in the description of the user story, you can avoid ambiguity as well. In the best case, the user story already lists all (special) cases and describes the behavior of the needed piece of code.
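Writing the test first can be as simple as creating a tSQLt test for a function that does not exist yet. The test class, function name, and expected behavior below are purely illustrative, not from our actual code base:

```sql
-- Hypothetical test-first example: the function udf_IsValidEmail
-- does not exist yet; this test is written before its implementation
-- and fails until the function is in place -- exactly the point.
EXEC tSQLt.NewTestClass 'UnitTests';
GO
CREATE PROCEDURE UnitTests.[test udf_IsValidEmail rejects text without at-sign]
AS
BEGIN
    DECLARE @result bit;
    SELECT @result = dbo.udf_IsValidEmail(N'not-an-email');
    EXEC tSQLt.AssertEquals 0, @result;
END;
GO
```

Once the function is implemented and the test turns green, the user story’s “definition of done” for this piece of behavior is met.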
You should develop a habit of executing the test(s) after every change to the code. You should definitely not hand the responsibility of executing tests over to a third person (e.g. Quality Assurance department).
4. Automation is King
On a busy day it can happen that someone forgets to execute all the necessary tests. That’s why I advocate automation. In the best case, the tests run automatically, event-driven after each check-in to your source control, or at least on a schedule multiple times a day. The earlier a developer finds out about problems in the code they wrote, the easier it will be for them to correct the mistakes. Automation guarantees that testing is as convenient, as fast, and as reliable as possible.
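Assuming tSQLt, the automation itself can be a single statement in a SQL Server Agent job step or a CI pipeline step. RunAll raises an error if any test fails, so the job step fails and the usual job alerting kicks in:

```sql
-- Scheduled job step (or post-check-in pipeline step):
-- executes every tSQLt test in the database and raises an error
-- if at least one of them fails.
EXEC tSQLt.RunAll;
```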
5. Specify a Test Concept
At Runtastic we came up with four test categories, which I describe here:
- unit test
- load test
- anomaly detection
- dev test
While unit tests concentrate solely on the code (based on always-the-same test data), the other three tests (load tests, anomaly detection, and dev tests) do check for issues in and with live data as well.
Each test type is executed via a dedicated procedure, which is called either during the daily load or on a schedule, as described below. Each type of test might be triggered by a different event.
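With tSQLt, such a dedicated procedure can simply run one test class per category. The procedure and class names below are illustrative:

```sql
-- Hypothetical wrapper procedure for one test category.
-- tSQLt.Run executes every test in the given test class;
-- this wrapper is called as a step of the daily ETL.
CREATE PROCEDURE dbo.usp_RunLoadTests
AS
BEGIN
    EXEC tSQLt.Run 'LoadTests';
END;
```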
We write (classical) unit tests for functions and stored procedures in the database. They are easy to implement and provide a simple check of whether a piece of code is (still) working as expected. We have already seen an example above. To avoid manual calls, we run all unit tests automatically on a schedule several times a day.
Unit tests are fine for testing particular sections of the data warehouse. But don’t forget to also use a holistic approach to find out whether everything from the data source ends up, in the right quality and quantity and at the right time, in the end users’ reports and dashboards.
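A minimal sketch of such a holistic check, assuming a hypothetical staging table and fact table (the names are illustrative), is to compare yesterday’s row counts on both ends of the pipeline:

```sql
-- Compare yesterday's row count in the source staging area with the
-- corresponding fact table; a mismatch means rows were lost or duplicated
-- somewhere along the ETL. Table names are illustrative.
DECLARE @sourceCount int, @dwhCount int;
SELECT @sourceCount = COUNT(*)
FROM staging.SourceOrders
WHERE OrderDate = CAST(DATEADD(day, -1, GETDATE()) AS date);
SELECT @dwhCount = COUNT(*)
FROM dbo.FactOrders
WHERE OrderDate = CAST(DATEADD(day, -1, GETDATE()) AS date);
EXEC tSQLt.AssertEquals @sourceCount, @dwhCount;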
The actual checks for the ETL are two-fold: on one hand, we have unit tests in place, which automatically set up a special environment, call the ETL package, and then check if everything is ok. On the other hand, we do what we call “load tests” and “anomaly detection.”
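One way (with tSQLt) to set up such a special environment is tSQLt.FakeTable, which swaps a table for an empty, constraint-free copy so that the test fully controls its contents. The table and procedure names below are illustrative, not our actual code:

```sql
-- Replace source and target tables with empty fakes, insert exactly
-- the rows the test needs, run the ETL step, and verify the result.
EXEC tSQLt.FakeTable 'staging.SourceOrders';
EXEC tSQLt.FakeTable 'dbo.FactOrders';

INSERT INTO staging.SourceOrders (OrderID, OrderDate)
VALUES (1, '20230101');

EXEC dbo.usp_LoadFactOrders;  -- the (hypothetical) ETL step under test

DECLARE @loaded int;
SELECT @loaded = COUNT(*) FROM dbo.FactOrders WHERE OrderID = 1;
EXEC tSQLt.AssertEquals 1, @loaded;
```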
Load tests are slightly different from unit tests. Instead of calling the actual ETL package inside a test environment, they analyze the (daily freshly loaded) live data and compare it to the data in the data source. So they test not only the piece of code, but the data as well. Load tests run as part of our ETL. Like this:
DECLARE @count int;
SELECT @count = COUNT(*)
FROM dbo.DimApplication
WHERE DimApplication.Type IS NULL;
EXEC tSQLt.AssertEquals 0, @count;
The goal is to discover issues we could have with “live” data due to changes in the source system. Such problems could be caused by completely new data, which was not foreseen when the feature was implemented. If one of those tests fails, we are alerted, as this clearly indicates a bug. We then have to fix either the data warehouse or the source system.
Anomaly detection is not a test in the classic sense. As you can guess from the name, its function is more like a watchdog, which looks for issues with the data we receive from the data source and transfer into the data warehouse. An alert from this kind of test might indicate a true issue but could also just mean that the threshold for what we consider an anomaly was defined too narrowly (a false positive).
Let’s give you an example. On average we receive more or less 100k rows from a certain data source. If we one day receive only 80k rows, this could be an indicator of a problem: we might not have received all of the data. But that is not certain. On some days it could actually be OK to only receive 80k rows, even when we receive +/- 100k on others. This is what we try to tackle with anomaly detection: breaking the data down into batches and looking for anomalies, while still allowing for certain thresholds. Those thresholds can be as simple as a percentage of the average of the past 30 days or as complex as a machine learning algorithm. Like this:
DECLARE @countToday int, @avgDaily int;
SELECT @countToday = COUNT(*)
FROM dbo.FactSubscription
WHERE FactSubscription.CreateDate >= CAST(GETDATE() AS date);
SELECT @avgDaily = COUNT(*) / 30
FROM dbo.FactSubscription
WHERE FactSubscription.CreateDate < CAST(GETDATE() AS date)
  AND FactSubscription.CreateDate >= DATEADD(day, -30, CAST(GETDATE() AS date));
IF @countToday < @avgDaily * 0.8  -- simple threshold: 80% of the 30-day daily average
    EXEC tSQLt.Fail 'Received only ', @countToday, ' rows today; the 30-day daily average is ', @avgDaily, '.';
We trigger anomaly detection automatically during ETL. As some or many of those tests are dependent on the actual available data, and not only on the programmatic logic alone, we run all of those three test categories on all data warehouse environments, even on production.
Dev tests run in the development environment only. We came up with the idea of dedicated tests for the development environment, as we sometimes face the situation that new functionality in the data source is implemented in parallel with the integration of exactly this data into the data warehouse. These tests may fail often, and maybe over a longer period of time. Alerts coming from these tests are of interest for the particular developer, who sees how far along implementations in the data source are, but can be ignored by anybody not directly involved in the project. Failures do not mean that something is seriously damaged, but rather that work is still in progress. As we still want to come up with tests early, we decided to separate those tests from the rest. Dev tests will eventually become unit or load tests, or the basis for anomaly detection, as part of the deployment. Like unit tests, these tests are automatically scheduled multiple times a day.
In addition to the four test categories, we implemented a WatchDog, which checks (as the very last step of ETL) if each and every available test was indeed executed at least once during the past 24 hours. If not, an email alert is sent out. This avoids situations where we feel relaxed because we aren’t receiving any alerts from failed tests, due to the fact that the tests are not running at all. And no, we did not come up with this idea out of thin air; we faced a similar problem during ETL from which we learned that trust in our jobs is good, but verification is king!
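A sketch of such a WatchDog, assuming each test writes a row into a hypothetical log table dbo.TestExecutionLog (TestName, ExecutedAt), could look like this:

```sql
-- Find tests whose most recent execution is older than 24 hours and
-- alert via Database Mail. The log table and recipient address are
-- illustrative placeholders.
DECLARE @staleTests int;
SELECT @staleTests = COUNT(*)
FROM (
    SELECT TestName, MAX(ExecutedAt) AS LastRun
    FROM dbo.TestExecutionLog
    GROUP BY TestName
) AS t
WHERE t.LastRun < DATEADD(hour, -24, GETDATE());
IF @staleTests > 0
    EXEC msdb.dbo.sp_send_dbmail
        @recipients = 'dwh-team@example.com',  -- placeholder address
        @subject = 'WatchDog: some tests did not run in the past 24 hours';
```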
Call to Action
What is your experience with (not) testing a data warehouse? What problems have you faced and how did you overcome them? I’d love to discuss with you – just leave a comment below.