Improved testing of Singer taps and targets

By now, I maintain quite a few Singer taps and targets, created with Meltano’s singer_sdk library. For those unfamiliar with Singer, it is a framework for moving data from a source system to a target system via a standardized communication protocol. The singer_sdk library contains a bunch of sweet abstractions that make this a lot easier.

Some of the things I maintain:

  • target-oracle for writing to Oracle databases1
  • target-mssql for writing to SQL Server databases2
  • tap-prefect for reading from the Prefect REST API3
  • tap-pxwebapi for reading statistics from Statistics Norway/Sweden/Finland via a REST API.4

By their nature, these taps and targets connect to outside services, and I need to think carefully about testing.

For target-oracle and target-mssql I already have a pretty sweet set of tests, including a docker-compose YAML file so that I can spin up a test database to write to. Because GitHub Actions has a really cool feature called services, which lets you spin up databases and more as part of your workflow, these kinds of tests are great for my use case. For target-mssql, this CI setup is already implemented.

For the taps, I currently don’t have any meaningful tests. I definitely have to remedy that.

Developing a testing strategy

SQL Targets

Although testing against the SQL targets works nicely, there are some things that I want to improve. Some low-hanging fruit:

  • Most of the tests are just the same function taking a different file as input. Each file represents a stream of data following the Singer standard. There are a lot of different files for different scenarios, but for the most part they are just passed into the same function. It might make sense to parameterize these. Although, as we shall see, it also might not.
  • The current tests don’t actually look at what is created; they just check that the run doesn’t fail. In my experience, this covers 95% of the types of failures that occur, but I want to create a very lightweight great-expectations-ish test suite to help, because I have seen tests pass while writing no rows, or nothing but NULLs, to the database.
  • There are also different configurations for these targets, things like how to handle floating point numbers and how to handle special characters in table and column names. Ideally, we want to run the same tests with different configurations. It seems like this can be done by stacking several parametrize decorators (see the sketch after this list). I am afraid this could cause different test configurations to write to the same target, but I might be able to get around that in my case.
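
To make that idea concrete, here is a minimal sketch of stacking parametrize decorators to get the cross product of configuration variants and input files. The BASE_CONFIG constant, the test name and the import path are illustrative assumptions, not code from the repository; it reuses the singer_file_to_target helper shown in the code section further down.

import pytest

from target_oracle.target import TargetOracle  # import path may differ from the actual package

# Illustrative base config; in the real tests this comes from a fixture.
BASE_CONFIG = {
    "schema": "SYSTEM",
    "user": "SYSTEM",
    "password": "P@55w0rd",
    "host": "localhost",
    "port": "1521",
    "database": "XE",
}

# Stacked parametrize decorators give the cross product: every file runs once
# per configuration variant.
@pytest.mark.parametrize("prefer_float_over_numeric", [True, False])
@pytest.mark.parametrize("file_name", ["schema_updates.singer", "multiple_state_messages.singer"])
def test_with_config_variants(file_name, prefer_float_over_numeric):
    config = {**BASE_CONFIG, "prefer_float_over_numeric": prefer_float_over_numeric}
    target = TargetOracle(config=config)
    singer_file_to_target(file_name, target)  # helper shown in the code section below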

REST taps

Since there are no meaningful existing tests on these taps, there are many simple things that will help a lot.

  • We don’t want to (and often can’t) reach out to these REST endpoints when testing. But we should be able to mock a REST response and observe what output the tap produces.
  • Because we don’t want to hit the API more than necessary, we might also want to check that the mocks aren’t called more than a given number of times (see the sketch after this list).
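
As a sketch of what that could look like, here is a test using the requests-mock library (the responses library would work just as well). The URL, the payload, the config and the tap import are all illustrative rather than taken from tap-prefect; the point is that we can both fake the response and assert on how many calls were made.

import requests_mock

from tap_prefect.tap import TapPrefect  # illustrative import; the actual path may differ


def test_tap_with_mocked_api():
    with requests_mock.Mocker() as m:
        # Register a fake response for the endpoint the stream would call.
        m.get(
            "https://api.prefect.cloud/api/flows",  # illustrative URL
            json=[{"id": "abc", "name": "my-flow"}],
        )

        tap = TapPrefect(config={"api_key": "dummy"})  # illustrative config
        tap.sync_all()  # singer_sdk taps emit their messages to stdout

        # Guard against hitting the (mocked) API more often than expected.
        assert m.call_count <= 1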

Fundamentally, Singer taps and targets are command line utilities, and it would be possible to create end-to-end tests consisting of a command line call plus checks on the stdout (and… logs?). For taps you might argue such tests would be comprehensive, because all the output a tap produces goes to stdout; for targets the main output goes to a database, but testing that the input corresponds to what ends up in the database is a big goal.
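
A minimal sketch of such an end-to-end test for a tap might look like the following. It assumes the tap is installed on the PATH; the config keys and the assertion are only illustrative, and in practice you would point the tap at a mock or sandbox endpoint rather than the real API.

import json
import subprocess


def test_tap_end_to_end(tmp_path):
    # Write an illustrative config file; the real keys depend on the tap.
    config_path = tmp_path / "config.json"
    config_path.write_text(json.dumps({"start_date": "2024-01-01"}))

    # Run the tap as a CLI and capture stdout, which is where a Singer tap
    # emits its SCHEMA/RECORD/STATE messages.
    result = subprocess.run(
        ["tap-pxwebapi", "--config", str(config_path)],
        capture_output=True,
        text=True,
        check=True,
    )

    messages = [json.loads(line) for line in result.stdout.splitlines() if line.strip()]
    assert any(message["type"] == "RECORD" for message in messages)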

Implementation

Parameterization

Parameterizing some of the more repetitive tests went fairly smoothly. These tests take a file name as input and simply process that file, so we can parameterize them by passing the file names as a list, and we save many, many lines of code.

These tests can be split into three categories: tests we want to pass, tests we want to fail, and tests we want to skip. I didn’t see a good way to parameterize this part, so I created two different functions, one decorated with xfail and one without. So far, I have opted for not doing the same for the skipped tests.

Why include skipped tests? Because we develop within a specified paradigm, use an SDK and try to implement a number of features that users may expect, creating tests but skipping them made sense as a way to delineate and keep track of functionality that is still on our todo list.

Both for skipped and failed tests, the reason parameter can be very valuable. Parameterizing the tests removes the ability to give a reason like this. Maybe there is a way, but so far I haven’t come across one.

update: I suddenly remembered (by googling) that I can pass pytest.param as a parameter and specify marks. This means I can use a single function, while keeping my reason.

pytest.param("invalid_config.singer", 
    marks=pytest.mark.xfail(
        reason="Tap should fail if config is invalid"
    )
)
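
Put into context, the parametrize decorator might then look something like the sketch below. The second pytest.param entry and its file name are hypothetical; the point is that plain strings, xfailed cases and skipped cases can all live in the same parameter list, each with its own reason.

import pytest


@pytest.mark.parametrize(
    "file_name",
    [
        "schema_updates.singer",
        "multiple_state_messages.singer",
        pytest.param(
            "invalid_config.singer",
            marks=pytest.mark.xfail(reason="Tap should fail if config is invalid"),
        ),
        pytest.param(
            "not_implemented_yet.singer",  # hypothetical file for a feature on the todo list
            marks=pytest.mark.skip(reason="Feature not implemented yet"),
        ),
    ],
)
def test_singer_file_to_target(file_name, oracle_target) -> None:
    singer_file_to_target(file_name, oracle_target)  # helper shown in the code section below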

Check the table contents

The problem with our parameterization is that it becomes more difficult to check for consistency between the input and the final table. When all we have is an input file name, we don’t even know which table the data will end up in. Even worse, the table name might depend on my configuration, so I can’t just sneak the table name into the file name or something like that.

Our first and perhaps simplest option is to add the expected table name to the test parameters. This lets us check that the table exists, and we can check that it has rows.
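
Here is a sketch of that option, assuming the original oracle_config fixture is still available (the updated version below inlines it into oracle_target) so we can build a SQLAlchemy connection from the same settings. The expected table names and the driver string are assumptions, not code from the repository.

import pytest
import sqlalchemy as sa


@pytest.mark.parametrize(
    "file_name,expected_table",
    [
        ("schema_updates.singer", "schema_updates"),                    # assumed table name
        ("multiple_state_messages.singer", "multiple_state_messages"),  # assumed table name
    ],
)
def test_target_creates_table(file_name, expected_table, oracle_target, oracle_config):
    singer_file_to_target(file_name, oracle_target)  # helper shown in the code section below

    # Connect with the same settings the target uses; the driver may differ.
    engine = sa.create_engine(
        f"oracle+oracledb://{oracle_config['user']}:{oracle_config['password']}"
        f"@{oracle_config['host']}:{oracle_config['port']}/{oracle_config['database']}"
    )
    with engine.connect() as conn:
        assert sa.inspect(conn).has_table(expected_table, schema=oracle_config["schema"])
        row_count = conn.execute(
            sa.text(f'SELECT COUNT(*) FROM {oracle_config["schema"]}.{expected_table}')
        ).scalar()
        assert row_count > 0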

We can, of course, also write non-parameterized tests for the times we want to really check that the content is what it is supposed to be.

There are two final options, which might not be good ideas:

  • Add parameters for what to check for in the target table. This would probably involve writing a whole mini-language to convey the checks, so there would be a lot of yak-shaving involved.
  • You might have a set of tests that are guaranteed to run after the parameterized tests (and perhaps only if those ran successfully). Once everything is in place, we could then run a whole series of tests, independent of the first ones, to make sure that row counts, data types, cell contents etc. are as we would expect. This idea seems tempting, but I don’t know if it is possible and I don’t know if it is smart.

I haven’t decided on any of this yet. Updates to come.

Code

I always find it hard to understand abstract descriptions of testing architecture, so here is a mini-version of the code and the changes I have been writing about.

Right now, the only implementation I have started on is to parameterize the tests for the SQL targets.

Original

Originally, tests were quite simple and separated into different functions.

import io
from contextlib import redirect_stdout
from pathlib import Path

import pytest

from target_oracle.target import TargetOracle  # import path may differ


@pytest.fixture()
def oracle_config():
    return {
        "schema": "SYSTEM",
        "user": "SYSTEM",
        "password": "P@55w0rd",
        "host": "localhost",
        "port": "1521",
        "database": "XE",
        "prefer_float_over_numeric": False,
        "freeze_schema": True
    }


@pytest.fixture
def oracle_target(oracle_config) -> TargetOracle:
    return TargetOracle(config=oracle_config)


def singer_file_to_target(file_name, target) -> None:
    """Singer file to Target, emulates a tap run """
    file_path = Path(__file__).parent / Path("./data_files") / Path(file_name)
    buf = io.StringIO()
    with redirect_stdout(buf):
        with open(file_path, "r") as f:
            for line in f:
                print(line.rstrip("\r\n"))  # File endings are here,
                # and print adds another line ending so we need to remove one.
    buf.seek(0)
    target.listen(buf)


def test_schema_updates(oracle_target):
    file_name = "schema_updates.singer"
    singer_file_to_target(file_name, oracle_target)

## ... many more similar tests ...

def test_multiple_state_messages(oracle_target):
    file_name = "multiple_state_messages.singer"
    singer_file_to_target(file_name, oracle_target)

This includes a bit of repeated code: test_schema_updates and test_multiple_state_messages differ only in the data file they pick up. It isn’t all that much code per test, but remember that there are currently 25 such files.

Updated

We have merged the two fixtures into one, and converted the singer_file_to_target function into a test (test_singer_file_to_target) with input parameters. This allowed us to replace most of the tests like test_schema_updates with an input parameter to the new test.

Not shown here, we also have a function called test_singer_file_to_target_xfail that is decorated with @pytest.mark.xfail.


@pytest.fixture
def oracle_target() -> TargetOracle:
    oracle_config = {
        "schema": "SYSTEM",
        "user": "SYSTEM",
        "password": "P@55w0rd",
        "host": "localhost",
        "port": "1521",
        "database": "XE",
        "prefer_float_over_numeric": False,
        "freeze_schema": True,
    }

    return TargetOracle(config=oracle_config)


@pytest.mark.parametrize(
    "file_list",
    [
        ["schema_updates.singer"],
        ["multiple_state_messages.singer"],
    ],
)
def test_singer_file_to_target(file_list, oracle_target) -> None:
    """Singer file to Target, emulates a tap run"""
    for file_name in file_list:
        file_path = Path(__file__).parent / Path("./data_files") / Path(file_name)
        buf = io.StringIO()
        with redirect_stdout(buf):
            with open(file_path, "r") as f:
                for line in f:
                    print(line.rstrip("\r\n"))  # File endings are here,
                    # and print adds another line ending so we need to remove one.
        buf.seek(0)
        oracle_target.listen(buf)

By using lists as inputs, we will be able to test that sequences of inputs are handled correctly, which is important when testing things like updates, where we first need to insert data and then apply an update.
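
For example, a parameter entry like the one in this sketch (the file names are hypothetical) would push an initial load and then an update through the same target instance within a single test:

@pytest.mark.parametrize(
    "file_list",
    [
        # Hypothetical file names: the first file inserts rows, the second updates them.
        ["initial_load.singer", "update_existing_rows.singer"],
    ],
)
def test_singer_file_to_target_update_sequence(file_list, oracle_target) -> None:
    ...  # same body as test_singer_file_to_target above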

Going further

There are plenty of things yet to do; this was just cleaning up, not adding better tests.

There are also further issues I want to fix. One is to have a suite of tests that don’t depend on the external databases - in addition to the existing ones, of course.