
Radbrt

Improved testing of Singer taps and targets

By now, I maintain quite a few Singer taps and targets, created with Meltano’s singer_sdk library. For those unfamiliar with Singer, it is a framework for moving data from a source system to a target system via a standardized communication protocol. The singer_sdk library contains a bunch of sweet abstractions that make this a lot easier. Some of the things I maintain: target-oracle for writing to Oracle databases, target-mssql for writing to SQL Server databases, tap-prefect for reading from the Prefect REST API, and tap-pxwebapi for reading statistics from Statistics Norway/Sweden/Finland via a REST API.
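The protocol itself is JSON messages written to stdout, one per line, so it can be illustrated without the SDK. The sketch below is a hand-rolled illustration, not how singer_sdk implements it: it emits a SCHEMA message, one RECORD message per row, and a STATE message. The stream name, schema, and bookmark layout are hypothetical.

```python
import json
import sys
from typing import Any, Dict, List, Optional, TextIO


def write_message(message: Dict[str, Any], out: Optional[TextIO] = None) -> None:
    """Write one Singer message as a single line of JSON (stdout by default)."""
    sink = out if out is not None else sys.stdout
    sink.write(json.dumps(message) + "\n")


def emit_stream(
    stream: str,
    schema: Dict[str, Any],
    key_properties: List[str],
    records: List[Dict[str, Any]],
    out: Optional[TextIO] = None,
) -> None:
    """Emit a SCHEMA message, one RECORD message per row, then a STATE message."""
    write_message(
        {"type": "SCHEMA", "stream": stream, "schema": schema, "key_properties": key_properties},
        out,
    )
    for record in records:
        write_message({"type": "RECORD", "stream": stream, "record": record}, out)
    # Hypothetical bookmark layout: just the number of rows emitted for this stream.
    write_message(
        {"type": "STATE", "value": {"bookmarks": {stream: {"rows_emitted": len(records)}}}},
        out,
    )


if __name__ == "__main__":
    # Hypothetical "users" stream with two rows.
    emit_stream(
        stream="users",
        schema={
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        },
        key_properties=["id"],
        records=[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}],
    )
```

Because the output is plain JSON lines, a test can pass an io.StringIO as out instead of writing to stdout, then parse each line and assert on the message types and contents. That is one simple way to check a tap's output without running a real target.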

A eulogy for Meltano Cloud

The beginning and the end

The days of the modern data stack were waning. Interest rates were soaring. And the appetite for Yet Another SaaS was plummeting among both companies and investors. Meltano Cloud entered public Beta behind everyone else, and behind their own schedule. And it disappeared before anyone else. The Meltano team is now working on Arch, a new adventure for similar but different use cases. Perhaps Meltano Cloud was too late to market.

tap-pxwebapi

A Singer Tap for Official Statistics

In the world of data engineering, Singer is a popular standard, with tools like Airbyte and Meltano providing flexible frameworks for data loading. One data source that is often overlooked, however, is official statistics. Different statistical offices around the world have different APIs (and in some cases no API at all), but one place to start is PxWeb. PxWeb is a common thread connecting Norway, Sweden, and Finland in the realm of official statistics.
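PxWeb tables are queried by POSTing a JSON body that selects values for each variable in the table. A minimal sketch, assuming the PxWeb API v1 query format (a "query" list of variable selections plus a "response" format). The variable codes in the usage note are hypothetical, and this is not necessarily how tap-pxwebapi builds its requests.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_pxweb_query(
    selections: Dict[str, List[str]], response_format: str = "json-stat2"
) -> Dict[str, Any]:
    """Build a PxWeb (API v1) POST body selecting specific values per variable."""
    return {
        "query": [
            # "item" selects exactly the listed values for this variable code.
            {"code": code, "selection": {"filter": "item", "values": values}}
            for code, values in selections.items()
        ],
        "response": {"format": response_format},
    }


def fetch_table(table_url: str, query: Dict[str, Any]) -> Dict[str, Any]:
    """POST the query to a PxWeb table URL and return the parsed JSON response."""
    request = urllib.request.Request(
        table_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, build_pxweb_query({"Region": ["0301"], "Tid": ["2022", "2023"]}) selects one region and two years; the resulting body can then be passed to fetch_table together with the URL of a PxWeb table. Keeping query construction separate from the HTTP call makes the query logic testable without network access.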

GPT reads plots. Kind of.

For whatever whimsical reason, as I read the financial paper, I got the idea to take a picture of one of the plots and ask ChatGPT to extract the data. My naive expectation was that the image processing function would just wing it and give me a few “eyeballed” observations from the plot. Not so. Instead of eyeballing it, it wrote close to 100 lines of Python code that read the image, did contour analysis, and combined the result with observations read from the image, such as the axis ranges (min/max on both axes).
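The step that combines detected points with the axis min/max amounts to a linear mapping from pixel coordinates to data coordinates. The sketch below is not the code ChatGPT produced. It assumes the contour analysis has already yielded pixel positions for the plotted points, plus the pixel positions of the axis ends; all numbers are made up.

```python
from typing import List, Tuple

Range = Tuple[float, float]


def pixel_to_data(
    points_px: List[Tuple[float, float]],
    x_px_range: Range,
    y_px_range: Range,
    x_data_range: Range,
    y_data_range: Range,
) -> List[Tuple[float, float]]:
    """Linearly map pixel coordinates to data coordinates.

    x_px_range / y_px_range: pixel positions of the axis min and max.
    x_data_range / y_data_range: the axis min/max values read off the plot.
    Image y grows downwards, so y_px_range is typically (bottom, top).
    """
    (x0_px, x1_px), (y0_px, y1_px) = x_px_range, y_px_range
    (x0, x1), (y0, y1) = x_data_range, y_data_range
    result = []
    for px, py in points_px:
        # Fraction of the way along each axis, scaled to the data range.
        x = x0 + (px - x0_px) * (x1 - x0) / (x1_px - x0_px)
        y = y0 + (py - y0_px) * (y1 - y0) / (y1_px - y0_px)
        result.append((x, y))
    return result
```

With the x axis running from 2000 at pixel 100 to 2020 at pixel 500, and the y axis from 0 at pixel 400 (bottom) to 30 at pixel 100 (top), pixel_to_data([(300, 250)], (100, 500), (400, 100), (2000, 2020), (0, 30)) returns [(2010.0, 15.0)]. This assumes linear axes; a log-scaled axis would need the mapping applied to the logarithm of the axis values.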

Clip Similarity Search

I came across a cool post by Drew Breunig about finding bathroom faucets with the CLIP model: https://www.dbreunig.com/2023/09/26/faucet-finder.html. Multimodal embedding models let you embed both text and images in the same embedding space, enabling search across images and text alike. Although multimodal embedding models still seem mostly like a blank slate, there is at least one available: OpenAI’s CLIP. Simon Willison, who makes the llm CLI tool, has also made a plugin for the CLIP model, so taking CLIP out for a spin is really simple.
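Once text and images share an embedding space, search reduces to ranking stored vectors by their cosine similarity to the query vector. A minimal pure-Python sketch: the three-dimensional vectors and file names are toy placeholders standing in for real CLIP embeddings, which have far more dimensions and would come from a tool such as the llm CLI with the CLIP plugin.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: Sequence[float], index: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy "image embeddings"; real CLIP vectors would be much longer.
    index = {
        "faucet_a.jpg": [0.9, 0.1, 0.0],
        "faucet_b.jpg": [0.2, 0.8, 0.1],
        "sink.jpg": [0.0, 0.2, 0.9],
    }
    # Toy stand-in for the embedding of a text prompt.
    query = [1.0, 0.0, 0.0]
    for name, score in search(query, index, top_k=2):
        print(f"{name}: {score:.3f}")
```

In practice the index would hold image embeddings and the query would be the embedding of a text prompt such as "brushed brass faucet". Because both live in the same space, the same function also works for image-to-image search. A linear scan like this is fine for small collections; larger ones would typically use a vector index instead.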

24 hours of Surface Pro

24 hours with a Surface Pro

Some months ago I severely cracked the screen of my iPad 11" (2018). It is still usable, and I have wanted a new one, but at the same time I didn’t want to just get another iPad. So yesterday I got a Microsoft Surface Pro 8, in the hope that it could cover my iPad use and 90% of my laptop use, and reduce the number of times I have to drag my laptop around.

Snowflake Load Performance

We are using the Singer target transferwise-target-snowflake for loading data into our warehouse. This works well, but is slow out of the box. So, I wanted to check out some config options. There are some variables I want to adjust, and a few others that I’ll treat as outside the scope of this test and keep static.

The static factors

For this test, there are a few things that we won’t change:
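Separately from the specific settings, one simple way to structure such a test is to hold the static settings fixed and generate one config per combination of the variables being adjusted. A sketch under stated assumptions: the connection values are placeholders, and batch_size_rows and parallelism are assumed to be tunable options of the target. Check the target's README for the exact option names and defaults, since the text here does not say which options were tested.

```python
import itertools
import json
from typing import Any, Dict, List

# Hypothetical static settings, held fixed across every run (placeholders).
STATIC_CONFIG: Dict[str, Any] = {
    "account": "my_account",
    "dbname": "analytics",
    "warehouse": "loading_wh",
    "default_target_schema": "raw",
}

# Assumed tunable options and the values to try for each.
VARIABLES: Dict[str, List[Any]] = {
    "batch_size_rows": [10_000, 100_000],
    "parallelism": [1, 4, 8],
}


def config_grid(static: Dict[str, Any], variables: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one config dict per combination of variable values."""
    names = list(variables)
    grid = []
    for values in itertools.product(*(variables[name] for name in names)):
        config = dict(static)  # copy so the static settings are never mutated
        config.update(zip(names, values))
        grid.append(config)
    return grid


if __name__ == "__main__":
    for i, config in enumerate(config_grid(STATIC_CONFIG, VARIABLES)):
        print(i, json.dumps(config))
```

Each generated config can be written to its own JSON file and used for one timed load of the same source data, so that only the adjusted variables differ between runs.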

Econtwitter vs BLS

I have stopped engaging with Twitter, but I do still read it. ML-Twitter, econtwitter and some of the most popular data personalities still rummage around there. Econtwitter especially is incredibly informative, and at times funny as hell. Like this weekend, when the VC bros from the “All-In” podcast and friends decided they were way smarter than the Bureau of Labor Statistics. I’m not interested in dunking even more on the stupid VC-bro takes, but in a way these were actually reasonable questions wrapped in a reprehensible know-it-all attitude, with a complete and utter lack of even basic introspection of the kind “if this doesn’t make sense to me, might it be that there is something I don’t understand?”

Init

As usual with first blog posts, nothing to see here.

About

Random thoughts on a random blog. Sooo 2003. Some more modern options include my GitHub: https://github.com/radbrt