Radbrt
A Singer Tap for Official Statistics
In the world of data engineering, Singer is a popular standard, with tools like Airbyte and Meltano providing a flexible framework for data loading. One source that is often overlooked for data loading, however, is official statistics. Different statistical offices around the world have different APIs (and in some cases no API at all), but one place to start is PxWeb.
PxWeb is a common thread connecting Norway, Sweden, and Finland in the realm of official statistics.
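To make the PxWeb idea a bit more concrete, here is a minimal sketch of how a table query can be put together. It assumes the classic PxWeb pattern of POSTing a JSON body with a `query` list and a `response` format to a table URL. The helper names (`build_pxweb_query`, `fetch_pxweb_table`) and the variable and value codes are made up for illustration, and this is not necessarily how the tap itself does it.

```python
import json
import urllib.request


def build_pxweb_query(selections, response_format="json-stat2"):
    """Build the JSON body for a PxWeb table query.

    `selections` maps a variable code to the list of value codes to fetch.
    """
    return {
        "query": [
            {"code": code, "selection": {"filter": "item", "values": list(values)}}
            for code, values in selections.items()
        ],
        "response": {"format": response_format},
    }


def fetch_pxweb_table(table_url, selections):
    """POST the query to a PxWeb table endpoint and return the parsed JSON."""
    body = json.dumps(build_pxweb_query(selections)).encode("utf-8")
    request = urllib.request.Request(
        table_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical variable and value codes, for illustration only.
query = build_pxweb_query({"Region": ["0301"], "Tid": ["2022", "2023"]})
print(json.dumps(query, indent=2))
```

Building the payload is kept separate from the network call so the query can be inspected without hitting an API. Which variable codes a given table accepts has to be looked up in that table's metadata.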
For whatever whimsical reason, as I read the financial paper, I got the idea to take a picture of one of the plots and ask ChatGPT to extract the data. My naive expectation was that the image processing function would just wing it and give me a few "eyeballed" observations from the plot.
Not so. Instead of eyeballing it, it created close to 100 lines of Python code that read the image, did contour analysis, and combined the result with some observations from the image, such as the min/max values on both axes.
I came across a cool post by Drew Breunig about finding bathroom faucets with the CLIP model: https://www.dbreunig.com/2023/09/26/faucet-finder.html.
Multi-modal embedding models let you embed both text and images in the same embedding space, enabling search across both images and text. Although multimodal embedding models still seem to be mostly a blank slate, there is at least one available: OpenAI's CLIP.
Simon Willison, who makes the llm CLI tool, has also made a plugin for the CLIP model, so taking the CLIP model out for a spin is really simple.
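Searching a shared embedding space mostly comes down to comparing vectors. The sketch below shows the idea with cosine similarity, assuming the embeddings have already been computed. The three-dimensional vectors and file names are toy stand-ins for real CLIP embeddings, and `search` is a hypothetical helper, not part of llm or its CLIP plugin.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the names of the top_k items most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in index.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy embeddings: images and text live in the same space,
# so one query vector can be compared against both.
index = {
    "faucet_photo.jpg": [0.9, 0.1, 0.0],
    "note_about_faucets.txt": [0.8, 0.2, 0.1],
    "cat_photo.jpg": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.0, 0.0]  # stand-in for an embedded search phrase

print(search(query_vector, index))
```

With real embeddings the only thing that changes is where the vectors come from; the ranking step stays the same.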
24 hours with a Surface Pro
Some months ago I severely cracked the screen of my iPad 11" (2018). It is still usable, but I have wanted a new one, and at the same time I didn't want to just get another iPad. So yesterday I got a Microsoft Surface Pro 8, in the hope that it could cover my iPad use and 90% of my laptop use, and reduce the number of times I have to drag my laptop around.
We are using the Singer target transferwise-target-snowflake for loading data into our warehouse. This works well, but is slow out of the box. So, I wanted to check out some config options. There are some variables I want to adjust, and a few others that I'll treat as outside the scope of this test and keep static.
The static factors
For this test, there are a few things that we won't change:
I have stopped engaging with twitter, but I do still read it. ML-twitter, econtwitter and some of the most popular data-personalities still rummage around there.
Econtwitter especially is incredibly informative, and at times funny as hell. Like this weekend, when the VC-bros from the "all in" podcast and friends decided they were way smarter than the Bureau of Labor Statistics. I'm not interested in dunking even more on the stupid VC-bro takes, but in a way these were actually reasonable questions wrapped in a reprehensible know-it-all attitude with a complete and utter lack of even basic introspection of the kind "if this doesn't make sense to me, might it be that there is something I don't understand?"
As usual with first blog posts, nothing to see here.
About
Random thoughts on a random blog. Sooo 2003.
Some more modern options include my Github: https://github.com/radbrt