Building an Airflow Pipeline That Talks to AWS — Data Pipelines in the Cloud (III)

This tutorial is a complete guide to building an end-to-end data pipeline with Apache Airflow that communicates with AWS services like RDS (relational database) and S3 (object storage) to perform data transformations automatically and efficiently.

Using Amazon Web Services (AWS) with the Command Line — Data Pipelines in the Cloud (II)

Welcome back to the ‘Data Pipelines in the Cloud’ series! In the first part, I introduced Airflow as a tool for orchestrating data pipelines and demonstrated how to code and execute a minimal Airflow pipeline (DAG) in your local environment. In this second part, we’ll lay the groundwork for a more functional Airflow DAG by using the AWS Command Line Interface to set up a relational database in the cloud (PostgreSQL), along with a bucket for object storage (S3). We’ll then upload a sample CSV file to the bucket, which we’ll later use as input for an Airflow DAG that performs a meaningful transformation on this data.

Sharing My Advent of Code 2023 with Quarto (And How You Can Do the Same)

Check the website here.
As a Christmas enthusiast, I’ve always been intrigued by the Advent of Code, a series of daily programming puzzles leading up to Christmas. This year, I’m taking on the challenge with either R or Python, adding a touch of whimsy by using a spinning wheel to choose my language each day. I’m also sharing my solutions on a special Advent of Code-themed website. Find out how you can create your own Advent of Code site and automate the process with the aochelpers R package.

A Beginner’s Introduction to Airflow with Docker — Data Pipelines in the Cloud (I)

My attempt at using Stable Diffusion to depict something cloud-computery.
Learn the essentials of Apache Airflow for creating scalable and automated data pipelines in the cloud with this comprehensive, step-by-step beginner’s guide. Discover what problem Airflow solves and under what circumstances it is the better choice, then run your first Airflow DAG on Docker with the Windows Subsystem for Linux.

Matching in R (III): Propensity Scores, Weighting (IPTW) and the Double Robust Estimator

Woman Holding a Balance, c. 1664
In the last part of this series about Matching estimators in R, we’ll look at Propensity Scores as a way to solve covariate imbalance while handling the curse of dimensionality, and at how to implement a Propensity Score estimator using the twang package in R. We’ll also explore the importance of common support, the inverse probability weighting estimator (IPTW) and the double robust estimator, which combines a regression specification with a matching-based model in order to obtain a good estimate even when there is something wrong with one of the two underlying models.