This package provides a framework for building ELT pipelines with Dagster through helpful pre-built assets and resources. It is currently in experimental development, and we'd love to hear your feedback.
This package currently includes a single implementation using Sling, which provides a simple way to sync data between databases and file systems.
We plan to add more embedded ELT tool integrations in the future.
To get started with dagster-embedded-elt and Sling, familiarize yourself with Sling by reading its docs, which describe how sources and targets are configured.
The typical pattern for building an ELT pipeline with Sling has three steps:
First, create a SlingResource, which is a container for the source and the target.
A Sling resource is a Dagster resource that holds references to both a source connection and a target connection. Sling is flexible about what a source or target can represent, and you can pass arbitrary keyword arguments to the SlingSourceConnection and SlingTargetConnection classes.
The types and parameters for each connection are defined in Sling's connections documentation.
The simplest connection is a file connection, which can be defined as:
from dagster_embedded_elt.sling import SlingResource, SlingSourceConnection
source = SlingSourceConnection(type="file")
sling = SlingResource(source_connection=source, ...)
Note that no path is required in the source connection, as that is provided by the asset itself.
For database connections, you can provide a connection string or a dictionary of keyword arguments. For example, to connect to a SQLite database, you can provide a path to the database using the instance keyword, which is specified in Sling's SQLite connection documentation.
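The SQLite case can be sketched as follows. This is a minimal illustration rather than the package's canonical example: the helper sqlite_connection_kwargs, the events table, and the temporary file path are hypothetical, and the sketch assumes SlingSourceConnection accepts type and instance as keyword arguments, as described above. The import is guarded so the snippet still runs where dagster-embedded-elt is not installed.

```python
import sqlite3
import tempfile
from pathlib import Path


def sqlite_connection_kwargs(db_path):
    """Build the keyword arguments for a SQLite connection (hypothetical helper)."""
    # "instance" carries the path to the database file, per Sling's SQLite docs.
    return {"type": "sqlite", "instance": str(db_path)}


# Create a small throwaway database so the path points at a real file.
db_path = Path(tempfile.mkdtemp()) / "example.db"
conn = sqlite3.connect(str(db_path))
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO events (name) VALUES ('signup')")
conn.commit()
conn.close()

kwargs = sqlite_connection_kwargs(db_path)

try:
    # Assumption: the connection class accepts arbitrary keywords such as "instance".
    from dagster_embedded_elt.sling import SlingSourceConnection

    source = SlingSourceConnection(**kwargs)
except ImportError:
    # The package is not installed; only the keyword arguments are built.
    source = None
```

Keeping the keyword arguments in a plain dictionary makes it easy to reuse the same settings for a SlingTargetConnection when the database is the destination rather than the source.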