These docs aim to cover the entire public surface of the core dagster APIs, as well as public APIs from all provided libraries.
Dagster follows SemVer. We attempt to isolate breaking changes to the public APIs to minor versions (released on a roughly 12-week cadence), and we announce deprecations in Slack and in the release notes for patch versions (released on a roughly weekly cadence).
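The versioning policy above suggests a simple upgrade check: a move within the same minor series should be a patch-level change, while crossing a minor (or major) boundary may bring breaking changes. The following is a minimal sketch of that check, assuming plain `major.minor.patch` version strings. The helper names `parse_version` and `may_include_breaking_changes` are hypothetical and are not part of Dagster, and the version numbers in the usage note are illustrative only.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'major.minor.patch' string into three integers.

    Assumption: no pre-release or build suffixes (e.g. 'rc1') are present.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def may_include_breaking_changes(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` crosses a
    major or minor version boundary, i.e. is not a patch-only change."""
    return parse_version(current)[:2] != parse_version(target)[:2]
```

For example, `may_include_breaking_changes("0.11.3", "0.11.9")` evaluates to `False`, while `may_include_breaking_changes("0.11.9", "0.12.0")` evaluates to `True`; in the second case it would be worth reading the release notes before upgrading.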
APIs from the core dagster package, divided roughly by topic:
Solids APIs to define or decorate functions as solids, declare their inputs and outputs, and compose solids with each other, as well as the datatypes that solid execution can return or yield.
Pipelines APIs to define pipelines, dependencies and fan-in dependencies between solids, and aliased instances of solids.
Modes & Resources APIs to define pipeline modes and resources.
Presets APIs to define configuration presets.
Loggers APIs to define where logs go.
Repositories APIs to define collections of pipelines and other definitions that tools such as the Dagster CLI or Dagit can load from.
Config The types available to describe config schemas.
Types Primitive types available for the input and output values of solids, and the APIs used to define and test new Dagster types.
Dagster CLI Browse repositories and execute pipelines from the command line.
Schedules & Sensors APIs to define schedules and sensors that initiate pipeline execution, as well as some built-in helpers for common cases.
Partitions APIs to define partitions of the config space over which pipeline runs can be backfilled.
Errors Errors thrown by the Dagster framework.
Execution APIs to execute and test pipelines and individual solids, the execution context available to solids, pipeline configuration, and the default executors available for executing pipelines.
Hooks APIs to define Dagster hooks, which can be triggered on specific Dagster events.
IO Managers APIs to define how inputs and outputs are handled and loaded.
Dynamic Graphs (Experimental) APIs that allow graph structures to be determined at run time.
Utilities Miscellaneous helpers used by Dagster that may be useful to users.
Internals Core internal APIs that are important if you are interested in understanding how Dagster works with an eye towards extending it: logging, executors, system storage, the Dagster instance & plugin machinery, storage, schedulers.
Dagster also provides a growing set of optional add-on libraries to integrate with infrastructure and other components of the data ecosystem:
Airflow (dagster_airflow) Tools for compiling Dagster pipelines to Airflow DAGs, and for ingesting Airflow DAGs to represent them in Dagster.
AWS (dagster_aws) Dagster integrations for working with AWS resources.
Azure (dagster_azure) Dagster integrations for working with Microsoft Azure resources.
Celery (dagster_celery) Provides an executor built on top of the popular Celery task queue, and an executor with support for using Celery on Kubernetes.
Celery+Docker (dagster_celery_docker) Provides an executor that lets Celery workers execute in Docker containers.
Cron (dagster_cron) Provides a simple scheduler implementation built on system cron.
Dask (dagster_dask) Provides an executor built on top of dask.distributed.
dbt (dagster_dbt) Provides solids and resources to run dbt projects.
Datadog (dagster_datadog) Provides an integration with Datadog, to support publishing metrics to Datadog from within Dagster solids.
Databricks (dagster_databricks) Provides solids and resources for integrating with Databricks.
GCP (dagster_gcp) Dagster integrations for working with Google Cloud Platform resources.
GE (dagster_ge) Dagster integrations for working with Great Expectations data quality tests.
GitHub (dagster_github) Provides a resource for issuing GitHub GraphQL queries and filing GitHub issues from Dagster pipelines.
Kubernetes (dagster_k8s) Provides components for deploying Dagster to Kubernetes, along with an experimental Helm chart.
PagerDuty (dagster_pagerduty) Provides an integration for generating PagerDuty events from Dagster solids.
Pandas (dagster_pandas) Provides support for using pandas DataFrames in Dagster and utilities for performing data validation.
Papertrail (dagster_papertrail) Provides support for sending Dagster logs to Papertrail.
Postgres (dagster_postgres) Includes implementations of run and event log storage built on Postgres.
Prometheus (dagster_prometheus) Provides support for sending metrics to Prometheus.
Pyspark (dagster_pyspark) Provides an integration with pyspark.
Shell (dagster_shell) Provides utilities for issuing shell commands from Dagster pipelines.
Slack (dagster_slack) Provides a simple integration with Slack.
Snowflake (dagster_snowflake) Provides a resource for querying Snowflake from Dagster.
Spark (dagster_spark) Provides an integration for working with Spark in Dagster.
SSH / SFTP (dagster_ssh) Provides an integration for running commands over SSH and retrieving / posting files via SFTP.
Twilio (dagster_twilio) Provides a resource for posting SMS messages from solids via Twilio.