This section will show you how to create a new Dagster project and organize your files as you build larger and larger pipelines. Dagster comes with a convenient CLI command for generating a project skeleton, but you can also choose to organize your files differently as your project evolves.
If you're completely new to Dagster, we recommend that you visit our Tutorial to learn all the basic concepts of Dagster.
In Dagster, pipelines are grouped together into repositories. A Dagster repository and a Git repository are two different concepts. Visit the Repositories overview for more details on Dagster repositories.
If you're just starting a new Dagster project, the CLI command `dagster new-repo` will generate a project skeleton with boilerplate code for development and testing. If you have `dagster` installed in your Python environment, you can run the following shell commands to generate a Dagster project called `REPO_NAME`:

```
dagster new-repo REPO_NAME
cd REPO_NAME
```
The newly generated `REPO_NAME` directory is a fully functioning Python package and can be installed with `pip`. This Python package is considered a repository location in Dagster. A single Python file can also be a repository location, like in the Hello World guide. For more details about the different types of repository locations, visit the Repositories overview.
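For illustration, here is a minimal sketch of what a single-file repository location might look like; the solid, pipeline, and repository names are hypothetical:

```python
# hello_world_repository.py: a single Python file acting as a repository location
from dagster import pipeline, repository, solid


@solid
def get_name(_context):
    return "dagster"


@solid
def hello(context, name):
    context.log.info(f"Hello, {name}!")


@pipeline
def hello_pipeline():
    hello(get_name())


@repository
def hello_world_repository():
    return [hello_pipeline]
```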
Here's a breakdown of the files and directories that are generated:
| File/Directory | Description |
| --- | --- |
| `REPO_NAME/` | A Python module that contains code for your new Dagster repository |
| `REPO_NAME_tests/` | A Python module that contains tests for `REPO_NAME` |
| `workspace.yaml` | A file that specifies the location of the user code for Dagit and the Dagster CLI. Visit the Workspaces overview for more details. |
| `README.md` | A description and guide for your new code repository |
| `setup.py` | A build script with Python package dependencies for your new code repository |
Inside of the directory `REPO_NAME/`, the following files and directories are generated:
| File/Directory | Description |
| --- | --- |
| `REPO_NAME/solids/` | A Python module that contains SolidDefinitions, which represent individual units of computation |
| `REPO_NAME/pipelines/` | A Python module that contains PipelineDefinitions, which are built up from solids |
| `REPO_NAME/schedules/` | A Python module that contains ScheduleDefinitions, which trigger recurring pipeline runs based on time |
| `REPO_NAME/sensors/` | A Python module that contains SensorDefinitions, which trigger pipeline runs based on external state |
| `REPO_NAME/repository.py` | A Python file that contains a RepositoryDefinition, which specifies the pipelines, schedules, and sensors available in your repository (see the sketch below) |
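For example, `REPO_NAME/repository.py` might look like the following sketch; the imported module and definition names are hypothetical, and `REPO_NAME` stands in for your actual project name:

```python
# REPO_NAME/repository.py: collects pipelines, schedules, and sensors
from dagster import repository

from REPO_NAME.pipelines.my_pipeline import my_pipeline
from REPO_NAME.schedules.my_hourly_schedule import my_hourly_schedule
from REPO_NAME.sensors.my_sensor import my_sensor


@repository
def REPO_NAME():
    # Everything returned here becomes available in the repository.
    return [my_pipeline, my_hourly_schedule, my_sensor]
```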
This file structure is a good starting point and suitable for most Dagster projects. As you build more and more pipelines, you may eventually find your own way of structuring your code that works best for you.
To install your new repository as a Python package, run `pip` from the project root. With the `--editable` flag, `pip` will install your repository in "editable mode" so that as you develop, local code changes will automatically apply:

```
pip install --editable .
```
Next, start the Dagit process:

```
dagit
```

The Dagit process automatically reads the `workspace.yaml` file to find your repository location(s), from which Dagster will load your pipelines, schedules, and sensors. To see how you can customize the Dagit process, run `dagit --help`.
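For reference, the generated `workspace.yaml` for a package-based project is typically just a short pointer; here is a minimal sketch, assuming the package is named `REPO_NAME`:

```yaml
# workspace.yaml: tells Dagit and the Dagster CLI where to load user code from
load_from:
  - python_package: REPO_NAME
```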
To use schedules and sensors, also start the Dagster Daemon process:

```
dagster-daemon run
```

Once your Dagster Daemon process is running, you can enable schedules and sensors for the pipelines in your repository.
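As a sketch of what those definitions might look like (the pipeline, schedule, and sensor names are hypothetical, matching the repository sketch above):

```python
# REPO_NAME/schedules/my_hourly_schedule.py
import os

from dagster import RunRequest, schedule, sensor


@schedule(
    cron_schedule="0 * * * *",
    pipeline_name="my_pipeline",
    execution_timezone="UTC",
)
def my_hourly_schedule(_context):
    # Return the run config to use for each scheduled run.
    return {}


# REPO_NAME/sensors/my_sensor.py
@sensor(pipeline_name="my_pipeline")
def my_sensor(_context):
    # A hypothetical external-state check: request a run whenever a
    # trigger file exists.
    if os.path.exists("/tmp/trigger_my_pipeline"):
        yield RunRequest(run_key=None, run_config={})
```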
Once you have created a new Dagster repository with the CLI command `dagster new-repo`, you can find tests in `REPO_NAME_tests`, where `REPO_NAME` is the name of your repository. You can run all of your tests with the following command:

```
pytest REPO_NAME_tests
```
As you create Dagster solids and pipelines, add tests in `REPO_NAME_tests/` to check that your code behaves as desired and does not break over time.
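Here is a minimal sketch of such a test, assuming a pipeline named `my_pipeline` as in the examples above:

```python
# REPO_NAME_tests/test_pipelines.py
from dagster import execute_pipeline

from REPO_NAME.pipelines.my_pipeline import my_pipeline


def test_my_pipeline():
    # Execute the pipeline in-process and assert that it succeeded.
    result = execute_pipeline(my_pipeline)
    assert result.success
```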
For hints on how to write tests for solids and pipelines in Dagster, see the Testing section of our tutorial.
Once your Dagster project is ready, visit the Deployment Guides to learn how to run Dagster in production environments such as Docker, Kubernetes, and AWS EC2.