dagster.
execute_pipeline
(pipeline: Union[dagster.core.definitions.pipeline.PipelineDefinition, dagster.core.definitions.pipeline_base.IPipeline], run_config: Optional[dict] = None, mode: Optional[str] = None, preset: Optional[str] = None, tags: Optional[Dict[str, Any]] = None, solid_selection: Optional[List[str]] = None, instance: Optional[dagster.core.instance.DagsterInstance] = None, raise_on_error: bool = True) → dagster.core.execution.results.PipelineExecutionResult[source]¶Execute a pipeline synchronously.
Users will typically call this API when testing pipeline execution, or running standalone scripts.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
run_config (Optional[dict]) – The environment configuration that parametrizes this run, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode
and preset
.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both
mode
and preset
.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None
,
an ephemeral instance will be used, and no artifacts will be persisted from the run.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur.
Defaults to True
, since this is the most useful behavior in test.
solid_selection (Optional[List[str]]) –
A list of solid selection queries (including single solid names) to execute. For example: - [‘some_solid’]: select “some_solid” itself. - [‘*some_solid’]: select “some_solid” and all its ancestors (upstream dependencies). - [‘*some_solid+++’]: select “some_solid”, all its ancestors, and its descendants
(downstream dependencies) within 3 levels down.
ancestors, “other_solid_a” itself, and “other_solid_b” and its direct child solids.
The result of pipeline execution.
For the asynchronous version, see execute_pipeline_iterator()
.
dagster.
execute_pipeline_iterator
(pipeline: Union[dagster.core.definitions.pipeline.PipelineDefinition, dagster.core.definitions.pipeline_base.IPipeline], run_config: Optional[dict] = None, mode: Optional[str] = None, preset: Optional[str] = None, tags: Optional[Dict[str, Any]] = None, solid_selection: Optional[List[str]] = None, instance: Optional[dagster.core.instance.DagsterInstance] = None) → Iterator[dagster.core.events.DagsterEvent][source]¶Execute a pipeline iteratively.
Rather than package up the result of running a pipeline into a single object, like
execute_pipeline()
, this function yields the stream of events resulting from pipeline
execution.
This is intended to allow the caller to handle these events on a streaming basis in whatever way is appropriate.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
run_config (Optional[dict]) – The environment configuration that parametrizes this run, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode
and preset
.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both
mode
and preset
.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
solid_selection (Optional[List[str]]) –
A list of solid selection queries (including single solid names) to execute. For example: - [‘some_solid’]: select “some_solid” itself. - [‘*some_solid’]: select “some_solid” and all its ancestors (upstream dependencies). - [‘*some_solid+++’]: select “some_solid”, all its ancestors, and its descendants
(downstream dependencies) within 3 levels down.
ancestors, “other_solid_a” itself, and “other_solid_b” and its direct child solids.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None
,
an ephemeral instance will be used, and no artifacts will be persisted from the run.
The stream of events resulting from pipeline execution.
Iterator[DagsterEvent]
dagster.
reexecute_pipeline
(pipeline: Union[dagster.core.definitions.pipeline_base.IPipeline, dagster.core.definitions.pipeline.PipelineDefinition], parent_run_id: str, run_config: Optional[dict] = None, step_selection: Optional[List[str]] = None, mode: Optional[str] = None, preset: Optional[str] = None, tags: Optional[Dict[str, Any]] = None, instance: dagster.core.instance.DagsterInstance = None, raise_on_error: bool = True) → dagster.core.execution.results.PipelineExecutionResult[source]¶Reexecute an existing pipeline run.
Users will typically call this API when testing pipeline reexecution, or running standalone scripts.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
parent_run_id (str) – The id of the previous run to reexecute. The run must exist in the instance.
run_config (Optional[dict]) – The environment configuration that parametrizes this run, as a dict.
step_selection (Optional[List[str]]) –
A list of step selection queries (including single step keys) to execute. For example: - [‘some_solid’]: select the execution step “some_solid” itself. - [‘*some_solid’]: select the step “some_solid” and all its ancestors
(upstream dependencies).
and its descendants (downstream dependencies) within 3 levels down.
”some_solid” and all its ancestors, “other_solid_a” itself, and “other_solid_b” and its direct child execution steps.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode
and preset
.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both
mode
and preset
.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None
,
an ephemeral instance will be used, and no artifacts will be persisted from the run.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur.
Defaults to True
, since this is the most useful behavior in test.
The result of pipeline execution.
For the asynchronous version, see reexecute_pipeline_iterator()
.
dagster.
reexecute_pipeline_iterator
(pipeline: Union[dagster.core.definitions.pipeline_base.IPipeline, dagster.core.definitions.pipeline.PipelineDefinition], parent_run_id: str, run_config: Optional[dict] = None, step_selection: Optional[List[str]] = None, mode: Optional[str] = None, preset: Optional[str] = None, tags: Optional[Dict[str, Any]] = None, instance: dagster.core.instance.DagsterInstance = None) → Iterator[dagster.core.events.DagsterEvent][source]¶Reexecute a pipeline iteratively.
Rather than package up the result of running a pipeline into a single object, like
reexecute_pipeline()
, this function yields the stream of events resulting from pipeline
reexecution.
This is intended to allow the caller to handle these events on a streaming basis in whatever way is appropriate.
pipeline (Union[IPipeline, PipelineDefinition]) – The pipeline to execute.
parent_run_id (str) – The id of the previous run to reexecute. The run must exist in the instance.
run_config (Optional[dict]) – The environment configuration that parametrizes this run, as a dict.
step_selection (Optional[List[str]]) –
A list of step selection queries (including single step keys) to execute. For example: - [‘some_solid’]: select the execution step “some_solid” itself. - [‘*some_solid’]: select the step “some_solid” and all its ancestors
(upstream dependencies).
and its descendants (downstream dependencies) within 3 levels down.
”some_solid” and all its ancestors, “other_solid_a” itself, and “other_solid_b” and its direct child execution steps.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode
and preset
.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both
mode
and preset
.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None
,
an ephemeral instance will be used, and no artifacts will be persisted from the run.
The stream of events resulting from pipeline reexecution.
Iterator[DagsterEvent]
dagster.
execute_solid
(solid_def, mode_def=None, input_values=None, tags=None, run_config=None, raise_on_error=True)[source]¶Execute a single solid in an ephemeral pipeline.
Intended to support unit tests. Input values may be passed directly, and no pipeline need be specified – an ephemeral pipeline will be constructed.
solid_def (SolidDefinition) – The solid to execute.
mode_def (Optional[ModeDefinition]) – The mode within which to execute the solid. Use this if, e.g., custom resources, loggers, or executors are desired.
input_values (Optional[Dict[str, Any]]) – A dict of input names to input values, used to
pass inputs to the solid directly. You may also use the run_config
to
configure any inputs that are configurable.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
run_config (Optional[dict]) – The environment configuration that parameterized this execution, as a dict.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur.
Defaults to True
, since this is the most useful behavior in test.
The result of executing the solid.
dagster.
execute_solid_within_pipeline
(pipeline_def, solid_name, inputs=None, run_config=None, mode=None, preset=None, tags=None, instance=None)[source]¶Execute a single solid within an existing pipeline.
Intended to support tests. Input values may be passed directly.
pipeline_def (PipelineDefinition) – The pipeline within which to execute the solid.
solid_name (str) – The name of the solid, or the aliased solid, to execute.
inputs (Optional[Dict[str, Any]]) – A dict of input names to input values, used to
pass input values to the solid directly. You may also use the run_config
to
configure any inputs that are configurable.
run_config (Optional[dict]) – The environment configuration that parameterized this execution, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode
and preset
.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both
mode
and preset
.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None
,
an ephemeral instance will be used, and no artifacts will be persisted from the run.
The result of executing the solid.
dagster.
execute_solids_within_pipeline
(pipeline_def, solid_names, inputs=None, run_config=None, mode=None, preset=None, tags=None, instance=None)[source]¶Execute a set of solids within an existing pipeline.
Intended to support tests. Input values may be passed directly.
pipeline_def (PipelineDefinition) – The pipeline within which to execute the solid.
solid_names (FrozenSet[str]) – A set of the solid names, or the aliased solids, to execute.
inputs (Optional[Dict[str, Dict[str, Any]]]) – A dict keyed on solid names, whose values are
dicts of input names to input values, used to pass input values to the solids directly.
You may also use the run_config
to configure any inputs that are configurable.
run_config (Optional[dict]) – The environment configuration that parameterized this execution, as a dict.
mode (Optional[str]) – The name of the pipeline mode to use. You may not set both mode
and preset
.
preset (Optional[str]) – The name of the pipeline preset to use. You may not set both
mode
and preset
.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to pipeline logs.
instance (Optional[DagsterInstance]) – The instance to execute against. If this is None
,
an ephemeral instance will be used, and no artifacts will be persisted from the run.
The results of executing the solids, keyed by solid name.
Dict[str, Union[CompositeSolidExecutionResult, SolidExecutionResult]]
dagster.core.execution.context.compute.
SolidExecutionContext
(system_compute_execution_context: dagster.core.execution.context.system.SystemComputeExecutionContext)[source]¶The context
object available to solid compute logic.
instance
¶The current Instance
pdb
¶Allows pdb debugging from within the solid.
Example:
@solid
def debug_solid(context):
context.pdb.set_trace()
pipeline_run
¶The current PipelineRun
solid_config
¶The parsed config specific to this solid.
dagster.core.execution.context.compute.
AbstractComputeExecutionContext
[source]¶Base class for solid context implemented by SolidExecutionContext and DagstermillExecutionContext
log
¶The log manager available in the execution context.
pipeline_def
¶The pipeline being executed.
pipeline_run
¶The PipelineRun object corresponding to the execution.
resources
¶Resources available in the execution context.
run_id
¶The run id for the context.
solid
¶The solid corresponding to the execution step being executed.
solid_config
¶The parsed config specific to this solid.
solid_def
¶The solid definition corresponding to the execution step being executed.
dagster.
reconstructable
[source]¶Create a ReconstructablePipeline from a function that returns a PipelineDefinition, or a
function decorated with @pipeline
When your pipeline must cross process boundaries, e.g., for execution on multiple nodes or in different systems (like dagstermill), Dagster must know how to reconstruct the pipeline on the other side of the process boundary.
This function implements a very conservative strategy for reconstructing pipelines, so that its behavior is easy to predict, but as a consequence it is not able to reconstruct certain kinds of pipelines, such as those defined by lambdas, in nested scopes (e.g., dynamically within a method call), or in interactive environments such as the Python REPL or Jupyter notebooks.
If you need to reconstruct pipelines constructed in these ways, you should use
build_reconstructable_pipeline()
instead, which allows you to specify your own
strategy for reconstructing a pipeline.
Examples:
from dagster import PipelineDefinition, pipeline, reconstructable
@pipeline
def foo_pipeline():
...
reconstructable_foo_pipeline = reconstructable(foo_pipeline)
def make_bar_pipeline():
return PipelineDefinition(...)
reconstructable_bar_pipeline = reconstructable(bar_pipeline)
dagster.
PipelineExecutionResult
(pipeline_def, run_id, event_list, reconstruct_context, resource_instances_to_override=None)[source]¶The result of executing a pipeline.
Returned by execute_pipeline()
. Users should not instantiate this class.
dagster.
SolidExecutionResult
(solid, step_events_by_kind, reconstruct_context, resource_instances_to_override=None)[source]¶Execution result for a leaf solid in a pipeline.
Users should not instantiate this class.
compute_input_event_dict
¶All events of type STEP_INPUT
, keyed by input name.
Dict[str, DagsterEvent]
compute_output_events_dict
¶All events of type STEP_OUTPUT
, keyed by output name
Dict[str, List[DagsterEvent]]
compute_step_events
¶All events generated by execution of the solid compute function.
List[DagsterEvent]
compute_step_failure_event
¶The STEP_FAILURE
event, throws if it did not fail.
expectation_events_during_compute
¶All events of type STEP_EXPECTATION_RESULT
.
List[DagsterEvent]
expectation_results_during_compute
¶All expectation results yielded by the solid
List[ExpectationResult]
failure_data
¶Any data corresponding to this step’s failure, if it failed.
Union[None, StepFailureData]
get_output_event_for_compute
(output_name='result')[source]¶The STEP_OUTPUT
event for the given output name.
Throws if not present.
output_name (Optional[str]) – The name of the output. (default: ‘result’)
The corresponding event.
get_output_events_for_compute
(output_name='result')[source]¶The STEP_OUTPUT
event for the given output name.
Throws if not present.
output_name (Optional[str]) – The name of the output. (default: ‘result’)
The corresponding events.
List[DagsterEvent]
input_events_during_compute
¶All events of type STEP_INPUT
.
List[DagsterEvent]
materialization_events_during_compute
¶All events of type STEP_MATERIALIZATION
.
List[DagsterEvent]
materializations_during_compute
¶All materializations yielded by the solid.
List[Materialization]
output_events_during_compute
¶All events of type STEP_OUTPUT
.
List[DagsterEvent]
output_value
(output_name='result')[source]¶Get a computed output value.
Note that calling this method will reconstruct the pipeline context (including, e.g., resources) to retrieve materialized output values.
output_values
¶The computed output values.
Returns None
if execution did not succeed.
the output values in the normal case
a dictionary from mapping key to corresponding value in the mapped case
Note that accessing this property will reconstruct the pipeline context (including, e.g., resources) to retrieve materialized output values.
dagster.
CompositeSolidExecutionResult
(solid, event_list, step_events_by_kind, reconstruct_context, handle=None, resource_instances_to_override=None)[source]¶Execution result for a composite solid in a pipeline.
Users should not instantiate this class.
dagster.
DagsterEvent
[source]¶Events yielded by solid and pipeline execution.
Users should not instantiate this class.
solid_handle
¶SolidHandle
event_specific_data
¶Type must correspond to event_type_value.
Any
step_key
DEPRECATED
Optional[str]
event_type
¶The type of this event.
dagster.
DagsterEventType
[source]¶The types of events that may be yielded by solid and pipeline execution.
ASSET_STORE_OPERATION
= 'ASSET_STORE_OPERATION'¶ENGINE_EVENT
= 'ENGINE_EVENT'¶HANDLED_OUTPUT
= 'HANDLED_OUTPUT'¶HOOK_COMPLETED
= 'HOOK_COMPLETED'¶HOOK_ERRORED
= 'HOOK_ERRORED'¶HOOK_SKIPPED
= 'HOOK_SKIPPED'¶LOADED_INPUT
= 'LOADED_INPUT'¶OBJECT_STORE_OPERATION
= 'OBJECT_STORE_OPERATION'¶PIPELINE_CANCELED
= 'PIPELINE_CANCELED'¶PIPELINE_CANCELING
= 'PIPELINE_CANCELING'¶PIPELINE_DEQUEUED
= 'PIPELINE_DEQUEUED'¶PIPELINE_ENQUEUED
= 'PIPELINE_ENQUEUED'¶PIPELINE_FAILURE
= 'PIPELINE_FAILURE'¶PIPELINE_INIT_FAILURE
= 'PIPELINE_INIT_FAILURE'¶PIPELINE_START
= 'PIPELINE_START'¶PIPELINE_STARTING
= 'PIPELINE_STARTING'¶PIPELINE_SUCCESS
= 'PIPELINE_SUCCESS'¶STEP_EXPECTATION_RESULT
= 'STEP_EXPECTATION_RESULT'¶STEP_FAILURE
= 'STEP_FAILURE'¶STEP_INPUT
= 'STEP_INPUT'¶STEP_MATERIALIZATION
= 'STEP_MATERIALIZATION'¶STEP_OUTPUT
= 'STEP_OUTPUT'¶STEP_RESTARTED
= 'STEP_RESTARTED'¶STEP_SKIPPED
= 'STEP_SKIPPED'¶STEP_START
= 'STEP_START'¶STEP_SUCCESS
= 'STEP_SUCCESS'¶STEP_UP_FOR_RETRY
= 'STEP_UP_FOR_RETRY'¶The
run_config
used byexecute_pipeline()
andexecute_pipeline_iterator()
has the following schema:{ # configuration for execution, required if executors require config execution: { # the name of one, and only one available executor, typically 'in_process' or 'multiprocess' __executor_name__: { # executor-specific config, if required or permitted config: { ... } } }, # configuration for loggers, required if loggers require config loggers: { # the name of an available logger __logger_name__: { # logger-specific config, if required or permitted config: { ... } }, ... }, # configuration for resources, required if resources require config resources: { # the name of a resource __resource_name__: { # resource-specific config, if required or permitted config: { ... } }, ... }, # configuration for solids, required if solids require config solids: { # these keys align with the names of the solids, or their alias in this pipeline __solid_name__: { # pass any data that was defined via config_field config: ..., # configurably specify input values, keyed by input name inputs: { __input_name__: { # if an dagster_type_loader is specified, that schema must be satisfied here; # scalar, built-in types will generally allow their values to be specified directly: value: ... } }, # configurably materialize output values outputs: { __output_name__: { # if an dagster_type_materializer is specified, that schema must be satisfied # here; pickleable types will generally allow output as follows: pickle: { path: String } } } } }, # optionally use an available system storage for intermediates etc. intermediate_storage: { # the name of one, and only one available system storage, typically 'filesystem' or # 'in_memory' __storage_name__: { config: { ... } } } }
dagster.
io_manager_from_intermediate_storage
(intermediate_storage_def)[source]¶Define an IOManagerDefinition
from an existing IntermediateStorageDefinition
.
This method is used to adapt an existing user-defined intermediate storage to a IO manager resource, for example:
my_io_manager_def = io_manager_from_intermediate_storage(my_intermediate_storage_def)
@pipeline(mode_defs=[ModeDefinition(resource_defs={"io_manager": my_io_manager_def})])
def my_pipeline():
...
intermediate_storage_def (IntermediateStorageDefinition) – The intermediate storage definition to be converted to an IO manager definition.
IOManagerDefinition
dagster.
mem_intermediate_storage
IntermediateStorageDefinition[source]¶The default in-memory intermediate storage.
In-memory intermediate storage is the default on any pipeline run that does not configure any custom intermediate storage.
Keep in mind when using this storage that intermediates will not be persisted after the pipeline
run ends. Use a persistent intermediate storage like fs_intermediate_storage()
to
persist intermediates and take advantage of advanced features like pipeline re-execution.
dagster.
fs_intermediate_storage
IntermediateStorageDefinition[source]¶The default filesystem intermediate storage.
Filesystem system storage is available by default on any ModeDefinition
that does
not provide custom system storages. To select it, include a fragment such as the following in
config:
intermediate_storage:
filesystem:
base_dir: '/path/to/dir/'
You may omit the base_dir
config value, in which case the filesystem storage will use
the DagsterInstance
-provided default.
dagster.
default_intermediate_storage_defs
List[IntermediateStorageDefinition]¶Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.
The default intermediate storages available on any ModeDefinition
that does not provide
custom intermediate storages. These are currently [mem_intermediate_storage
,
fs_intermediate_storage
].
dagster.
in_process_executor
ExecutorDefinition[source]¶The default in-process executor.
In most Dagster environments, this will be the default executor. It is available by default on
any ModeDefinition
that does not provide custom executors. To select it explicitly,
include the following top-level fragment in config:
execution:
in_process:
Execution priority can be configured using the dagster/priority
tag via solid metadata,
where the higher the number the higher the priority. 0 is the default and both positive
and negative numbers can be used.
dagster.
multiprocess_executor
ExecutorDefinition[source]¶The default multiprocess executor.
This simple multiprocess executor is available by default on any ModeDefinition
that does not provide custom executors. To select the multiprocess executor, include a fragment
such as the following in your config:
execution:
multiprocess:
config:
max_concurrent: 4
The max_concurrent
arg is optional and tells the execution engine how many processes may run
concurrently. By default, or if you set max_concurrent
to be 0, this is the return value of
multiprocessing.cpu_count()
.
Execution priority can be configured using the dagster/priority
tag via solid metadata,
where the higher the number the higher the priority. 0 is the default and both positive
and negative numbers can be used.
dagster.
default_executors
List[ExecutorDefinition]¶Built-in mutable sequence.
If no argument is given, the constructor creates a new empty list.
The argument must be an iterable if specified.
The default executors available on any ModeDefinition
that does not provide custom
executors. These are currently [in_process_executor
,
multiprocess_executor
].
dagster.
SystemComputeExecutionContext
(execution_context_data: dagster.core.execution.context.system.SystemExecutionContextData, log_manager: dagster.core.log_manager.DagsterLogManager, step: dagster.core.execution.plan.step.ExecutionStep, output_capture: Optional[Dict[dagster.core.execution.plan.outputs.StepOutputHandle, Any]] = None)[source]¶dagster.
TypeCheckContext
(execution_context_data: dagster.core.execution.context.system.SystemExecutionContextData, log_manager: dagster.core.log_manager.DagsterLogManager, dagster_type: dagster.core.types.dagster_type.DagsterType)[source]¶The context
object available to a type check function on a DagsterType.
log
¶Centralized log dispatch from user code.
resources
¶An object whose attributes contain the resources available to this solid.
Any
dagster.
HookContext
(execution_context_data: dagster.core.execution.context.system.SystemExecutionContextData, log_manager: dagster.core.log_manager.DagsterLogManager, hook_def: dagster.core.definitions.hook.HookDefinition, step: dagster.core.execution.plan.step.ExecutionStep)[source]¶The context
object available to a hook function on an DagsterEvent.
log
¶Centralized log dispatch from user code.
hook_def
¶The hook that the context object belongs to.
HookDefinition
step
¶The compute step associated with the hook.
ExecutionStep
solid
¶The solid instance associated with the hook.
Solid
resources
¶Resources available in the hook context.
NamedTuple
solid_config
¶The parsed config specific to this solid.
Any