A Basepair workflow is a collection of modules that run together. A workflow may have a single module (e.g., one that counts the number of lines in a FASTQ file) or it may have 20+ modules doing QC, alignment, figures, etc. The workflow is defined as a directed acyclic graph (DAG), where each module is a node in the graph. The graph structure allows a module to have multiple parent modules, which is useful when a module needs the output of two (or more) other modules.
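To make the DAG idea concrete, here is a minimal Python sketch (not Basepair's implementation) that derives a valid run order using Kahn's topological sort. The workflow is assumed to be a plain dict mapping each node id to the node ids of its parents, and the node ids themselves are hypothetical. A module with two parents, like `report` below, only becomes ready once both parents have finished; if the graph contains a cycle, no valid order exists and the function raises `ValueError`.

```python
from collections import deque


def execution_order(parents: dict[str, list[str]]) -> list[str]:
    """Return an order in which every module runs after all of its parents.

    `parents` maps each node id to the node ids whose output it needs; every
    parent must also appear as a key. Raises ValueError on a cycle (not a DAG).
    """
    # Number of parents each node is still waiting on, and each node's children.
    waiting = {node: len(ps) for node, ps in parents.items()}
    children: dict[str, list[str]] = {node: [] for node in parents}
    for node, ps in parents.items():
        for parent in ps:
            children[parent].append(node)

    # Root modules (no parents) can start immediately; sorted for a stable order.
    ready = deque(sorted(node for node, count in waiting.items() if count == 0))
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(children[node]):
            waiting[child] -= 1
            if waiting[child] == 0:  # all parents have finished
                ready.append(child)

    if len(order) != len(parents):
        raise ValueError("workflow graph contains a cycle, so it is not a DAG")
    return order


# Hypothetical workflow: "report" has two parents and needs both outputs.
workflow = {
    "fastqc": [],
    "align": [],
    "count_reads": ["align"],
    "report": ["fastqc", "count_reads"],
}
print(execution_order(workflow))  # ['align', 'fastqc', 'count_reads', 'report']
```

Modules that share no dependency (here `fastqc` and `align`) could in principle run in parallel; the sort only guarantees that no module starts before its parents.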
The workflow YAML file defines three primary kinds of information:
- A collection of nodes: each module is assigned to a node with a unique node ID. If a module is required multiple times in a workflow, each instance is assigned a different node ID. Default parameters can also be set for each module.
- A collection of edges: the parent-child relationships between all the modules. Starting from a root module, every other module is connected so that all the modules of the workflow will run.
- Mappings: common global metadata is passed directly to the modules; e.g., the genome assembly information may be passed to all the modules that need to know the assembly.
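The three parts of a workflow definition can be sketched as an already-parsed Python dict. This is an illustration only: the key names (`nodes`, `edges`, `mappings`, `module`, `params`, `uses`) and the module names are hypothetical, not Basepair's actual YAML schema. The sketch shows one module reused under two node IDs, a check that every node is reachable from the root, and one way a global mapping such as the genome assembly could be merged into the parameters of the nodes that ask for it.

```python
# Hypothetical, already-parsed workflow definition. Key names are illustrative
# assumptions, not Basepair's actual YAML schema.
workflow = {
    "nodes": {
        "trim": {"module": "trimmer", "params": {"min_length": 20}},
        # The same module used twice gets two different node ids.
        "align_a": {"module": "aligner", "params": {"threads": 4}, "uses": ["genome"]},
        "align_b": {"module": "aligner", "params": {"threads": 8}, "uses": ["genome"]},
        "merge": {"module": "merger", "params": {}},
    },
    # Edges as (parent, child) pairs.
    "edges": [
        ("trim", "align_a"),
        ("trim", "align_b"),
        ("align_a", "merge"),
        ("align_b", "merge"),
    ],
    # Global metadata shared with the modules that need it.
    "mappings": {"genome": "hg38"},
}


def unreachable_nodes(wf: dict, root: str) -> set[str]:
    """Return node ids that cannot be reached from root (empty set: every module runs)."""
    children: dict[str, list[str]] = {node_id: [] for node_id in wf["nodes"]}
    for parent, child in wf["edges"]:
        children[parent].append(child)
    seen = {root}
    stack = [root]
    while stack:  # depth-first walk along parent -> child edges
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return set(wf["nodes"]) - seen


def resolve_params(wf: dict, node_id: str) -> dict:
    """Copy a node's default params and add the global mappings it lists in 'uses'."""
    node = wf["nodes"][node_id]
    params = dict(node["params"])
    for key in node.get("uses", []):
        params[key] = wf["mappings"][key]
    return params


print(unreachable_nodes(workflow, "trim"))  # set()
print(resolve_params(workflow, "align_b"))  # {'threads': 8, 'genome': 'hg38'}
```

Having each node declare which global keys it uses keeps modules that do not care about the assembly (here `trim` and `merge`) free of it; in this sketch a mapping overwrites a default parameter of the same name, which is a design choice rather than documented behavior.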