Workflow System

Introduction:

The Institute for Genomic Research (TIGR) has many process pipelines that need to be created, executed, and monitored on an ongoing basis. Each pipeline may include multiple discrete processes that can be executed either sequentially or in parallel. To reduce manual intervention and streamline the process flow, TIGR's Annotation software team has designed a system called Workflow that can be used to build, run, and monitor such process pipelines, or workflows.

Specifications:

The goal of the Workflow system is to fully automate the process of executing a pipeline. The fully functional version of the Workflow system is expected to provide the ability to do the following:

  1. Managing workflows using a GUI: The Workflow system should provide a GUI utility to manage workflows. This tool should be able to:
    1. Build workflow templates.
    2. Search / find templates.
    3. Build workflow instances from templates.
    4. View the status of a workflow.
    5. Run / pause / resume / stop a workflow.
  2. Building workflow instances from configuration files: The Workflow system should be able to build workflow instances from a configuration file and a template workflow. The configuration file may contain the information needed to execute a process.
  3. Handling a variety of jobs
    1. A pipeline may contain a combination of parallel and sequential processes; hence the Workflow system should be able to handle both parallel and sequential jobs.
    2. The Workflow system should be able to process jobs in a distributed environment.
    3. The Workflow system should be able to handle jobs that involve executing system commands.
  4. Resetting jobs and their status
    1. The Workflow system should be able to reset a particular job in a workflow and reprocess it.
    2. The Workflow system should be able to reset an entire workflow, or a subset of a workflow, and start it over.
  5. Branching and conditional flows
  6. Synchronizing multiple workflow instances: The Workflow system should be able to synchronize multiple workflow instances.
  7. Providing the status of a workflow instance: The Workflow system should be able to provide the status of a workflow through a
    1. Command line interface.
    2. GUI.
  8. Persisting workflow templates and instances: The Workflow system should be able to persist workflow templates and instances to
    1. A file.
    2. A database.
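
As an illustration of item 2 above (building a workflow instance from a template and a configuration file), here is a minimal Python sketch. It is not the Workflow system's actual file format or API: the `$placeholder` template syntax, the `[workflow]` section, the key names, and the commands `prepare_input` and `run_search` are all hypothetical and chosen only to show the idea.

```python
import configparser
from string import Template

# Hypothetical template: one command line per step, with $placeholders.
TEMPLATE_TEXT = """\
prepare_input --in $input_file
run_search --in $input_file --out $output_dir/hits.out
"""

# Hypothetical configuration file supplying a value for each placeholder.
CONFIG_TEXT = """\
[workflow]
input_file = /tmp/genome.fasta
output_dir = /tmp/run1
"""


def build_instance(template_text, config_text, section="workflow"):
    """Return the template's command lines with placeholders filled in."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    values = dict(parser[section])
    # substitute() raises KeyError if the config lacks a required value,
    # so an incomplete configuration fails early instead of at run time.
    return [
        Template(line).substitute(values)
        for line in template_text.splitlines()
        if line.strip()
    ]


instance = build_instance(TEMPLATE_TEXT, CONFIG_TEXT)
for command_line in instance:
    print(command_line)
```

The same template can be instantiated many times with different configuration files, which is the separation between templates and instances that the specification describes.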

Iteration Plan:

This section describes the Workflow system development plan, including the release schedule and project milestones.

Iteration    Expected Release  Released        Version  Comment
Iteration 1  January 2003                      1.0      Initial beta release to support serial and parallel processors
Iteration 2  March 2003                        2.0      Support GUI for browsing and creating workflow
Iteration 3  May 2006          September 2006  2.2      Compression of file, template editor
Iteration 4  August 2006       January 2007    3.0      Remote dispatcher, workflow checker, control workflow, local grid submissions

Iteration 1: This iteration is intended to produce a working prototype with basic functionality. The features to be implemented in this iteration are 2, 3.1, 3.2, 4, 7.2, and 8.1 from the specifications above.

Iteration 2: GUI for browsing workflows (features 1.2, 1.4, and 1.5).

Iteration 3: GUI for creating and editing templates (feature 1.1).

Iteration 4: Reset and restart commands (features 4.1 and 4.2) and status on the command line (feature 7.1).

Implementation:

The execution of a pipeline is driven by a CommandSet. Each CommandSet can contain multiple Commands or nested CommandSets. For a detailed CommandSet definition, refer to the CommandSet Schema or DTD.
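
The nesting described above can be sketched as a small composite structure in Python. This is only an illustration under stated assumptions, not the system's implementation: the class fields, the `mode` values `"serial"` and `"parallel"`, the state names, and the use of local threads in place of distributed execution are all hypothetical. The authoritative definition remains the CommandSet Schema or DTD.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Command:
    """A single unit of work (hypothetical fields and state names)."""
    name: str
    action: Callable[[], None]
    state: str = "incomplete"

    def run(self) -> None:
        self.state = "running"
        try:
            self.action()
            self.state = "complete"
        except Exception:
            self.state = "failed"

    def reset(self) -> None:
        self.state = "incomplete"


@dataclass
class CommandSet:
    """A container of Commands and nested CommandSets."""
    name: str
    mode: str = "serial"  # assumed values: "serial" or "parallel"
    children: List[Union[Command, "CommandSet"]] = field(default_factory=list)

    def run(self) -> None:
        if self.mode == "parallel":
            # Local threads stand in for parallel or distributed execution.
            with ThreadPoolExecutor() as pool:
                list(pool.map(lambda child: child.run(), self.children))
        else:
            for child in self.children:
                child.run()
                if child.state != "complete":
                    break  # a serial set stops at the first failure

    @property
    def state(self) -> str:
        # The set's state is derived from the states of its children.
        states = {child.state for child in self.children}
        if "failed" in states:
            return "failed"
        if states <= {"complete"}:
            return "complete"
        return "incomplete"

    def reset(self) -> None:
        # Resetting a set resets everything nested inside it.
        for child in self.children:
            child.reset()


log = []
pipeline = CommandSet("pipeline", "serial", [
    Command("prepare", lambda: log.append("prepare")),
    CommandSet("search", "parallel", [
        Command("search_a", lambda: log.append("search_a")),
        Command("search_b", lambda: log.append("search_b")),
    ]),
    Command("report", lambda: log.append("report")),
])
pipeline.run()
print(pipeline.state, log)
```

Because a CommandSet exposes the same `run`, `reset`, and `state` operations as a Command, a parent does not need to know whether a child is a single command or a whole sub-pipeline. Resetting either one job or a subset of the workflow then amounts to calling `reset` on the corresponding node.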

Team:

Anup Mahurkar, Dan Sommer and Sreenath Nampally.

Last Updated: January 16th, 2007