# lightflow

**Repository Path**: material-virtual-design/lightflow

## Basic Information

- **Project Name**: lightflow
- **Description**: Lightflow is a powerful, lightweight, and distributed workflow orchestration system written in Python.
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-11-27
- **Last Updated**: 2025-11-27

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Lightflow 1.0 - Production Ready Workflow Engine
================================================

.. image:: https://badge.fury.io/py/Lightflow.svg
   :target: https://badge.fury.io/py/Lightflow

.. image:: https://img.shields.io/badge/python-3.12-blue.svg
   :target: https://www.python.org/downloads/

.. image:: https://img.shields.io/badge/license-BSD-blue.svg
   :target: https://gitee.com/haidi-hfut/lightflow/blob/master/LICENSE

Overview
--------

Lightflow is a powerful, lightweight, and distributed workflow orchestration system written in Python 3.8+.
It allows you to:

- **Define workflows** using Directed Acyclic Graphs (DAGs) with clear task dependencies
- **Execute tasks** in parallel or in sequence, with data flowing between them
- **Distribute work** across multiple machines using Celery and Redis
- **Persist state** using MongoDB, SQLite (MontyStore), or other Maggma backends
- **Monitor execution** with comprehensive logging and debugging tools
- **Scale horizontally** by adding more workers

Use cases:

- Data processing pipelines
- ETL workflows
- Machine learning pipelines
- Scientific computing workflows
- Batch job orchestration
- Task scheduling and automation

Key Features
------------

✅ **Modular Architecture**

- Clean separation of concerns
- Easy to extend with custom task types
- Pluggable storage backends
- Interface-based design

✅ **Production Ready**

- Comprehensive error handling
- Professional logging with multiple levels
- Full test coverage (91+ tests)
- Stable API

✅ **Distributed Execution**

- Celery-based task distribution
- Redis broker for inter-process communication
- Multi-worker support
- Fault tolerance and recovery

✅ **Data Flow**

- Automatic data passing between tasks
- Support for complex data structures
- Persistent data storage
- Task context and metadata

✅ **Easy to Use**

- Simple Python API
- Intuitive CLI
- Clear documentation
- 15+ working examples

Requirements
------------

Python
^^^^^^

Lightflow requires **Python 3.8 or higher** and is tested on Python 3.8 - 3.12.

Operating System
^^^^^^^^^^^^^^^^

Developed and tested on Linux (Debian/Ubuntu and RedHat). It should also work on macOS and Windows with minor adjustments.

External Services
^^^^^^^^^^^^^^^^^

**Redis** (Required)

Used as the message broker for the Celery task queue and for signal communication. Install via ``apt-get install redis-server`` or ``brew install redis``.

**MongoDB** (Optional)

For production data storage. By default, Lightflow uses MontyStore (SQLite), which requires no external database.
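Since Redis is a hard requirement, it can be useful to verify connectivity before starting a worker. The helper below is a hypothetical convenience function, not part of Lightflow's API; it only checks that the Redis TCP port accepts connections, using nothing but the standard library:

```python
import socket

def redis_reachable(host="localhost", port=6379, timeout=1.0):
    """Return True if a TCP connection to the Redis port succeeds.

    A successful connect does not prove Redis is healthy, but a
    failure means workers will not be able to reach the broker.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print("Redis reachable:", redis_reachable())
```

For a fuller check (``PING``/``PONG``), use ``redis-cli ping`` as shown in the Troubleshooting section below.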
Installation
------------

From PyPI (Recommended)
^^^^^^^^^^^^^^^^^^^^^^^

::

    pip install lightflow

From Source
^^^^^^^^^^^

::

    git clone https://gitee.com/haidi-hfut/lightflow.git
    cd Lightflow
    pip install -e .

Development Installation
^^^^^^^^^^^^^^^^^^^^^^^^

::

    pip install -e ".[dev]"
    pytest tests/

Quick Start
-----------

1. **Start Redis** (required)

   ::

       redis-server --daemonize yes

2. **Start a Celery Worker**

   ::

       lightflow worker start

3. **Run a Workflow** (in another terminal)

   ::

       lightflow workflow start data_pipeline

4. **Watch the Output**

   The worker terminal shows each task executing with timing information.

Example Workflow
----------------

Here's a simple two-task workflow::

    from lightflow.core import Dag, Action
    from lightflow.tasks import PythonTask

    def task_one(data, store, signal, context):
        """First task - store data."""
        data['value'] = 42
        return Action(data)

    def task_two(data, store, signal, context):
        """Second task - use data from first task."""
        print(f"Value from previous task: {data['value']}")
        return Action(data)

    # Create DAG
    dag = Dag('example_dag')

    # Create tasks
    t1 = PythonTask(name='task_one', callback=task_one)
    t2 = PythonTask(name='task_two', callback=task_two)

    # Define execution order
    dag.define({t1: t2})

Save as ``example.py`` and run::

    lightflow workflow start example

Command-Line Interface
----------------------

**Workflow Management**

::

    lightflow workflow start      # Start a workflow
    lightflow workflow stop      # Stop a running workflow
    lightflow workflow list      # List available workflows

**Worker Management**

::

    lightflow worker start      # Start a worker
    lightflow worker stop      # Stop all workers
    lightflow worker list      # List active workers

**Configuration**

::

    lightflow config check-config      # Verify configuration
    lightflow version      # Show version

Configuration
-------------

Lightflow looks for configuration in the following locations (in order):

1. ``~/.lightflow.yaml`` (user home directory)
2. ``./lightflow.yaml`` (current directory)
3. Built-in defaults (MontyStore + Redis)

Example Configuration
^^^^^^^^^^^^^^^^^^^^^

::

    # ~/.lightflow.yaml
    logging:
      version: 1
      disable_existing_loggers: true
      formatters:
        simple:
          (): 'colorlog.ColoredFormatter'
          format: '[%(asctime)s][%(levelname)s] %(message)s'
      handlers:
        console:
          class: logging.StreamHandler
          level: INFO
          formatter: simple
      root:
        handlers: [console]
        level: INFO

    data_store:
      store_type: monty          # or 'mongo' for MongoDB
      database_path: /tmp/lightflow_db

    signal:
      host: localhost
      port: 6379
      db: 0

    celery:
      broker_url: redis://localhost:6379/0
      result_backend: redis://localhost:6379/0

Architecture
------------

The framework is organized into several key modules:

**Core** (``lightflow.core``)

- Workflow and DAG orchestration
- Task execution engine
- Data flow management
- Exception hierarchy

**Tasks** (``lightflow.tasks``)

- Base task class
- Python task implementation
- Bash task implementation
- Task context and signals

**Infrastructure** (``lightflow.infrastructure``)

- Celery queue integration
- Maggma-based data storage (MongoDB, SQLite, etc.)
- Redis signal system
- Worker management

**Configuration** (``lightflow.config``)

- YAML-based configuration management
- Auto-discovery of config files
- Configuration validation

**CLI** (``lightflow.scripts``)

- Command-line interface
- Workflow management
- Worker lifecycle
- Configuration utilities

Testing
-------

Run the test suite::

    pytest tests/ -v

Run only unit tests::

    pytest tests/unit/ -v

Run integration tests (requires a running worker)::

    lightflow worker start          # In one terminal
    pytest tests/integration/ -v    # In another terminal

The suite contains 91+ tests with 100% coverage of core functionality.
Data Persistence
----------------

Lightflow supports multiple data store backends via Maggma:

- **MontyStore** (Default) - SQLite-based, no external dependencies
- **MongoDB** - Full-featured, production-grade
- **Memory** - For testing
- **JSON** - File-based storage
- **GridFS** - MongoDB GridFS support

Switch backends by updating your configuration file or environment.

Examples
--------

Lightflow ships with 15+ example workflows demonstrating:

- Simple sequential tasks
- Branching and conditional execution
- Parallel task execution
- Data passing between tasks
- Error handling and recovery
- Task parameters
- And more...

They are located in the ``examples/`` directory.

Troubleshooting
---------------

**Tasks not executing?**

1. Ensure Redis is running: ``redis-cli ping`` should return PONG
2. Start a worker: ``lightflow worker start``
3. Check the logs for errors (see the Configuration section)

**Data not persisting?**

1. Check the data store configuration
2. Ensure the storage backend is accessible
3. Review the logs in detailed mode (set the log level to DEBUG)

**Print statements not visible?**

Use logging instead of ``print`` in task callbacks::

    from lightflow.utils import get_logger

    logger = get_logger(__name__)

    def my_task(data, store, signal, context):
        logger.info("This will be visible in worker logs")

Documentation
-------------

- **QUICK_START_EXAMPLE.md** - Step-by-step getting started guide (in the ``develop/`` folder)
- **DEBUG_TASK_EXECUTION.md** - Comprehensive debugging guide (in the ``develop/`` folder)
- **RELEASE_NOTES_1.0.0.md** - Complete feature list (in the ``develop/`` folder)
- API documentation in docstrings

Performance
-----------

Typical performance metrics:

- Task dispatch: <100 ms
- Task overhead: ~50 ms
- Data serialization: varies by size
- Scales to 1000+ tasks per workflow
- Horizontal scaling via additional workers

License
-------

Lightflow is licensed under the BSD-3-Clause License. See the LICENSE file for details.

Contributing
------------

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request

See CONTRIBUTING.md for detailed guidelines.

Support
-------

For issues, questions, or suggestions:

- Open an issue on GitHub
- Check the existing documentation
- Review the example workflows
- Check the Troubleshooting section above

Citation
--------

If you use Lightflow in your research, please cite::

    @software{lightflow2025,
      title={Lightflow: A Lightweight Distributed Workflow Engine},
      author={Material Virtual Design Group of HFUT},
      year={2025},
      url={https://gitee.com/haidi-hfut/lightflow.git}
    }

Changelog
---------

**1.0.0** (November 2025)

- Complete rewrite with modular architecture
- Maggma-based data persistence (MongoDB, SQLite, etc.)
- Professional CLI with subcommands
- Comprehensive logging and debugging
- 91+ unit and integration tests
- Production-ready status

Authors
-------

**Current Development**: haidi@hfut.edu.cn

**Version**: 1.0.0

**Last Updated**: November 2025
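
Appendix: Data Flow Sketch
--------------------------

To make the data-passing model from the Example Workflow section concrete without requiring a running worker, here is a standalone sketch. It is an illustration of the idea only, not Lightflow's implementation: each callback receives a shared ``data`` dict (the ``store``, ``signal``, and ``context`` arguments are stubbed as ``None``) and returns it for the next task in the chain.

```python
# Standalone illustration of sequential data passing between task
# callbacks; mirrors the (data, store, signal, context) signature.
def task_one(data, store, signal, context):
    data['value'] = 42          # first task stores a value
    return data

def task_two(data, store, signal, context):
    # second task reads the value produced by the first
    print(f"Value from previous task: {data['value']}")
    return data

def run_chain(callbacks):
    """Run callbacks in order, threading one data dict through all of them."""
    data = {}
    for cb in callbacks:
        data = cb(data, None, None, None)
    return data

result = run_chain([task_one, task_two])
```

In real Lightflow workflows the engine, not the user, threads ``data`` through the DAG, and branching DAGs route copies of the data along each edge rather than a single linear chain.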