# llm-to-sql **Repository Path**: chen_fang_yi/llm-to-sql ## Basic Information - **Project Name**: llm-to-sql - **Description**: No description available - **Primary Language**: Unknown - **License**: Not specified - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-07-18 - **Last Updated**: 2025-07-18 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # LLM-to-SQL This project demonstrates how to use a Large Language Model (LLM) to convert natural language queries into SQL queries. The application uses the `pydantic-ai` library to create an agent that can interact with a SQLite database to retrieve schema information and generate SQL queries for an example employee database. ## Features - Converts complex natural language questions into SQL queries. - Uses `pydantic-ai` for agent-based LLM interaction. - Dynamically loads table schemas from a Python file (`schemas.py`). - Retrieves table names and schema information from a SQLite database at runtime. - Generates MSSQL-compatible queries with `WITH(NOLOCK)`. - Provides a clear and modular project structure for easy extension. ## How it works The application works in the following steps: 1. **Schema Definition**: The `schemas.py` file defines the table structures for an example employee database using `SQLTableRow` objects. This allows for easy addition or modification of table schemas. 2. **Data Ingestion**: The `data_ingestion.py` script reads all `SQLTableRow` instances from `schemas.py`, creates a SQLite database (`sql_data.db`), and populates it with the table schema information. 3. **Natural Language Input**: The user provides a natural language query in the `main.py` script. 4. **Agent Interaction**: The `pydantic-ai` agent, defined in `main.py`, receives the natural language query. 5. **Tool Usage**: The agent uses the provided tools (`get_tables` and `get_table_schema`) to query the SQLite database and retrieve the necessary schema information to understand the database structure. 6. **SQL Generation**: The agent sends the natural language query and the retrieved schema information to the LLM. 7. **SQL Output**: The LLM generates an SQL query based on the provided information, which is then printed to the console. ## Getting Started ### Prerequisites - Python 3.8+ - Gemini API key. ### Installation 1. Clone the repository: ```bash git clone https://github.com/honey-trivedi/llm-to-sql.git cd llm-to-sql ``` 2. Install the dependencies: ```bash uv sync ``` ### Usage 1. **Ingest the data**: Run the `data_ingestion.py` script to create the database and populate it with the employee schema information. ```bash uv run data_ingestion.py ``` 2. **Run the main application**: Execute the `main.py` script to start the LLM-to-SQL conversion process. The script contains an example prompt to query the employee database. ```bash uv run main.py ``` The script will then print the generated SQL query to the console. ## Project Structure ``` / ├── .env # Environment variables for configuration (Add GEMINI_API_KEY here) ├── config.py # Configuration for the database file and table name ├── data_ingestion.py # Script to create and populate the database with schemas ├── data_models.py # Pydantic model for the SQL table row ├── data_queries.py # Functions to query the database for schema information ├── main.py # Main application logic for the LLM agent ├── pyproject.toml ├── README.md ├── schemas.py # Defines the database table schemas └── sql_data.db # SQLite database file (Will be created by data_ingestion.py) ``` ## Configuration - **Database File**: The SQLite database file name can be configured in `config.py`. The default is `sql_data.db`. - **LLM Model**: The LLM model can be configured by setting the `PYDANTIC_AI_MODEL` environment variable. The default is `gemini-1.5-flash`. ## Dependencies - `pydantic` - `pydantic-ai` - `sqlite3` (standard library) - `uv` (for package management)