# dpdk_test

**Repository Path**: wudi202/dpdk_test

## Basic Information

- **Project Name**: dpdk_test
- **Description**: A simple DPDK-based packet receiving and processing program that can output logs to ClickHouse (ck)
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-02-10
- **Last Updated**: 2026-02-24

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DPDK Packet Processing System

A high-performance network traffic analysis system based on DPDK (Data Plane Development Kit), designed for efficient protocol identification, flow tracking, and structured logging to ClickHouse.

## Features

- **High Performance Architecture**:
  - Built on DPDK for line-rate packet processing.
  - Scalable multi-core design using RSS (Receive Side Scaling).
  - **Lockless Design**: Per-thread Flow Tables and Aging Lists to minimize contention.
  - **Optimized Memory**: Dedicated `rte_mempool` for log messages to avoid expensive allocations in the fast path.
- **Protocol Detection Engine**:
  - **HTTP**: Identifies GET requests; extracts Host, URL, User-Agent, and custom headers.
  - **TLS**: Strict ClientHello validation; extracts SNI, JA3 (MD5), and JA4 (SHA256) fingerprints.
  - **STUN**: Detects STUN traffic via the standard Magic Cookie.
  - **Port-based**: Fast lookup for well-known ports (thread-local tables).
  - **Modular**: Registration-based framework (`proto_register`) makes adding new protocols easy.
- **Traffic & Flow Analysis**:
  - **Bi-directional Flow Tracking**: Tracks 5-tuple flows (Src/Dst IP, Src/Dst Port, Proto).
  - **Smart Aging**: Three aging tiers (TCP active, UDP, TCP FIN/RST) to manage state efficiently.
  - **IP Watchlist**: High-performance CIDR matching using DPDK LPM (Longest Prefix Match).
- **Logging & Analytics**:
  - **ClickHouse Backend**: High-throughput insertion for large-scale logging.
  - **Batch Processing**: Logs are batched by type (HTTP/TLS/Flow) for efficient database writes.
  - **Rich Metadata**: Logs include timestamps, IPs/ports, and protocol-specific fields (headers, fingerprints).

## System Architecture

### 1. Data Flow Pipeline

```mermaid
graph TB
    subgraph NIC["Network Interface (DPDK Port)"]
        Q0[RX Queue 0]
        Q1[RX Queue 1]
    end
    subgraph Workers["Worker Threads (lcore pinned)"]
        W0["Worker 0"]
        W1["Worker 1"]
    end
    Q0 --> W0
    Q1 --> W1
    subgraph Processing["Packet Processing Pipeline"]
        PP[Packet Parse<br/>Eth/IP/TCP/UDP]
        FK[Flow Key<br/>5-Tuple + Direction]
        FT[Flow Lookup<br/>rte_hash table]
        FA[Flow Aging<br/>Active/UDP/FIN Chains]
        PD[Proto Detect<br/>Dispatch Engine]
    end
    W0 --> PP --> FK --> FT --> FA --> PD
    W1 --> PP
    subgraph Detectors["Protocol Detectors"]
        HTTP[HTTP<br/>Method/Header Analysis]
        TLS[TLS<br/>ClientHello Parsing]
        STUN[STUN<br/>Magic Cookie]
        PORT[Port<br/>Well-known Lookup]
    end
    PD --> HTTP
    PD --> TLS
    PD --> STUN
    PD --> PORT
    subgraph Logging["Logging System"]
        Ring["DPDK Ring<br/>(MP/SC mode)"]
        LT["Log Thread<br/>(Dedicated lcore)"]
        CK[(ClickHouse)]
    end
    HTTP -->|Log Msg| Ring
    TLS -->|Log Msg| Ring
    FA -->|Log Msg| Ring
    Ring --> LT -->|Batch Insert| CK
```

### 2. Core Components

#### Threading Model

- **Worker Threads**:
  - Bound to specific CPU cores (`lcore`).
  - Each processes a dedicated RX queue (run-to-completion model).
  - Use **thread-local storage** for Flow Tables and Aging Lists to avoid lock contention.
- **Log Thread**:
  - Runs on a dedicated core.
  - Consumes messages from a shared `rte_ring`.
  - Batches writes to ClickHouse to maximize throughput.

#### Memory Management

- **Mempools**: Extensive use of `rte_mempool` to avoid memory fragmentation and per-allocation overhead.
  - **Packet Pool**: Standard DPDK mbuf pool.
  - **Flow Pool**: Fixed-size pool of `flow_entry_t` structures, one per thread.
  - **Log Message Pool**: Dedicated pool for passing analytics data from the Worker threads to the Log thread.

#### Flow Table Design

- **Key**: Standard 5-tuple (Src/Dst IP, Src/Dst Port, Proto).
- **Structure**: `rte_hash` (Cuckoo hash) for fast lookups.
- **Aging**:
  - Three linked lists per thread: `TCP_Active`, `UDP`, `TCP_FIN`.
  - **LRU-like behavior**: Each packet moves its flow to the tail of the list; the aging scanner checks from the head.
  - **Efficient**: Scanning stops at the first non-expired flow, since each list is ordered by last-seen time.

#### Protocol Detection

- **Mechanism**: Pluggable framework.
- **Registration**: Detectors register via `proto_register()` at startup.
- **Priority**:
  1. **Strict Detectors** (e.g., HTTP method match, TLS ClientHello).
  2. **Port-based** (fallback if payload analysis fails).
- **Extensibility**:
  - Add `src/proto/proto_new.c`.
  - Implement `detect_fn` and `process_fn`.
  - Register in `proto_detect_init()`.
## Directory Structure

```
dpdk_test/
├── src/
│   ├── config/       # JSON configuration loading
│   ├── flow/         # Flow table and aging management
│   ├── iplist/       # IP watchlist (LPM)
│   ├── log/          # Logging thread and ClickHouse writer
│   ├── packet/       # Packet parsing and processing loop
│   ├── proto/        # Protocol detection modules (HTTP, TLS, etc.)
│   ├── utils/        # Hashing (MD5, SHA256) and helpers
│   ├── dpdk_init.c   # EAL and resource initialization
│   └── main.c        # Entry point and thread launching
├── conf/             # Configuration files
├── sql/              # ClickHouse DDL scripts
└── Makefile          # Build script
```

## Requirements

- **OS**: Linux with a kernel that supports DPDK (e.g., hugepages enabled).
- **DPDK**: Version 21.11 LTS or compatible.
- **Libraries**:
  - `libcjson`
  - `clickhouse-cpp` (and its dependencies: `lz4`, `zstd`, `cityhash`, `openssl`)

## Build Instructions

1. **Setup Environment**: Ensure `PKG_CONFIG_PATH` includes DPDK and clickhouse-cpp.
2. **Compile**:
   ```bash
   make
   # or for verbose output
   make V=1
   ```
3. **Database Setup**: Apply the schema to ClickHouse.
   ```bash
   clickhouse-client --multiline < sql/create_tables.sql
   ```

## Configuration

Edit `conf/config.json` to match your environment. Key settings:

- **dpdk**: `mbuf_pool_size`, `rx_desc` size.
- **ports**: Map physical DPDK ports to RX queues.
- **workers**: Bind logical cores (`lcore_id`) to specific Port/Queue pairs.
- **clickhouse**: Database host, port, credentials, and table names.
- **modules**: Toggle protocol parsers (`http`, `tls`, `stun`, etc.).

## Usage

Run the application using the generated binary. You must provide both EAL arguments (for DPDK) and application arguments, separated by `--`.

```bash
# Syntax
sudo ./build/dpdk_pkt_proc [EAL Options] -- -c <config_file>

# Example
# -l 1-4 : Use cores 1, 2, 3, 4 (e.g., 3 workers + 1 log thread)
# -n 4   : 4 memory channels
# --     : Separator
# -c ... : Config file path
sudo ./build/dpdk_pkt_proc -l 1-4 -n 4 -- -c conf/config.json
```

## Architecture Notes

- **Threading**:
  - **Worker Threads**: Pinned to cores. Each polls a specific RX queue, processes packets (Parse -> Flow -> Detect), and enqueues logs to a ring.
  - **Log Thread**: Pinned to a separate core. Dequeues batches from the ring and writes to ClickHouse.
- **Memory Management**: Uses `rte_mempool` for all high-frequency allocations (Packets, Flow Entries, Log Messages) to keep latency stable and prevent fragmentation.
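
To illustrate why a fixed-size pool gives stable allocation cost, here is a minimal single-threaded free-list pool in plain C. It is not `rte_mempool`, which also provides per-core caches and multi-producer/multi-consumer safety. The type and function names (`demo_entry_t`, `demo_pool_t`, `pool_init`, `pool_get`, `pool_put`) and the entry fields are hypothetical and are not taken from the project.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_CAPACITY 4             /* tiny on purpose, for illustration */

/* Hypothetical entry layout; the project's flow_entry_t is not shown here. */
typedef struct demo_entry {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
    struct demo_entry *next_free;   /* link used only while the slot is free */
} demo_entry_t;

typedef struct {
    demo_entry_t  slots[POOL_CAPACITY]; /* all memory reserved up front */
    demo_entry_t *free_head;            /* singly linked list of free slots */
    size_t        in_use;
} demo_pool_t;

/* Thread every slot onto the free list; slot 0 ends up at the head. */
void pool_init(demo_pool_t *p)
{
    p->free_head = NULL;
    p->in_use = 0;
    for (size_t i = POOL_CAPACITY; i-- > 0; ) {
        p->slots[i].next_free = p->free_head;
        p->free_head = &p->slots[i];
    }
}

/* O(1) allocation: pop the head of the free list, or NULL when exhausted. */
demo_entry_t *pool_get(demo_pool_t *p)
{
    demo_entry_t *e = p->free_head;
    if (e == NULL)
        return NULL;
    p->free_head = e->next_free;
    e->next_free = NULL;
    p->in_use++;
    return e;
}

/* O(1) release: push the slot back onto the free list. */
void pool_put(demo_pool_t *p, demo_entry_t *e)
{
    e->next_free = p->free_head;
    p->free_head = e;
    p->in_use--;
}
```

Because every slot has the same size and is reserved at startup, get and put are constant-time pointer swaps and the pool cannot fragment. When the pool is exhausted, `pool_get()` returns `NULL` instead of growing, so the caller must handle that case explicitly, for example by dropping the new flow or log message.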