# pipascope

**Repository Path**: bernard5/pipascope

## Basic Information

- **Project Name**: pipascope
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-03
- **Last Updated**: 2025-09-08

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# PipaScope: A Dataset for CPU Microarchitecture Performance Characterization

> **PipaScope** – Observe the pulse of performance, one cycle at a time.
> An open dataset initiative for microarchitectural behavior analysis, led by **ZJU-SPAIL**.

[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Git LFS](https://img.shields.io/badge/Git%20LFS-enabled-ff69b4.svg)](https://git-lfs.com)

## 🍈 About the Name: PIPA & PipaScope

**PIPA (Progressive Intelligent Performance Analytics)** draws inspiration from the *loquat* (枇杷), a fruit native to Zhejiang, China. Its lifecycle—tree (collecting), flower (analysis), and fruit (conclusion)—mirrors the performance engineering pipeline.

**PipaScope** extends this metaphor as the **observational lens** into the microarchitectural world. Just as the loquat tree absorbs nutrients from the soil, PipaScope captures low-level performance telemetry from real workloads, enabling deep insight into CPU behavior. This dataset serves as the foundational **"soil"** for training automated performance diagnosis systems.

## 🏫 Project Ownership

PipaScope is currently led and maintained by the System Performance Analytics and Intelligence Lab (ZJU-SPAIL) at Zhejiang University. It is part of ongoing research into systematic performance characterization and bottleneck analysis.
## 🎯 Focus: Microarchitectural Behavior

PipaScope is designed to support research in **CPU microarchitecture performance characterization**, with a focus on:

- Instructions per Cycle (IPC) degradation
- Cache miss patterns (L1/L2/LLC)
- Memory bandwidth saturation
- Frontend/backend stalls
- Branch misprediction penalties
- TLB pressure

The goal is to build a high-quality, version-controlled dataset that enables reproducible analysis and lays the foundation for automated bottleneck identification.

## 🧩 Data Sources

The dataset includes performance profiles from:

- **SPEC CPU 2017** (both integer and floating-point benchmarks)
- **Real-world applications**, starting with **RocksDB**

Each workload is executed under diverse configurations (input sets, system settings, compiler flags) and on multiple hardware platforms (Intel/Arm) to capture a wide range of microarchitectural behaviors.

## 🛠️ Data Collection

All data is collected using standardized tools and methodologies:

- **perf** (Linux Performance Events) for hardware counter sampling
- Custom **run scripts** for SPEC CPU 2017 and real-world applications
- Metric derivation based on **PIPA-SHU** principles (multiplexing-aware counter aggregation)

All data is versioned with **Git LFS** to support large-file storage and traceability.

## 📌 Status

This project is in the **early development phase**. The dataset is actively being built by ZJU-SPAIL members. Public access is read-only; contributions are not currently accepted. Documentation and tooling will be expanded as the dataset matures.

---

> *“PipaScope: where data grows like fruit, and insight blossoms from observation.”*
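## 🧮 Appendix: Multiplexing-Aware Scaling (Sketch)

PIPA-SHU itself is not documented in this README, so the sketch below shows only the generic correction that Linux perf applies when hardware counters are multiplexed: a raw count is extrapolated by the ratio `time_enabled / time_running`, after which derived metrics such as IPC can be computed from the scaled values. All function names and the sample numbers are illustrative assumptions, not part of the PipaScope tooling.

```python
def scale_counter(raw_count: int, time_enabled: int, time_running: int) -> float:
    """Estimate the true event count when perf multiplexed the counter.

    perf reports how long an event was enabled versus how long it actually
    occupied a hardware counter; extrapolating by that ratio is the standard
    multiplexing correction.
    """
    if time_running == 0:
        raise ValueError("counter never ran; no estimate possible")
    return raw_count * (time_enabled / time_running)


def derive_ipc(instructions: float, cycles: float) -> float:
    """Instructions per Cycle from (already scaled) counter values."""
    return instructions / cycles


# Hypothetical sample: two counters that each ran half the measurement window.
instructions = scale_counter(raw_count=1_500_000,
                             time_enabled=1_000_000, time_running=500_000)
cycles = scale_counter(raw_count=2_000_000,
                       time_enabled=1_000_000, time_running=500_000)

print(f"IPC = {derive_ipc(instructions, cycles):.2f}")  # prints "IPC = 0.75"
```

Note that scaling both counters by the same ratio leaves their quotient unchanged; the correction matters most when events are scheduled onto counters for different fractions of the run.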