# parquet-cli

- **Repository Path**: apathy0505/parquet-cli
- **License**: BSD-3-Clause
- **Default Branch**: master

A command line (CLI) tool to inspect Apache Parquet files on the go.

Apache Parquet is a columnar storage format commonly used in the Hadoop ecosystem. `parq` is a small, easy-to-install Python utility for viewing Parquet files and getting basic information from them. The current feature set covers what I need; please use GitHub issues for any requests or suggestions.

## Install

`pip install parquet-cli`

An executable script called `parq` will be installed.

## Use

Once installed, you can use the `parq` command.

View Parquet file metadata:

`$ parq input.parquet`

```
 # Metadata
 created_by: parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
 num_columns: 13
 num_rows: 1000
 num_row_groups: 1
 format_version: 1.0
 serialized_size: 1125
```

Get schema information:

`$ parq input.parquet --schema`

```
 # Schema
 registration_dttm: INT96
 id: INT32
 name: BYTE_ARRAY UTF8
 email: BYTE_ARRAY UTF8
 ...
 ip_address: BYTE_ARRAY UTF8
 country: BYTE_ARRAY UTF8
```

Get the total row count:

`$ parq input.parquet --count`

```
1025
```

Get the top N records (head):

`$ parq input.parquet --head 10`

Get the bottom N records (tail):

`$ parq input.parquet --tail 10`

## Help

`$ parq --help`

```
usage: usage: parq file [-s [SCHEMA] | --head [HEAD] | --tail [TAIL] | -c [COUNT]]

positional arguments:
  file                  Parquet file

optional arguments:
  -h, --help            show this help message and exit
  -s [SCHEMA], --schema [SCHEMA]
                        get schema information
  --head [HEAD]         get first N rows from file
  --tail [TAIL]         get last N rows from file
  -c [COUNT], --count [COUNT]
                        get total rows count
```
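
The `|` separators in the usage line suggest that the options are mutually exclusive, so only one of `--schema`, `--head`, `--tail`, or `--count` can be passed per call. The following is a minimal sketch of an `argparse` parser that would accept the same command lines. It is not taken from the `parq` source: `build_parser` is a hypothetical name, and the `const` values (including the default of 10 rows for `--head` and `--tail`) are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="parq")
    parser.add_argument("file", help="Parquet file")

    # argparse renders a mutually exclusive group with "|" in the usage line;
    # passing two options from the group makes the parser exit with an error.
    group = parser.add_mutually_exclusive_group()

    # nargs="?" makes the option value optional, matching "[SCHEMA]", "[HEAD]", etc.
    # const is the value stored when the flag is given without a value
    # (the values chosen here are assumptions, not parq's documented behavior).
    group.add_argument("-s", "--schema", nargs="?", const=True,
                       help="get schema information")
    group.add_argument("--head", nargs="?", type=int, const=10,
                       help="get first N rows from file")
    group.add_argument("--tail", nargs="?", type=int, const=10,
                       help="get last N rows from file")
    group.add_argument("-c", "--count", nargs="?", const=True,
                       help="get total rows count")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["input.parquet", "--head", "10"])
    print(args.file, args.head)
```

With a parser shaped like this, a call that combines two options, such as `--head 5 --count`, is rejected before any file is read.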