# ifairy

**Repository Path**: yt7589/ifairy

## Basic Information

- **Project Name**: ifairy
- **Description**: Complex Transformer based on https://github.com/PKULab1806/Fairy-plus-minus-i .
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-26
- **Last Updated**: 2025-08-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

Fairy±i (also named iFairy)
# Abstract
Fairy±i (iFairy) is the first 2-bit complex-valued large language model, where all weights are constrained to {±1, ±i}. By introducing complex-valued architectures and a novel quantization scheme, iFairy achieves efficient compression with minimal accuracy loss. Experiments show that it consistently outperforms existing 2-bit methods (e.g., BitNet b1.58) and approaches full-precision models on language modeling and reasoning benchmarks.
# Evaluation
## Evaluation Results
**Table: Perplexity on WikiText2 and C4 validation sets (lower is better)**

| Size | Model             | WikiText2 | C4    | Avg   |
| :--- | :---------------- | :-------- | :---- | :---- |
| 700M | FP16 LLaMA        | -         | -     | 12.33 |
|      | BitNet b1.58*     | -         | -     | 12.87 |
|      | BitNet b1.58†     | 10.81     | 12.21 | 11.51 |
|      | Fairy ± i°        | 9.41 | 10.75  | 10.08 |
|      | Fairy ± i         | **10.46**     | **11.81** | **11.14** |
| 1.3B | FP16 LLaMA        | -         | -     | 11.25 |
|      | BitNet b1.58*     | -         | -     | 11.29 |
|      | Fairy ± i         | **9.35**      | **10.94** | **10.14** |

\* refers to the reported version in prior work  
† the trained version  
° the full-precision Fairy±i

**Table: Zero-shot Accuracy on Commonsense Reasoning Tasks (%)**

| Model Size | Model          | ARCe  | ARCc  | HS    | BQ    | OQ   | PQ    | WGe   | Avg.  |
| :--------- | :------------- | :---- | :---- | :---- | :---- | :--- | :---- | :---- | :---- |
| 700M       | FP16 LLaMA     | 54.70 | 23.00 | 37.00 | 60.00 | 20.20| 68.90 | 54.80 | 45.51 |
|            | BitNet b1.58 * | 51.80 | 21.40 | 35.10 | 58.20 | 20.00| 68.10 | 55.20 | 44.26 |
|            | BitNet b1.58 † | 51.77 | 22.44 | 35.30 | 58.50 | 20.80| 65.94 | 54.85 | 44.23 |
|            | Fairy±i°       | 55.68 | 24.06 | 37.79 | 60.46 | 20.60| 70.18 | 54.46 | 46.18 |
|            | **Fairy±i**        | **53.45** | **23.04** | **36.04** | **55.31** | **21.00**| **68.01** | **54.06** | **44.70** |
| 1.3B   | FP16 LLaMA | 56.90 | 23.50 | 38.50 | 59.10 | 21.60| 70.00 | 53.90 | 46.21 |
|            | BitNet b1.58 * | 54.90 | 24.20 | 37.70 | 56.70 | 19.60| 68.80 | 55.80 | 45.39 |
|            | **Fairy±i**    | **56.65** | **24.66** | **38.69** | **59.60** | **22.20**| **69.80** | **54.06** | **46.52** |

\* reported in prior work  
† trained version  
° full-precision Fairy±i
## Task Evaluation

You can evaluate your model on specific tasks by running the following command from the project root:

```bash
python eval/eval_task.py \
  --seed 42 \
  --hf_path /path/to/model_or_repo \
  --batch_size 8 \
  --device cuda:0 \
  --tasks TASK \
  --num_fewshot 0 \
  --ctx_size 2048
```
- /path/to/model_or_repo is the path to your model directory, usually generated by model.save_pretrained().

- TASK is a comma-separated list of evaluation tasks. For example:
arc_easy,arc_challenge,hellaswag,boolq,openbookqa,piqa,winogrande

- ctx_size specifies the maximum context length (sequence length) used during evaluation.
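
If you prefer to drive the evaluation from a script, the command above can be assembled as an argument list. The sketch below is only a convenience wrapper: the helper name `build_eval_command` is hypothetical, and it uses only the flags shown in the command above.

```python
import shlex

def build_eval_command(hf_path, tasks, batch_size=8, device="cuda:0",
                       num_fewshot=0, ctx_size=2048, seed=42):
    """Assemble the eval/eval_task.py command line as an argument list."""
    return [
        "python", "eval/eval_task.py",
        "--seed", str(seed),
        "--hf_path", hf_path,
        "--batch_size", str(batch_size),
        "--device", device,
        "--tasks", ",".join(tasks),  # TASK is a comma-separated list
        "--num_fewshot", str(num_fewshot),
        "--ctx_size", str(ctx_size),
    ]

tasks = ["arc_easy", "arc_challenge", "hellaswag", "boolq",
         "openbookqa", "piqa", "winogrande"]
command = build_eval_command("/path/to/model_or_repo", tasks)

# Print the shell-quoted command for inspection.
print(shlex.join(command))
# To run it from the project root: subprocess.run(command, check=True)
```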
## PPL Evaluation
To evaluate your model’s perplexity, run the following command from the root directory:
```bash
python eval/eval_ppl.py \
  --seed 42 \
  --hf_path /path/to/model_or_repo \
  --seqlen 2048 \
  --device cuda:0
```
- /path/to/model_or_repo is the path to your saved model checkpoint.

- seqlen defines the maximum sequence length for evaluation.

- seed is the random seed to ensure reproducibility.
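
For reference, perplexity is conventionally the exponential of the mean negative log-likelihood per token. The snippet below illustrates that standard definition on made-up probabilities; it is not taken from eval/eval_ppl.py, whose internals are not shown here.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability) over the evaluated tokens."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Illustrative probabilities a model might assign to four target tokens.
log_probs = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
ppl = perplexity(log_probs)
print(f"perplexity: {ppl:.2f}")
```

A lower value means the model assigns higher probability to the reference text, which is why lower is better in the perplexity table.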

# Introduction

The advent of Large Language Models (LLMs) has transformed artificial
intelligence, achieving remarkable performance across a wide range of
natural language
tasks. However,
this success is built upon massive model sizes, often reaching billions
or trillions of parameters, which poses serious deployment challenges
due to immense memory footprints and high computational
costs. To
democratize access to these powerful models, model compression has
become a critical research area, with *quantization* emerging as a
leading technique. Quantization methods are broadly categorized into
Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
While PTQ offers simplicity, its
performance often degrades sharply in extremely low-bit scenarios due to
the model's lack of adaptation to quantized representations. In
contrast, QAT integrates quantization into the training loop, allowing
models to learn robust low-bit representations and maintain performance
under aggressive compression. This advantage has motivated recent
research into QAT-based strategies tailored for LLMs.

The pursuit of extremely low-bit quantization, particularly 2-bit
quantization, has become a focal point in efforts to compress LLMs
for efficient deployment. Existing approaches, such as
BitNet and its successors, have
demonstrated that it is possible to retain reasonable accuracy using
ternary quantization schemes with just 1.58 bits per weight. However,
the accuracy of any quantized model is fundamentally limited by the
following equation:
$$\textbf{Accuracy}_\text{quant}=\textbf{Accuracy}_\text{full-precision}-\textbf{Error}_\text{quant}$$
All current quantization research focuses on minimizing quantization
error on full-precision models (e.g., LLaMA), but the quantization error
can never be zero. Therefore, full-precision accuracy becomes the
**ceiling** for quantized accuracy. To date, no existing method has even
attempted to surpass this ceiling.
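
The "1.58 bits per weight" figure quoted above for ternary schemes is the information content of a uniform choice among three values, log2(3). A four-valued codebook needs exactly log2(4) = 2 bits. A quick check, given purely as an illustration:

```python
import math

def bits_per_weight(num_levels: int) -> float:
    """Bits needed to index one of num_levels equally likely values."""
    return math.log2(num_levels)

ternary_bits = bits_per_weight(3)     # ternary weights, e.g. {-1, 0, +1}
quaternary_bits = bits_per_weight(4)  # four values, e.g. {+1, -1, +i, -i}

print(f"ternary:     {ternary_bits:.2f} bits")
print(f"four values: {quaternary_bits:.2f} bits")
```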

In this paper, we propose a fundamentally different perspective. Instead
of solely focusing on reducing quantization error, we make the first
attempt to raise the ceiling (the accuracy of the full-precision model),
while still ensuring that the resulting model can be efficiently
quantized to a 2-bit format. Our key insight is that if the
full-precision model becomes more expressive and accurate, the final
2-bit quantized model can achieve higher accuracy as well. Building on
this insight, we propose, for the first time, incorporating
complex-valued neural architectures into LLMs. The complex number
provides a richer representational space with additional phase
information, thereby enhancing the expressiveness of linear
transformations without increasing the parameter count. By
systematically extending the Transformer architecture into the complex
domain, we construct a full-precision complex-valued LLM with superior
modeling capacity.
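
To make the idea of a complex-valued linear transformation concrete, the sketch below applies a small complex weight matrix to a complex input and checks it against the equivalent real-arithmetic form, (A + iB)(u + iv) = (Au - Bv) + i(Av + Bu). The function names and values are illustrative assumptions, not code from this repository.

```python
def complex_linear(weights, inputs):
    """Apply a complex weight matrix (list of rows) to a complex input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def real_form(weights, inputs):
    """Same product using only real arithmetic on real and imaginary parts."""
    out = []
    for row in weights:
        re = sum(w.real * v.real - w.imag * v.imag for w, v in zip(row, inputs))
        im = sum(w.real * v.imag + w.imag * v.real for w, v in zip(row, inputs))
        out.append(complex(re, im))
    return out

# A 2x2 complex weight matrix and a 2-element complex input (illustrative values).
W = [[1 + 2j, 0 - 1j],
     [3 + 0j, 1 + 1j]]
x = [1 + 1j, 2 - 1j]

y = complex_linear(W, x)
assert y == real_form(W, x)  # both formulations agree
print(y)
```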

Building upon this complex-valued foundation, we further design a novel
2-bit quantization scheme tailored for complex weights. Specifically, we
quantize each complex parameter to one of the **fourth roots of unity**
{±1, ±i} in the complex plane. Unlike real-valued quantization, this
approach exploits the full 2-bit representational
capacity *without sacrificing symmetry or sparsity*, thereby eliminating
the trade-offs that limit real-valued schemes. The resulting model,
which we name Fairy±i, is perfectly storage-efficient and phase-aware by
design. We propose a quantization function that learns to project
full-precision complex weights onto the target set {±1, ±i}
while preserving both magnitude and phase information. We implement this
within our complex Transformer framework and evaluate its performance
under the same storage and compute constraints as BitNet b1.58.
Experiments show that Fairy±i significantly improves perplexity and downstream
task accuracy, outperforming existing 2-bit baselines and approaching
the performance of full-precision FP16 models.
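
One simple way to picture the projection step is to snap each complex weight to the fourth root of unity with the closest phase. The sketch below does exactly that and nothing more: it is an assumption made for illustration, it ignores scaling factors and training-time details, and the quantization function described in the paper may differ.

```python
import cmath
import math

# Fourth roots of unity in order of increasing phase: 1, i, -1, -i.
ROOTS = (1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j)

def project_to_root(w: complex) -> complex:
    """Snap w to the fourth root of unity whose phase is closest to w's phase."""
    # Roots sit at multiples of pi/2, so rounding phase / (pi/2) picks the sector.
    k = round(cmath.phase(w) / (math.pi / 2)) % 4
    return ROOTS[k]

# Illustrative full-precision weights, one per sector.
weights = [0.8 + 0.1j, -0.2 + 0.9j, -0.7 - 0.3j, 0.1 - 0.5j]
quantized = [project_to_root(w) for w in weights]
print(quantized)
```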

Our contributions can be summarized as follows:

-   We propose a new perspective on low-bit quantization: improving the
    accuracy of quantized models by raising the ceiling (the full
    precision model).

-   We design a complex-valued LLM architecture that leverages the
    representational benefits of the complex domain without increasing
    parameter storage.

-   We design a 2-bit quantization scheme that maps complex weights to
    the 4th roots of unity {±1, ±i}, fully utilizing bit
    capacity while preserving key properties like symmetry and sparsity.

-   Experimental results show that our quantized model outperforms the
    ceiling of existing 2-bit quantization approaches in terms of both
    PPL and downstream understanding tasks.
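
Because the codebook {±1, ±i} has exactly four entries, every 2-bit pattern corresponds to a valid weight, and four weights fit in one byte. The sketch below shows one possible packing. The code assignment and the helper names `pack` and `unpack` are assumptions for illustration and do not describe the storage format used by this repository.

```python
# 2-bit codes for the four weight values (this assignment is an assumption).
CODE_OF = {1 + 0j: 0b00, 0 + 1j: 0b01, -1 + 0j: 0b10, 0 - 1j: 0b11}
VALUE_OF = {code: value for value, code in CODE_OF.items()}

def pack(weights):
    """Pack weights from {1, i, -1, -i} into bytes, four weights per byte."""
    packed = bytearray()
    for start in range(0, len(weights), 4):
        byte = 0
        for offset, w in enumerate(weights[start:start + 4]):
            byte |= CODE_OF[w] << (2 * offset)  # lowest bits hold the first weight
        packed.append(byte)
    return bytes(packed)

def unpack(packed, count):
    """Recover `count` weights from bytes produced by pack()."""
    return [VALUE_OF[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]

weights = [1, 1j, -1, -1j, 1j, 1j, 1, -1]
packed = pack(weights)
assert len(packed) == 2                         # 8 weights * 2 bits = 16 bits
assert unpack(packed, len(weights)) == weights  # lossless round trip
```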

# Paper
This work is described in detail in our paper:  
**iFairy: the First 2-bit Complex LLM with All Parameters in {±1, ±i}**  
[[arXiv](https://arxiv.org/abs/2508.05571)]

# Train
To start distributed training from the project root, run:

```bash
accelerate launch \
  --config-file train/complexnet_config.yaml \
  --num_processes N \
  train/train.py \
  --dataset_path DATAPATH
```
- train/complexnet_config.yaml — The training configuration file.
- N — Number of processes to launch for training.
- DATAPATH — Path to your dataset.

By default, train/train.py uses datasets.load_from_disk() to load the dataset.
If you are using a different dataset format, modify the dataset loading logic in train.py accordingly.

For larger-scale training, you can adjust additional accelerate parameters in the command to fit your hardware and performance requirements.

# Links

## HuggingFace
- [Fairy±i-700M on HuggingFace](https://huggingface.co/PKU-DS-LAB/Fairy-plus-minus-i-700M)
- [Fairy±i-1.3B on HuggingFace](https://huggingface.co/PKU-DS-LAB/Fairy-plus-minus-i-1.3B)

## ModelScope
- [Fairy±i-700M on ModelScope](https://modelscope.cn/models/PKUDSLAB1806/Fairy-plus-minus-i-700M)
- [Fairy±i-1.3B on ModelScope](https://modelscope.cn/models/PKUDSLAB1806/Fairy-plus-minus-i-1.3B)

## Paper
- [iFairy: the First 2-bit Complex LLM with All Parameters in {±1, ±i}](https://arxiv.org/abs/2508.05571)