# orthrus_code

**Repository Path**: chen_minghao_2014/orthrus_code

## Basic Information

- **Project Name**: orthrus_code
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-22
- **Last Updated**: 2025-10-22

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

[![DOI](https://zenodo.org/badge/852328574.svg)](https://doi.org/10.5281/zenodo.14641605)

# ORTHRUS: Achieving High Quality of Attribution in Provenance-based Intrusion Detection Systems

This repository contains the official code for the [Orthrus paper](https://www.usenix.org/system/files/conference/usenixsecurity25/sec25cycle1-prepub-103-jiang-baoxiang.pdf).

## Citing our work

```bibtex
@inproceedings{jian2025,
	title={{ORTHRUS: Achieving High Quality of Attribution in Provenance-based Intrusion
	Detection Systems}},
	author={Jiang, Baoxiang and Bilot, Tristan  and El Madhoun, Nour and Al Agha, Khaldoun  and Zouaoui, Anis and Iqbal, Shahrear and Han, Xueyuan and Pasquier, Thomas},
	booktitle={Security Symposium (USENIX Sec'25)},
	year={2025},
	organization={USENIX}
}
```

## Updates

[2025.06.06] Orthrus is now available in [PIDSMaker](https://github.com/ubc-provenance/PIDSMaker)!

[2025.06.05] Orthrus' pre-trained weights are now available.

[2025.06.04] Installation is now simpler: the DARPA TC databases can be downloaded and installed locally as-is, so there is no longer any need to populate them yourself.

## Setup

### Clone the repo with submodules
```shell
git clone --recurse-submodules https://github.com/ubc-provenance/orthrus.git
```

### 10-minute install of Docker and datasets

We have made installing the DARPA TC/OpTC datasets quick and easy; simply follow [these guidelines](https://github.com/ubc-provenance/PIDSMaker/blob/velox/settings/ten-minute-install.md).

## Run experiments

The following commands should be executed within the `pids` container.

### Reproduce results from the paper

Launching Orthrus is as simple as running:

```shell
python src/orthrus.py [dataset] [config args...]
```

By default, `orthrus.py` runs the `graph_construction`, `edge_featurization`, `detection` and `attack_reconstruction` tasks, as configured in `config/orthrus.yml`. This configuration can be changed directly in the YAML file or overridden from the CLI through `[config args...]`, as shown above.

> [!NOTE]
> The original results cannot be replicated exactly because `PYTHONHASHSEED` was not set when they were produced, which affects Gensim's Word2Vec. The experiments below nonetheless yield similar results in most cases.
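
For context, Python randomizes `str` hashing per interpreter process unless `PYTHONHASHSEED` is set, and Gensim's `Word2Vec` uses the built-in `hash` by default (its `hashfxn` parameter) when initializing word vectors. The sketch below is illustrative and independent of Orthrus: it shows that fresh interpreters agree on `hash()` of a string only when the seed is pinned, which is presumably why the commands below are prefixed with `PYTHONHASHSEED=0`. The helper name `child_str_hash` is hypothetical.

```python
import os
import subprocess
import sys
from typing import Optional


def child_str_hash(text: str, hash_seed: Optional[str]) -> int:
    """Return hash(text) as computed by a brand-new Python interpreter."""
    env = dict(os.environ)
    if hash_seed is None:
        env.pop("PYTHONHASHSEED", None)  # the child picks a random seed
    else:
        env["PYTHONHASHSEED"] = hash_seed  # the child uses a fixed seed
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


seeded = {child_str_hash("orthrus", "0") for _ in range(3)}
unseeded = {child_str_hash("orthrus", None) for _ in range(3)}
# The seeded set always has one element; the unseeded set almost always has three.
print(len(seeded), len(unseeded))
```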

#### Expected results
| Name               | TP |    FP |        TN |  FN | Precision |  MCC |
|--------------------|---:|------:|----------:|----:|----------:|-----:|
| CADETS_E3_full     | 22 |    10 |   268,075 |  46 |      0.69 | 0.47 |
| CADETS_E3_ano      | 15 |     0 |   268,085 |  53 |      1.00 | 0.47 |
| THEIA_E3_full      | 22 |     0 |   699,177 |  96 |      1.00 | 0.43 |
| THEIA_E3_ano       |  2 |     0 |   699,177 | 116 |      1.00 | 0.13 |
| CADETS_E5_full     |  3 | 1,318 | 3,132,823 | 120 |      0.00 | 0.01 |
| CADETS_E5_ano      |  1 |     2 | 3,134,139 | 122 |      0.33 | 0.05 |
| THEIA_E5_full      | 13 |     2 |   747,381 |  56 |      0.86 | 0.40 |
| THEIA_E5_ano       |  2 |     0 |   747,383 |  67 |      1.00 | 0.17 |
| CLEARSCOPE_E3_full |  1 |   647 |   110,715 |  40 |      0.00 | 0.00 |
| CLEARSCOPE_E3_ano  |  1 |     5 |   111,357 |  40 |      0.17 | 0.06 |
| CLEARSCOPE_E5_full |  4 |     8 |   150,666 |  47 |      0.33 | 0.16 |
| CLEARSCOPE_E5_ano  |  2 |     5 |   150,669 |  49 |      0.29 | 0.10 |
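
The Precision and MCC columns follow from the TP/FP/TN/FN counts via the standard formulas, precision = TP / (TP + FP) and MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below recomputes them for the `CADETS_E3_full` row of the table. Returning 0.0 when a denominator is zero is our own convention for degenerate rows, not necessarily the one Orthrus uses.

```python
import math


def precision(tp: int, fp: int) -> float:
    """Fraction of flagged items that are true positives (0.0 if nothing is flagged)."""
    return tp / (tp + fp) if tp + fp else 0.0


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient (0.0 if any marginal total is zero)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# CADETS_E3_full: TP=22, FP=10, TN=268,075, FN=46.
print(round(precision(22, 10), 2), round(mcc(22, 10, 268_075, 46), 2))  # 0.69 0.47
```

MCC is reported alongside precision because, with hundreds of thousands of true negatives, accuracy-style metrics would be dominated by TN, whereas MCC stays informative under this class imbalance.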


#### Experiments

These experiments use Orthrus' pre-trained weights, loaded with the `--from_weights` argument.

**CADETS_E3**
```shell
PYTHONHASHSEED=0 python src/orthrus.py CADETS_E3 --from_weights --detection.gnn_training.encoder.graph_attention.dropout=0.25 --detection.gnn_training.node_hid_dim=256 --detection.gnn_training.node_out_dim=256 --detection.gnn_training.lr=0.001 --detection.gnn_training.num_epochs=20 --seed=4
```

**THEIA_E3**
```shell
PYTHONHASHSEED=0 python src/orthrus.py THEIA_E3 --from_weights --detection.gnn_training.encoder.graph_attention.dropout=0.1 --seed=2
```

**CLEARSCOPE_E3**
```shell
PYTHONHASHSEED=0 python src/orthrus.py CLEARSCOPE_E3 --from_weights --graph_construction.build_graphs.time_window_size=1.0 --detection.gnn_training.encoder.graph_attention.dropout=0.1 --seed=2
```

**CADETS_E5**
```shell
PYTHONHASHSEED=0 python src/orthrus.py CADETS_E5 --from_weights --detection.gnn_training.node_out_dim=128 --detection.gnn_training.lr=0.0001 --detection.gnn_training.encoder.graph_attention.dropout=0.1 --graph_construction.build_graphs.time_window_size=1.0
```

**THEIA_E5**
```shell
PYTHONHASHSEED=0 python src/orthrus.py THEIA_E5 --from_weights
```

**CLEARSCOPE_E5**
```shell
PYTHONHASHSEED=0 python src/orthrus.py CLEARSCOPE_E5 --from_weights --detection.gnn_training.lr=0.0001 --detection.gnn_training.encoder.graph_attention.dropout=0.1 --detection.gnn_training.node_out_dim=64
```
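
To reproduce all six settings in one go, the commands above can be wrapped in a small driver. This is a hypothetical convenience script (`OVERRIDES`, `build_command` and `run` are not part of the repository): it copies the per-dataset arguments verbatim, pins `PYTHONHASHSEED=0` in the child environment, and assumes it is launched from the repository root inside the `pids` container. With `dry_run=True` it only prints the commands.

```python
import os
import subprocess
from typing import Dict, List

# Per-dataset arguments, copied verbatim from the commands above.
OVERRIDES: Dict[str, List[str]] = {
    "CADETS_E3": [
        "--detection.gnn_training.encoder.graph_attention.dropout=0.25",
        "--detection.gnn_training.node_hid_dim=256",
        "--detection.gnn_training.node_out_dim=256",
        "--detection.gnn_training.lr=0.001",
        "--detection.gnn_training.num_epochs=20",
        "--seed=4",
    ],
    "THEIA_E3": [
        "--detection.gnn_training.encoder.graph_attention.dropout=0.1",
        "--seed=2",
    ],
    "CLEARSCOPE_E3": [
        "--graph_construction.build_graphs.time_window_size=1.0",
        "--detection.gnn_training.encoder.graph_attention.dropout=0.1",
        "--seed=2",
    ],
    "CADETS_E5": [
        "--detection.gnn_training.node_out_dim=128",
        "--detection.gnn_training.lr=0.0001",
        "--detection.gnn_training.encoder.graph_attention.dropout=0.1",
        "--graph_construction.build_graphs.time_window_size=1.0",
    ],
    "THEIA_E5": [],
    "CLEARSCOPE_E5": [
        "--detection.gnn_training.lr=0.0001",
        "--detection.gnn_training.encoder.graph_attention.dropout=0.1",
        "--detection.gnn_training.node_out_dim=64",
    ],
}


def build_command(dataset: str) -> List[str]:
    """Assemble the argv for one dataset, using the pre-trained weights."""
    return ["python", "src/orthrus.py", dataset, "--from_weights", *OVERRIDES[dataset]]


def run(dataset: str, dry_run: bool = True) -> None:
    """Run (or only print) one experiment with PYTHONHASHSEED pinned to 0."""
    cmd = build_command(dataset)
    if dry_run:
        print("PYTHONHASHSEED=0 " + " ".join(cmd))
        return
    env = {**os.environ, "PYTHONHASHSEED": "0"}
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    for name in OVERRIDES:
        run(name, dry_run=True)
```

Setting `PYTHONHASHSEED` through `env` rather than inside the driver process matters: the variable only takes effect when an interpreter starts, so it must be present in the environment of the `orthrus.py` child process.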

### Subsequent runs

After the first run, the preprocessed datasets are stored under the `ROOT_ARTIFACT_DIR` path set in `config.py`, so they do not need to be recomputed. To skip the `graph_construction` and `edge_featurization` tasks, run Orthrus directly from the `detection` task with the `--run_from_training` argument.

```shell
python src/orthrus.py CADETS_E3 --run_from_training
```
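
Conceptually, `--run_from_training` only changes where the task sequence starts. The helper below is a hypothetical illustration of that behavior using the four default tasks named earlier in this README; it is not the repository's actual control flow, which relies on the artifacts cached under `ROOT_ARTIFACT_DIR` by a previous run.

```python
from typing import List

# Default task order, as listed for config/orthrus.yml in this README.
TASKS: List[str] = [
    "graph_construction",
    "edge_featurization",
    "detection",
    "attack_reconstruction",
]


def tasks_to_run(run_from_training: bool = False) -> List[str]:
    """Hypothetical: return the tasks executed for a given value of --run_from_training."""
    # With the flag, the cached outputs of the first two tasks are reused,
    # so execution starts at the detection task.
    start = TASKS.index("detection") if run_from_training else 0
    return TASKS[start:]


print(tasks_to_run(run_from_training=True))  # ['detection', 'attack_reconstruction']
```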

### Weights & Biases interface

W&B is used as the default interface to visualize experiments and keep a history of them. First, log into your account from the CLI:

```shell
wandb login
```

When prompted, enter your API key, which can be found on the W&B website. The preferred solution is then to run the `run.sh` script, which logs experiments directly to W&B. Alternatively, push the logs and results of a single experiment to W&B with the `--wandb` argument:

```shell
python src/orthrus.py THEIA_E3 --wandb
```

## License

See [licence](LICENSE).