# PyTorch Implementation of CausalFormer: An Interpretable Transformer for Temporal Causal Discovery

Official PyTorch implementation for [CausalFormer: An Interpretable Transformer for Temporal Causal Discovery](https://www.computer.org/csdl/journal/tk/5555/01/10726725/21dMtID3eUw) ([arXiv](https://arxiv.org/abs/2406.16708)).

## Requirements

* Python >= 3.5 (3.6 recommended)
* PyTorch (tested with PyTorch 1.11.0)
* Optional: CUDA (tested with CUDA 11.3)
* networkx
* numpy
* pandas
* scikit_learn

## Folder Structure

```
CausalFormer/
├── base/ - abstract base classes
│   ├── base_data_loader.py
│   ├── base_model.py
│   └── base_trainer.py
├── config/ - holds configuration for training
│   ├── config_basic_diamond_mediator.json
│   ├── config_basic_v_fork.json
│   ├── config_fMRI.json
│   └── config_lorenz.json
├── data/ - default directory for storing input data
│   ├── basic
│   ├── fMRI
│   └── lorenz96
├── data_loader/
├── evaluator/
├── experiments.ipynb
├── explainer/
│   └── explainer.py
├── interpret.py - main script to start interpreting
├── LICENSE
├── logger/
├── model/ - models, relevance propagation, losses, and metrics
│   ├── loss.py
│   ├── metric.py
│   ├── model.py
│   ├── NonParamRP.py
│   └── RRP.py
├── parse_config.py
├── README.md
├── requirements.txt
├── runner.py - integrated script to start running CausalFormer
├── saved/
│   ├── models/ - trained models are saved here
│   └── log/ - default logdir for tensorboard
├── trainer/
├── train.py - main script to start training
└── utils/
```

## Dataset

- Synthetic datasets: [Basic causal structures with additive noise](https://dataverse.harvard.edu/dataverse/basic_causal_structures_additive_noise)
- Lorenz96:

  > Lorenz, Edward N. "Predictability: A problem partly solved." *Proc. Seminar on predictability*. Vol. 1. No. 1. 1996.

  The Lorenz 96 model is a nonlinear model of climate dynamics, defined as

  $$\frac{dx_{t,i}}{dt}=(x_{t,i+1}-x_{t,i-2})x_{t,i-1}-x_{t,i}+F$$

  where $x_{t,i}$ is the value of time series $i$ at time slot $t$, and $F$ is a forcing constant that determines the level of non-linearity and chaos in the series. We simulate a Lorenz 96 model with 10 variables and $F \in [30, 40]$ over a time span of 1,000 units (a minimal simulation sketch is given at the end of this section).
- fMRI: [NetSim](https://www.fmrib.ox.ac.uk/datasets/netsim/index.html)

### Dataset file format

- Time series: the time series file is a CSV containing multiple time series. The first row is a header with the names of the time series, and each column represents one time series.
- Ground-truth causal graph: the ground-truth causal graph file contains tuples of the form (i, j, t), where i is the cause, j is the effect, and t is the time lag (see the parsing sketch below).
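For concreteness, the snippet below shows one way to read files in this format with pandas. The file names are hypothetical, and the ground-truth file is assumed to hold one comma-separated (i, j, t) triple per line; adjust both to the actual dataset layout.

```python
import pandas as pd

# Load the multivariate time series: the header row holds series names,
# each column is one series (file name is hypothetical).
series = pd.read_csv("data/lorenz96/timeseries.csv")
print(series.columns.tolist())   # names of the time series
values = series.to_numpy()       # shape: (n_timesteps, n_series)

# Load the ground-truth causal graph as (cause, effect, lag) tuples.
# Assumes one comma-separated triple per line (file name is hypothetical).
ground_truth = []
with open("data/lorenz96/groundtruth.csv") as f:
    for line in f:
        i, j, t = (int(v) for v in line.strip().split(","))
        ground_truth.append((i, j, t))
```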
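The repository ships pre-generated series under `data/lorenz96/`, so this README does not prescribe a generator; the sketch below is a minimal NumPy reproduction of the Lorenz 96 simulation described above. The RK4 integrator, step size, and initial perturbation are our assumptions, not the authors' exact settings.

```python
import numpy as np

def lorenz96_deriv(x, F):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def simulate_lorenz96(n_series=10, F=35.0, t_span=1000.0, dt=0.01, seed=0):
    """Integrate the Lorenz 96 model with a 4th-order Runge-Kutta scheme."""
    rng = np.random.default_rng(seed)
    # x_i = F is an equilibrium, so start from a small perturbation of it.
    x = F + 0.01 * rng.standard_normal(n_series)
    steps = int(t_span / dt)
    out = np.empty((steps, n_series))
    for k in range(steps):
        k1 = lorenz96_deriv(x, F)
        k2 = lorenz96_deriv(x + 0.5 * dt * k1, F)
        k3 = lorenz96_deriv(x + 0.5 * dt * k2, F)
        k4 = lorenz96_deriv(x + dt * k3, F)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out[k] = x
    return out

# 10 variables over 1,000 time units; the paper draws F from [30, 40].
series = simulate_lorenz96(n_series=10, F=35.0)
```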
"arch": { "type": "PredictModel", // name of model architecture to train "args": { "d_model": 512, // Dimension of the embedding vector. D_QK in paper "n_head": 8, // Number of attention heads. h in paper "n_layers": 1, // single transformer encoder layer "ffn_hidden": 512, // Hidden dimension in the feed forward layer. d_FFN in paper "drop_prob": 0, // Dropout probability (Not used in practice) "tau": 10 // Temperature hyperparameter for attention softmax } }, "data_loader": { "type": "TimeseriesDataLoader", // selecting data loader "args":{ "data_dir": "data/", // dataset path "batch_size": 64, // batch size "time_step": 32, // input window size. T in paper "output_window": 31, // output window size "feature_dim": 1, // input feature dim "output_dim": 1, // output window size "shuffle": true, // shuffle training data before splitting "validation_split": 0.1 // size of validation dataset. float(portion) or int(number of samples) "num_workers": 2, // number of cpu processes to be used for data loading } }, "optimizer": { "type": "Adam", "args":{ "lr": 0.001, // learning rate "weight_decay": 0, // (optional) weight decay "amsgrad": true } }, "loss": "masked_mse_torch", // loss "metrics": [ "accuracy", "masked_mse_torch" // list of metrics to evaluate ], "lr_scheduler": { "type": "StepLR", // learning rate scheduler "args":{ "step_size": 50, "gamma": 0.1 } }, "trainer": { "epochs": 100, // number of training epochs "save_dir": "saved/", // checkpoints are saved in save_dir/models/name "save_freq": 1, // save checkpoints every save_freq epochs "verbosity": 2, // 0: quiet, 1: per epoch, 2: full "monitor": "min val_loss" // mode and metric for model performance monitoring. set 'off' to disable. "early_stop": 10 // number of epochs to wait before early stop. set 0 to disable. "lam": 5e-4, // the coefficient for normalization "tensorboard": true, // enable tensorboard visualization }, "explainer": { "m":2, // number of top clusters of causal scores to consider. "n":3 // number of total clusters for k-means clustering. } } ``` ## License This project is licensed under the GPL-3.0 License. See LICENSE for more details This project is based on the [pytorch-template](https://github.com/victoresque/pytorch-template) GitHub template. ## Cite ``` @article{kong2024causalformer, title={CausalFormer: An Interpretable Transformer for Temporal Causal Discovery}, author={Kong, Lingbai and Li, Wengen and Yang, Hanchen and Zhang, Yichao and Guan, Jihong and Zhou, Shuigeng}, journal={IEEE Transactions on Knowledge \& Data Engineering}, number={01}, pages={1--14}, year={2024}, publisher={IEEE Computer Society} } ```