# tiny-cuda-nn

**Repository Path**: Fj1225815367/tiny-cuda-nn

## Basic Information

- **Project Name**: tiny-cuda-nn
- **Description**: Lightning fast C++/CUDA neural network framework
- **Primary Language**: Unknown
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2024-03-13
- **Last Updated**: 2024-05-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Tiny CUDA Neural Networks ![](https://github.com/NVlabs/tiny-cuda-nn/workflows/CI/badge.svg)

This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning-fast ["fully fused" multi-layer perceptron](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/fully-fused-mlp-diagram.png) ([technical paper](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.pdf)), a versatile [multiresolution hash encoding](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/multiresolution-hash-encoding-diagram.png) ([technical paper](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf)), as well as support for various other input encodings, losses, and optimizers.

## Performance

![Image](data/readme/fully-fused-vs-tensorflow.png)
_Fully fused networks vs. TensorFlow v2.5.0 w/ XLA. Measured on 64 (solid line) and 128 (dashed line) neurons wide multi-layer perceptrons on an RTX 3090. Generated by `benchmarks/bench_ours.cu` and `benchmarks/bench_tensorflow.py` using `data/config_oneblob.json`._


## Usage

Tiny CUDA Neural Networks has a simple C++/CUDA API:

```cpp
#include <tiny-cuda-nn/common.h>

// Configure the model
nlohmann::json config = {
	{"loss", {
		{"otype", "L2"}
	}},
	{"optimizer", {
		{"otype", "Adam"},
		{"learning_rate", 1e-3},
	}},
	{"encoding", {
		{"otype", "HashGrid"},
		{"n_levels", 16},
		{"n_features_per_level", 2},
		{"log2_hashmap_size", 19},
		{"base_resolution", 16},
		{"per_level_scale", 2.0},
	}},
	{"network", {
		{"otype", "FullyFusedMLP"},
		{"activation", "ReLU"},
		{"output_activation", "None"},
		{"n_neurons", 64},
		{"n_hidden_layers", 2},
	}},
};

using namespace tcnn;

auto model = create_from_config(n_input_dims, n_output_dims, config);

// Train the model (batch_size must be a multiple of tcnn::BATCH_SIZE_GRANULARITY)
GPUMatrix<float> training_batch_inputs(n_input_dims, batch_size);
GPUMatrix<float> training_batch_targets(n_output_dims, batch_size);

for (int i = 0; i < n_training_steps; ++i) {
	generate_training_batch(&training_batch_inputs, &training_batch_targets); // <-- your code

	float loss;
	model.trainer->training_step(training_batch_inputs, training_batch_targets, &loss);
	std::cout << "iteration=" << i << " loss=" << loss << std::endl;
}

// Use the model
GPUMatrix<float> inference_inputs(n_input_dims, batch_size);
generate_inputs(&inference_inputs); // <-- your code

GPUMatrix<float> inference_outputs(n_output_dims, batch_size);
model.network->inference(inference_inputs, inference_outputs);
```
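
To make the `encoding` settings in the configuration above more concrete, here is a small Python sketch that computes how many features the hash grid hands to the network and the nominal resolution of each level. It assumes the geometric growth rule `floor(base_resolution * per_level_scale**level)` described in the Instant NGP paper; the library's internal rounding may differ. The helper names and the `granularity` argument are hypothetical and only illustrate the batch-size constraint; the real value is `tcnn::BATCH_SIZE_GRANULARITY`.

```python
import math

# Encoding settings copied from the C++ configuration above.
encoding = {
    "n_levels": 16,
    "n_features_per_level": 2,
    "log2_hashmap_size": 19,
    "base_resolution": 16,
    "per_level_scale": 2.0,
}


def encoded_width(cfg):
    """Number of features the encoding passes on to the network."""
    return cfg["n_levels"] * cfg["n_features_per_level"]


def level_resolutions(cfg):
    """Nominal grid resolution per level (assumed rule: floor(base * scale**level))."""
    return [
        math.floor(cfg["base_resolution"] * cfg["per_level_scale"] ** level)
        for level in range(cfg["n_levels"])
    ]


def round_up_batch_size(batch_size, granularity):
    """Smallest multiple of `granularity` that is >= batch_size (hypothetical helper)."""
    return ((batch_size + granularity - 1) // granularity) * granularity


print("encoded width:", encoded_width(encoding))
print("coarsest / finest resolution:",
      level_resolutions(encoding)[0], "/", level_resolutions(encoding)[-1])
# 256 is a placeholder; look up tcnn::BATCH_SIZE_GRANULARITY for the real value.
print("padded batch size:", round_up_batch_size(1000, 256))
```

With 16 levels of 2 features each, the MLP in this configuration sees 32 input features regardless of `n_input_dims`.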


## Example: learning a 2D image

We provide a sample application that learns an image function _(x,y) -> (R,G,B)_. It can be run via:
```sh
tiny-cuda-nn$ ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
```
An image is produced every few training steps. With the default configuration, each 1000 steps should take about 1 second on an RTX 4090.

| 10 steps | 100 steps | 1000 steps | Reference image |
|:---:|:---:|:---:|:---:|
| ![10steps](data/readme/10.jpg) | ![100steps](data/readme/100.jpg) | ![1000steps](data/readme/1000.jpg) | ![reference](data/images/albert.jpg) |
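
The learning task itself is easy to picture: each training batch pairs random _(x,y)_ coordinates with the colors found there. The following Python sketch illustrates that idea on a 2x2 placeholder image. It is not the bundled sample's code: the function name is hypothetical, and nearest-pixel lookup is an assumption chosen for brevity (the real sample may filter the image differently).

```python
import random


def sample_training_batch(image, batch_size, rng):
    """Draw random (x, y) in [0, 1)^2 and look up the nearest pixel's (R, G, B)."""
    height = len(image)
    width = len(image[0])
    inputs, targets = [], []
    for _ in range(batch_size):
        x, y = rng.random(), rng.random()
        col = min(int(x * width), width - 1)
        row = min(int(y * height), height - 1)
        inputs.append((x, y))            # network input
        targets.append(image[row][col])  # regression target
    return inputs, targets


# 2x2 placeholder image standing in for data/images/albert.jpg.
image = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
inputs, targets = sample_training_batch(image, 8, rng)
for (x, y), rgb in zip(inputs, targets):
    print(f"({x:.3f}, {y:.3f}) -> {rgb}")
```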


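## Requirements

- An __NVIDIA GPU__; tensor cores increase performance when available. All shown results come from an RTX 3090.
- A __C++14__ capable compiler. The following choices are recommended and have been tested:
  - __Windows:__ Visual Studio 2019 or 2022
  - __Linux:__ GCC/G++ 8 or higher
- A recent version of __[CUDA](https://developer.nvidia.com/cuda-toolkit)__. The following choices are recommended and have been tested:
  - __Windows:__ CUDA 11.5 or higher
  - __Linux:__ CUDA 10.2 or higher
- __[CMake](https://cmake.org/) v3.21 or higher__.
- The fully fused MLP component of this framework requires a __very large__ amount of shared memory in its default configuration. It will likely only work on an RTX 3090, an RTX 2080 Ti, or higher-end GPUs. Lower-end cards must reduce the `n_neurons` parameter or use the `CutlassMLP` (better compatibility but slower) instead.

If you are using Linux, install the following packages: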
```sh
sudo apt-get install build-essential git
```

We also recommend installing [CUDA](https://developer.nvidia.com/cuda-toolkit) in `/usr/local/` and adding the CUDA installation to your PATH.
For example, if you have CUDA 11.4, add the following to your `~/.bashrc`:
```sh
export PATH="/usr/local/cuda-11.4/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH"
```
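
Before building, it can help to confirm that the tools named above actually resolve on your PATH. The following Python sketch does this with the standard library only. The tool list and function name are assumptions made for illustration, and the check says nothing about whether the installed versions meet the requirements.

```python
import shutil

# Tools referenced in the requirements and build steps of this README.
REQUIRED_TOOLS = ("git", "cmake", "nvcc")


def locate_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


for tool, path in locate_tools().items():
    status = path if path else "not found on PATH"
    print(f"{tool}: {status}")
```

If `nvcc` is reported as missing even though CUDA is installed, the `PATH` export shown above is the first thing to check.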


## Compilation (Windows & Linux)

Begin by cloning this repository and all its submodules using the following command:
```sh
$ git clone --recursive https://github.com/nvlabs/tiny-cuda-nn
$ cd tiny-cuda-nn
```
Then, use CMake to build the project (on Windows, this must be done in a [developer command prompt](https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-160#developer_command_prompt)):
```sh
tiny-cuda-nn$ cmake . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
tiny-cuda-nn$ cmake --build build --config RelWithDebInfo -j
```

If compilation fails inexplicably or takes longer than an hour, you might be running out of memory. In that case, try running the above command without `-j`.


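## PyTorch extension

__tiny-cuda-nn__ comes with a [PyTorch](https://github.com/pytorch/pytorch) extension that allows using the fast MLPs and input encodings from within a [Python](https://www.python.org/) context.
These bindings can be significantly faster than full Python implementations, in particular for the [multiresolution hash encoding](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/multiresolution-hash-encoding-diagram.png).

> The overheads of Python/PyTorch can nonetheless be extensive if the batch size is small.
> For example, with a batch size of 64k, the bundled `mlp_learning_an_image` example is __~2x slower__ through PyTorch than native CUDA.
> With a batch size of 256k and higher (default), the performance is much closer.

Begin by setting up a Python 3.X environment with a recent, CUDA-enabled version of PyTorch. Then, invoke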
```sh
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```

Alternatively, if you would like to install from a local clone of __tiny-cuda-nn__, invoke
```sh
tiny-cuda-nn$ cd bindings/torch
tiny-cuda-nn/bindings/torch$ python setup.py install
```

Upon success, you can use __tiny-cuda-nn__ models as in the following example:
```py
import commentjson as json
import tinycudann as tcnn
import torch

with open("data/config_hash.json") as f:
	config = json.load(f)

# Option 1: efficient Encoding+Network combo.
model = tcnn.NetworkWithInputEncoding(
	n_input_dims, n_output_dims,
	config["encoding"], config["network"]
)

# Option 2: separate modules. Slower but more flexible.
encoding = tcnn.Encoding(n_input_dims, config["encoding"])
network = tcnn.Network(encoding.n_output_dims, n_output_dims, config["network"])
model = torch.nn.Sequential(encoding, network)
```

See `samples/mlp_learning_an_image_pytorch.py` for an example.
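
The configuration does not have to come from a file. As a minimal sketch, the same settings used in the C++ example can be written as a plain Python dictionary, and the resulting `config["encoding"]` and `config["network"]` entries can then be passed wherever the example above uses them. The `with_fallback_network` helper is hypothetical; it only illustrates switching to `CutlassMLP` on GPUs where the fully fused MLP does not fit.

```python
import json

# Same keys and values as the C++ configuration in the Usage section.
config = {
    "encoding": {
        "otype": "HashGrid",
        "n_levels": 16,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
        "per_level_scale": 2.0,
    },
    "network": {
        "otype": "FullyFusedMLP",
        "activation": "ReLU",
        "output_activation": "None",
        "n_neurons": 64,
        "n_hidden_layers": 2,
    },
}


def with_fallback_network(cfg):
    """Return a copy of cfg that uses CutlassMLP instead of FullyFusedMLP."""
    patched = json.loads(json.dumps(cfg))  # deep copy via a JSON round trip
    patched["network"]["otype"] = "CutlassMLP"
    return patched


fallback = with_fallback_network(config)
print(json.dumps(fallback["network"], indent=2))
```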


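## Components

The following is a summary of the components of this framework. [The JSON documentation](DOCUMENTATION.md) lists configuration options.


| Networks | &nbsp; | &nbsp;
| :--- | :---------- | :-----
| Fully fused MLP | `src/fully_fused_mlp.cu` | Lightning-fast implementation of small multi-layer perceptrons (MLPs).
| CUTLASS MLP     | `src/cutlass_mlp.cu`     | MLP based on [CUTLASS](https://github.com/NVIDIA/cutlass)' GEMM (general matrix multiply) routines. Slower than fully fused, but handles larger networks and is still reasonably fast.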

| Input encodings | &nbsp; | &nbsp;
| :--- | :---------- | :-----
| Composite | `include/tiny-cuda-nn/encodings/composite.h` | Allows composing multiple encodings. Can be used, for example, to assemble the Neural Radiance Caching encoding [[Müller et al. 2021]](https://tom94.net/).
| Frequency | `include/tiny-cuda-nn/encodings/frequency.h` | NeRF's [[Mildenhall et al. 2020]](https://www.matthewtancik.com/nerf) positional encoding applied equally to all dimensions.
| Grid | `include/tiny-cuda-nn/encodings/grid.h` | Encoding based on trainable multiresolution grids. Used for [Instant Neural Graphics Primitives [Müller et al. 2022]](https://nvlabs.github.io/instant-ngp/). The grids can be backed by hash tables, dense storage, or tiled storage.
| Identity | `include/tiny-cuda-nn/encodings/identity.h` | Leaves values untouched.
| Oneblob | `include/tiny-cuda-nn/encodings/oneblob.h` | From Neural Importance Sampling [[Müller et al. 2019]](https://tom94.net/data/publications/mueller18neural/mueller18neural-v4.pdf) and Neural Control Variates [[Müller et al. 2020]](https://tom94.net/data/publications/mueller20neural/mueller20neural.pdf).
| SphericalHarmonics | `include/tiny-cuda-nn/encodings/spherical_harmonics.h` | A frequency-space encoding that is more suitable to direction vectors than component-wise ones.
| TriangleWave | `include/tiny-cuda-nn/encodings/triangle_wave.h` | Low-cost alternative to NeRF's encoding. Used in Neural Radiance Caching [[Müller et al. 2021]](https://tom94.net/).
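
As an illustration of what an input encoding does, here is a minimal Python sketch of the Frequency encoding idea, assuming the formulation from the NeRF paper: each coordinate is mapped to `sin(2^k * pi * x)` and `cos(2^k * pi * x)` for a range of octaves `k`. The function names, the feature ordering, and the exact frequency scaling are assumptions; the implementation in `frequency.h` may arrange things differently.

```python
import math


def frequency_encode(x, n_frequencies):
    """Encode one coordinate as [sin(2^k*pi*x), cos(2^k*pi*x)] for k = 0..n_frequencies-1."""
    features = []
    for k in range(n_frequencies):
        angle = (2.0 ** k) * math.pi * x
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def encode_point(point, n_frequencies):
    """Apply the same encoding to every dimension and concatenate the results."""
    encoded = []
    for coordinate in point:
        encoded.extend(frequency_encode(coordinate, n_frequencies))
    return encoded


point = (0.25, 0.5, 0.75)
encoded = encode_point(point, n_frequencies=4)
print(len(point), "inputs ->", len(encoded), "encoded features")
```

Each input dimension expands to `2 * n_frequencies` features, which is why even a low-dimensional input can give the network a much richer signal to work with.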

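| Losses | &nbsp; | &nbsp;
| :--- | :---------- | :-----
| L1 | `include/tiny-cuda-nn/losses/l1.h` | Standard L1 loss.
| Relative L1 | `include/tiny-cuda-nn/losses/l1.h` | Relative L1 loss normalized by the network prediction.
| MAPE | `include/tiny-cuda-nn/losses/mape.h` | Mean absolute percentage error (MAPE). The same as Relative L1, but normalized by the target.
| SMAPE | `include/tiny-cuda-nn/losses/smape.h` | Symmetric mean absolute percentage error (SMAPE). The same as Relative L1, but normalized by the mean of the prediction and the target.
| L2 | `include/tiny-cuda-nn/losses/l2.h` | Standard L2 loss.
| Relative L2 | `include/tiny-cuda-nn/losses/relative_l2.h` | Relative L2 loss normalized by the network prediction [[Lehtinen et al. 2018]](https://github.com/NVlabs/noise2noise).
| Relative L2 Luminance | `include/tiny-cuda-nn/losses/relative_l2_luminance.h` | Same as above, but normalized by the luminance of the network prediction. Only applicable when the network prediction is RGB. Used in Neural Radiance Caching [[Müller et al. 2021]](https://tom94.net/).
| Cross Entropy | `include/tiny-cuda-nn/losses/cross_entropy.h` | Standard cross entropy loss. Only applicable when the network prediction is a PDF.
| Variance | `include/tiny-cuda-nn/losses/variance_is.h` | Standard variance loss. Only applicable when the network prediction is a PDF.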
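
The relative losses in the table differ only in what they divide by. The following Python sketch spells this out per element. It is an illustration based on the descriptions above, not the library's code: the function names and the stabilizing `eps` term (and its default value) are assumptions, and the headers listed in the table are the authority on the exact formulas.

```python
def l1(pred, target):
    """Standard L1: absolute error."""
    return abs(pred - target)


def relative_l1(pred, target, eps=1e-2):
    """L1 normalized by the prediction."""
    return abs(pred - target) / (abs(pred) + eps)


def mape(pred, target, eps=1e-2):
    """L1 normalized by the target."""
    return abs(pred - target) / (abs(target) + eps)


def smape(pred, target, eps=1e-2):
    """L1 normalized by the mean of prediction and target."""
    return abs(pred - target) / (0.5 * (abs(pred) + abs(target)) + eps)


def l2(pred, target):
    """Standard L2: squared error."""
    return (pred - target) ** 2


def relative_l2(pred, target, eps=1e-2):
    """L2 normalized by the squared prediction."""
    return (pred - target) ** 2 / (pred ** 2 + eps)


pred, target = 2.0, 1.0
for loss in (l1, relative_l1, mape, smape, l2, relative_l2):
    print(f"{loss.__name__}: {loss(pred, target):.4f}")
```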

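| Optimizers | &nbsp; | &nbsp;
| :--- | :---------- | :-----
| Adam | `include/tiny-cuda-nn/optimizers/adam.h` | Implementation of Adam [[Kingma and Ba 2014]](https://arxiv.org/abs/1412.6980), generalized to AdaBound [[Luo et al. 2019]](https://github.com/Luolc/AdaBound).
| Novograd | `include/tiny-cuda-nn/optimizers/lookahead.h` | Implementation of Novograd [[Ginsburg et al. 2019]](https://arxiv.org/abs/1905.11286).
| SGD | `include/tiny-cuda-nn/optimizers/sgd.h` | Standard stochastic gradient descent (SGD).
| Shampoo | `include/tiny-cuda-nn/optimizers/shampoo.h` | Implementation of the second-order Shampoo optimizer [[Gupta et al. 2018]](https://arxiv.org/abs/1802.09568) with home-grown optimizations as well as those by [Anil et al. [2020]](https://arxiv.org/abs/2002.09018).
| Average | `include/tiny-cuda-nn/optimizers/average.h` | Wraps another optimizer and computes a linear average of the weights over the last N iterations. The average is used for inference only (does not feed back into training).
| Batched | `include/tiny-cuda-nn/optimizers/batched.h` | Wraps another optimizer, invoking the nested optimizer once every N steps on the averaged gradient. Has the same effect as increasing the batch size but requires only a constant amount of memory.
| Composite | `include/tiny-cuda-nn/optimizers/composite.h` | Allows using several optimizers on different parameters.
| EMA | `include/tiny-cuda-nn/optimizers/average.h` | Wraps another optimizer and computes an exponential moving average of the weights. The average is used for inference only (does not feed back into training).
| Exponential Decay | `include/tiny-cuda-nn/optimizers/exponential_decay.h` | Wraps another optimizer and performs piecewise-constant exponential learning-rate decay.
| Lookahead | `include/tiny-cuda-nn/optimizers/lookahead.h` | Wraps another optimizer, implementing the lookahead algorithm [[Zhang et al. 2019]](https://arxiv.org/abs/1907.08610).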
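
Several of the optimizers above are wrappers around another optimizer. The Python sketch below shows the arithmetic behind three of them (EMA, Exponential Decay, and Batched) in its simplest form. The function and parameter names are hypothetical and do not correspond to the library's JSON options; see the JSON documentation for the actual parameters.

```python
def ema_update(ema_weights, weights, decay):
    """One exponential-moving-average step over a list of weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


def decayed_learning_rate(base_lr, step, decay_interval, decay_base):
    """Piecewise-constant exponential decay: multiply by decay_base every decay_interval steps."""
    return base_lr * decay_base ** (step // decay_interval)


def averaged_gradient(gradients):
    """Mean of N per-step gradients, as handed to the nested optimizer once every N steps."""
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]


print(ema_update([0.0, 0.0], [1.0, 2.0], decay=0.9))
print([decayed_learning_rate(1e-3, s, decay_interval=100, decay_base=0.5) for s in (0, 99, 100, 250)])
print(averaged_gradient([[1.0, 2.0], [3.0, 4.0]]))
```

The learning rate stays constant within each interval and only changes at interval boundaries, which is what "piecewise-constant" refers to.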


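## License and Citation

This framework is licensed under the BSD 3-Clause license. Please see `LICENSE.txt` for details.

If you use it in your research, we would appreciate a citation via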
```bibtex
@software{tiny-cuda-nn,
	author = {M\"uller, Thomas},
	license = {BSD-3-Clause},
	month = {4},
	title = {{tiny-cuda-nn}},
	url = {https://github.com/NVlabs/tiny-cuda-nn},
	version = {1.7},
	year = {2021}
}
```

For business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/)


## Publications & Software

Among others, this framework powers the following publications:

> __Instant Neural Graphics Primitives with a Multiresolution Hash Encoding__  
> [Thomas Müller](https://tom94.net), [Alex Evans](https://research.nvidia.com/person/alex-evans), [Christoph Schied](https://research.nvidia.com/person/christoph-schied), [Alexander Keller](https://research.nvidia.com/person/alex-keller)  
> _ACM Transactions on Graphics (__SIGGRAPH__), July 2022_  
> __[Website](https://nvlabs.github.io/instant-ngp/)&nbsp;/ [Paper](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf)&nbsp;/ [Code](https://github.com/NVlabs/instant-ngp)&nbsp;/ [Video](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.mp4)&nbsp;/ [BibTeX](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.bib)__

> __Extracting Triangular 3D Models, Materials, and Lighting From Images__  
> [Jacob Munkberg](https://research.nvidia.com/person/jacob-munkberg), [Jon Hasselgren](https://research.nvidia.com/person/jon-hasselgren), [Tianchang Shen](http://www.cs.toronto.edu/~shenti11/), [Jun Gao](http://www.cs.toronto.edu/~jungao/), [Wenzheng Chen](http://www.cs.toronto.edu/~wenzheng/), [Alex Evans](https://research.nvidia.com/person/alex-evans), [Thomas Müller](https://tom94.net), [Sanja Fidler](https://www.cs.toronto.edu/~fidler/)  
> __CVPR (Oral)__, June 2022  
> __[Website](https://nvlabs.github.io/nvdiffrec/)&nbsp;/ [Paper](https://nvlabs.github.io/nvdiffrec/assets/paper.pdf)&nbsp;/ [Video](https://nvlabs.github.io/nvdiffrec/assets/video.mp4)&nbsp;/ [BibTeX](https://nvlabs.github.io/nvdiffrec/assets/bib.txt)__

> __Real-time Neural Radiance Caching for Path Tracing__  
> [Thomas Müller](https://tom94.net), [Fabrice Rousselle](https://research.nvidia.com/person/fabrice-rousselle), [Jan Novák](http://jannovak.info), [Alexander Keller](https://research.nvidia.com/person/alex-keller)  
> _ACM Transactions on Graphics (__SIGGRAPH__), August 2021_  
> __[Paper](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.pdf)&nbsp;/ [GTC talk](https://gtc21.event.nvidia.com/media/Fully%20Fused%20Neural%20Network%20for%20Radiance%20Caching%20in%20Real%20Time%20Rendering%20%5BE31307%5D/1_liqy6k1c)&nbsp;/ [Video](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.mp4)&nbsp;/ [Interactive results viewer](https://tom94.net/data/publications/mueller21realtime/interactive-viewer/)&nbsp;/ [BibTeX](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.bib)__


As well as the following software:

> __NerfAcc: A General NeRF Acceleration Toolbox__  
> [Ruilong Li](https://www.liruilong.cn/), [Matthew Tancik](https://www.matthewtancik.com/about-me), [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/)  
> __https://github.com/KAIR-BAIR/nerfacc__

> __Nerfstudio: A Framework for Neural Radiance Field Development__  
> [Matthew Tancik*](https://www.matthewtancik.com/about-me), [Ethan Weber*](https://ethanweber.me/), [Evonne Ng*](http://people.eecs.berkeley.edu/~evonne_ng/), [Ruilong Li](https://www.liruilong.cn/), Brent Yi, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/)  
> __https://github.com/nerfstudio-project/nerfstudio__

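Please feel free to open a pull request if your publication or software is not listed.

## Acknowledgments

Special thanks go to the NRC authors for helpful discussions, and to [Nikolaus Binder](https://research.nvidia.com/person/nikolaus-binder) for providing part of the infrastructure of this framework as well as for help with utilizing tensor cores from within CUDA.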