# dct

**Repository Path**: fsfzp888_admin/dct

## Basic Information

- **Project Name**: dct
- **Description**: 2D discrete cosine/sine transforms; supports 2D DCT/DST/DSCT/DCST with OpenMP/CUDA parallelism
- **Primary Language**: C++
- **License**: BSD-3-Clause
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2022-11-04
- **Last Updated**: 2024-05-31

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# dct

#### Introduction

Discrete cosine and sine transforms.

#### Usage

See `main.cpp` for usage. For example, a 2D DCT or DST can be performed as follows:

``` c++
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
// plus the dct header from this repository (see main.cpp)

unsigned int dim = 4096;
std::vector<float> data(dim * dim);
for (unsigned int r = 0; r < dim; ++r) {
    for (unsigned int c = 0; c < dim; ++c) {
        data[r * dim + c] = (float)rand() / (RAND_MAX - 1);
    }
}

// Sanity check: data2 must be the transpose of data
auto data2 = data;
dct::transpose(data.data(), data2.data(), dim, dim);
for (unsigned int r = 0; r < dim; ++r) {
    for (unsigned int c = 0; c < dim; ++c) {
        assert(data[r * dim + c] == data2[c * dim + r]);
    }
}

// Precomputed cosine tables for the forward and inverse transforms
std::vector<float> cos_dct(dim), cos_idct(dim);
dct::precompute_dct_cos(cos_dct.data(), dim);
dct::precompute_idct_cos(cos_idct.data(), dim);

std::vector<float> out1(dim * dim), out2(dim * dim), buf(dim * dim);
clock_t start_time, end_time;

// Forward 2D DCT
start_time = clock();
dct::dct2d(data.data(), out1.data(), buf.data(), cos_dct.data(), cos_dct.data(), dim, dim, 1);
end_time = clock();
std::cout << "dct2d time: " << (double)(end_time - start_time) / CLOCKS_PER_SEC << std::endl;

// Inverse 2D DCT
start_time = clock();
dct::idct2d(out1.data(), out2.data(), buf.data(), cos_idct.data(), cos_idct.data(), dim, dim, 1);
end_time = clock();
std::cout << "idct2d time: " << (double)(end_time - start_time) / CLOCKS_PER_SEC << std::endl;

// Relative round-trip error: sqrt(sum((x - x')^2) / sum(x * x'))
double sum1 = 0.0, sum2 = 0.0;
for (unsigned int r = 0; r < dim; ++r) {
    for (unsigned int c = 0; c < dim; ++c) {
        double dr = (data[r * dim + c] - out2[r * dim + c]);
        sum1 += dr * dr;
        sum2 += data[r * dim + c] * out2[r * dim + c];
    }
}
std::cout << std::sqrt(sum1 / sum2) << std::endl;

// Forward 2D DST
start_time = clock();
dct::dst2d(data.data(), out1.data(), buf.data(), cos_dct.data(),
           cos_dct.data(), dim, dim, 1);
end_time = clock();
std::cout << "dst2d time: " << (double)(end_time - start_time) / CLOCKS_PER_SEC << std::endl;

// Inverse 2D DST
start_time = clock();
dct::idst2d(out1.data(), out2.data(), buf.data(), cos_idct.data(), cos_idct.data(), dim, dim, 1);
end_time = clock();
std::cout << "idst2d time: " << (double)(end_time - start_time) / CLOCKS_PER_SEC << std::endl;

sum1 = 0.0;
sum2 = 0.0;
for (unsigned int r = 0; r < dim; ++r) {
    for (unsigned int c = 0; c < dim; ++c) {
        double dr = (data[r * dim + c] - out2[r * dim + c]);
        sum1 += dr * dr;
        sum2 += data[r * dim + c] * out2[r * dim + c];
    }
}
std::cout << std::sqrt(sum1 / sum2) << std::endl;
```

2D DSCT and DCST transforms are also supported.

## GPU speed comparison

GPU: RTX 4090

Results:

``` txt
dim is 4096x4096
CPU dct2d time: 0.946874
CPU idct2d time: 0.861657
DCT/IDCT error:0.000355373
CPU dst2d time: 1.01149
CPU idst2d time: 0.914253
DST/IDST error:0.000946313
GPU Device 0 with compute capability 8.9
transpose error: 0
GPU dct2d time: 0.028985
GPU idct2d time: 0.027435
DCT/IDCT error: 0.000251305
GPU dst2d time: 0.02841
GPU idst2d time: 0.027059
DST/IDST error: 0.000355395

dim is 2048x2048
CPU dct2d time: 0.225318
CPU idct2d time: 0.194414
DCT/IDCT error:0.000225355
CPU dst2d time: 0.247617
CPU idst2d time: 0.216328
DST/IDST error:0.000594362
GPU Device 0 with compute capability 8.9
transpose error: 0
GPU dct2d time: 0.006581
GPU idct2d time: 0.005803
DCT/IDCT error: 0.000159346
GPU dst2d time: 0.00663
GPU idst2d time: 0.00589
DST/IDST error: 0.000225348

dim is 1024x1024
CPU dct2d time: 0.05079
CPU idct2d time: 0.047894
DCT/IDCT error:0.000128235
CPU dst2d time: 0.057469
CPU idst2d time: 0.049495
DST/IDST error:0.000342496
GPU Device 0 with compute capability 8.9
transpose error: 0
GPU dct2d time: 0.001625
GPU idct2d time: 0.001378
DCT/IDCT error: 9.06671e-05
GPU dst2d time: 0.001654
GPU idst2d time: 0.001373
DST/IDST error: 0.000128222

dim is 512x512
CPU dct2d time: 0.011406
CPU idct2d time: 0.010484
DCT/IDCT error:6.01641e-05
CPU dst2d time: 0.012733
CPU idst2d time: 0.011252
DST/IDST error:0.00015519
GPU Device 0 with compute capability 8.9
transpose error: 0
GPU dct2d time: 0.000582
GPU idct2d time: 0.00047
DCT/IDCT error: 4.25335e-05
GPU dst2d time: 0.000615
GPU idst2d time: 0.00048
DST/IDST error: 6.01513e-05
```

The GPU version still has room for optimization. The following command collects Nsight Compute suggestions for further tuning:

``` bash
sudo /usr/local/cuda/bin/ncu ./dct_operator > ../gpu_kernel_ns_result.txt
```
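The usage example above relies on a transpose and on precomputed cosine tables. For checking small outputs by hand, the following is a naive sketch of a separable 2D DCT-II and its inverse, assuming the unnormalized convention $X_k = \sum_n x_n \cos(\pi k (2n+1) / 2N)$. The namespace `ref` and every function in it are hypothetical, not this repository's API. The sketch transforms every row, transposes so that columns become rows, transforms again, and transposes back. It is $O(N^3)$ and stores a full $N \times N$ table, whereas `precompute_dct_cos` in the example fills only `dim` entries, so the library presumably uses a faster scheme.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace ref {  // hypothetical helper namespace, not part of this repository

// table[k * n + i] = cos(pi * k * (2i + 1) / (2n)): the DCT-II kernel for length n.
std::vector<double> make_dct_table(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<double> table(n * n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            table[k * n + i] = std::cos(pi * k * (2.0 * i + 1.0) / (2.0 * n));
    return table;
}

// Out-of-place transpose of a rows x cols row-major matrix.
void transpose(const std::vector<double>& in, std::vector<double>& out,
               std::size_t rows, std::size_t cols) {
    out.resize(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}

// Applies the 1D DCT-II (inverse == false) or its inverse (inverse == true)
// to every row of an n x n row-major matrix.
void rows_1d(const std::vector<double>& in, std::vector<double>& out,
             const std::vector<double>& table, std::size_t n, bool inverse) {
    out.assign(n * n, 0.0);
    for (std::size_t r = 0; r < n; ++r) {
        const double* x = &in[r * n];
        double* y = &out[r * n];
        for (std::size_t k = 0; k < n; ++k) {
            double acc = 0.0;
            if (!inverse) {
                // X_k = sum_i x_i cos(pi k (2i + 1) / 2n)
                for (std::size_t i = 0; i < n; ++i) acc += x[i] * table[k * n + i];
            } else {
                // x_k = X_0 / n + (2 / n) sum_{j >= 1} X_j cos(pi j (2k + 1) / 2n)
                acc = x[0] / n;
                for (std::size_t j = 1; j < n; ++j) acc += 2.0 / n * x[j] * table[j * n + k];
            }
            y[k] = acc;
        }
    }
}

// Separable 2D transform: rows, transpose, rows again, transpose back.
std::vector<double> transform2d(const std::vector<double>& in, std::size_t n, bool inverse) {
    assert(in.size() == n * n);
    const std::vector<double> table = make_dct_table(n);
    std::vector<double> a, b;
    rows_1d(in, a, table, n, inverse);  // transform along each row
    transpose(a, b, n, n);              // columns become rows
    rows_1d(b, a, table, n, inverse);   // transform along the original columns
    transpose(a, b, n, n);              // restore the original layout
    return b;
}

}  // namespace ref
```

Because the row and column passes are independent linear maps, `ref::transform2d(ref::transform2d(x, n, false), n, true)` returns `x` up to rounding, which is the same round-trip property the README's error metric measures.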
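In the usage example, `dct::dst2d` receives the same cosine tables as `dct::dct2d`. One standard identity that allows a DST to be computed with cosine kernels is the following: with unnormalized DST-II $Y_k = \sum_n x_n \sin(\pi (n + \tfrac12)(k+1)/N)$ and $z_n = (-1)^n x_n$, we have $Y_k = Z_{N-1-k}$, where $Z$ is the DCT-II of $z$. This holds because $\cos(\pi(n+\tfrac12) - \theta) = (-1)^n \sin\theta$. Whether this library uses exactly this identity is an assumption. The sketch below, with hypothetical function names, only checks the identity numerically.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Unnormalized DCT-II: X_k = sum_n x_n * cos(pi * (n + 1/2) * k / N).
std::vector<double> dct2_1d(const std::vector<double>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::cos(pi * (n + 0.5) * k / N);
    return X;
}

// Unnormalized DST-II evaluated directly: Y_k = sum_n x_n * sin(pi * (n + 1/2) * (k + 1) / N).
std::vector<double> dst2_1d_direct(const std::vector<double>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> Y(N, 0.0);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            Y[k] += x[n] * std::sin(pi * (n + 0.5) * (k + 1) / N);
    return Y;
}

// DST-II through DCT-II only: negate odd-indexed inputs, apply DCT-II, reverse the output.
std::vector<double> dst2_via_dct2(const std::vector<double>& x) {
    std::vector<double> z(x);
    for (std::size_t n = 1; n < z.size(); n += 2) z[n] = -z[n];
    const std::vector<double> Z = dct2_1d(z);
    return std::vector<double>(Z.rbegin(), Z.rend());
}
```

The sign flip and the reversal are both $O(N)$, so any fast DCT-II can be reused for the DST-II at essentially no extra cost.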
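The CPU timings in the usage example come from `clock()`, which reports processor time rather than elapsed time. On Linux (glibc), it is summed over all threads of the process. If the CPU path runs with several OpenMP threads, the printed `clock()` times can therefore exceed the actual elapsed time. The following minimal sketch measures wall-clock time with `std::chrono::steady_clock` instead. The helper name `wall_seconds` is hypothetical.

```cpp
#include <chrono>
#include <utility>

// Runs f once and returns the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so the result is unaffected by system clock changes.
template <typename F>
double wall_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, `double t = wall_seconds([&] { dct::dct2d(data.data(), out1.data(), buf.data(), cos_dct.data(), cos_dct.data(), dim, dim, 1); });` would replace one `start_time`/`end_time` pair.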