The supplementary document of

G-CRS:GPU Accelerated Cauchy Reed-Solomon Coding



Environment for G-CRS

This website provided the related code and experimental results for our measurement of G-CRS. The environment for our measurement is:

CPU: Intel(R) Core(TM) i7-6800K CPU 3.40GH
Motherboard: ASUS X99-A II
Main Memory: 64 GB 2133 MHz DDR4
Disk: 3 TB ST3000VX010 SATA3 5900RPM
OS: Ubuntu 16.04
CUDA version: CUDA 7.5
GPU: Maxwell GTX 980 and Pascal Titan X

Thanks for your visiting. Please contact Liu, Chengjian through cscjliu@comp.hkbu.edu.hk if you have any further question about our G-CRS.

Section 1: Throughput Under Different Workloads

Section 1.1 Introduction

We measured the Throughput of G-CRS with each block contains 128 threads, and the number of blocks ranged from 24 to 213. This section provided the source code and experimental results for our measurement on Maxwell GTX 980 and Pascal Titan X

Section 1.2 Source Code

Source Code: MeasureBufferSize.zip

Please take a look at the README.md to see how to run the program

Section 1.3 Experimental Results
Results on Maxwell GTX 980
m=1 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 11.95 11.60 11.74 11.70 11.77
25 23.56 22.99 23.31 23.38 23.64
26 47.18 45.00 46.53 41.90 42.32
27 57.38 50.34 52.48 50.03 55.94
28 71.24 77.57 83.39 77.92 69.04
29 93.14 99.74 101.86 98.15 89.26
210 114.82 115.98 118.86 116.74 113.27
211 128.72 131.94 128.92 129.22 128.50
212 139.38 139.12 138.04 138.09 138.61
213 143.78 144.19 144.28 144.67 144.71
m=2 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 12.07 11.64 11.77 11.73 12.07
25 24.42 23.09 20.77 20.75 21.29
26 29.93 26.40 27.43 27.47 27.19
27 48.47 57.10 57.52 54.49 43.60
28 68.15 73.79 74.59 71.70 62.92
29 92.38 91.57 94.52 92.61 86.62
210 105.82 109.96 105.58 105.78 105.17
211 117.68 119.47 119.02 118.03 118.62
212 126.80 127.90 126.96 126.98 126.20
213 131.40 130.35 130.53 130.47 131.38
m=3 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 11.94 11.65 10.63 10.54 10.77
25 15.79 14.20 14.26 13.29 13.52
26 26.84 23.01 23.40 22.02 23.55
27 47.66 43.70 42.08 40.05 42.44
28 65.55 67.68 65.97 63.31 53.83
29 88.00 86.82 84.40 79.72 73.30
210 98.95 97.65 102.07 96.13 88.14
211 109.60 109.05 110.17 105.3 97.94
212 116.71 116.20 116.98 112.00 104.65
213 120.60 118.91 120.46 114.89 107.87
m=4 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 11.71 9.62 9.56 9.32 9.63
25 14.44 11.73 12.09 11.48 12.54
26 24.17 26.60 26.17 23.29 20.10
27 41.54 43.82 42.77 39.98 33.52
28 63.53 58.97 58.05 52.87 47.26
29 80.42 80.23 71.36 63.30 60.02
210 92.98 92.57 85.99 74.69 69.45
211 102.41 102.95 95.02 82.01 76.42
212 108.53 108.16 99.28 85.25 80.00
213 111.48 109.81 102.65 87.79 82.14
Results on Pascal Titan X
m=1 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 14.16 14.95 15.15 15.16 15.04
25 36.66 33.89 34.83 33.92 35.43
26 76.69 69.39 74.92 72.04 74.01
27 82.28 80.96 81.66 76.19 78.41
28 145.82 125.30 127.37 116.20 124.03
29 159.27 151.46 154.13 153.34 158.76
210 225.33 227.30 219.83 207.59 230.65
211 249.45 239.98 236.42 245.38 259.73
212 267.65 259.40 254.72 253.72 268.44
213 277.29 269.12 267.61 267.68 277.87
m=2 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 18.56 21.22 21.77 19.37 19.87
25 44.46 42.97 34.07 33.12 34.8
26 41.43 40.04 40.59 37.89 38.70
27 75.62 71.36 68.79 67.12 69.54
28 117.73 114.90 110.02 100.84 99.76
29 164.95 171.73 162.69 160.06 153.72
210 207.43 195.60 203.04 189.38 195.53
211 235.23 235.10 217.73 209.91 224.77
212 252.54 240.16 243.93 239.96 250.72
213 256.18 251.11 244.91 244.62 258.17
m=3 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 17.56 19.88 19.00 18.10 18.92
25 20.84 20.15 20.09 17.77 19.54
26 55.18 49.17 44.59 45.19 44.52
27 81.21 74.56 72.32 69.59 71.75
28 122.28 107.83 105.39 102.08 102.87
29 198.34 174.65 152.89 127.04 126.47
210 202.57 189.84 171.17 159.15 153.76
211 222.40 205.54 203.70 192.70 217.93
212 236.22 226.82 220.66 215.90 228.09
213 240.97 231.18 229.94 226.69 237.06
m=4 (Unit for Throughput: GB/s)
Blocks w=4 w=5 w=6 w=7 w=8
24 18.83 17.62 17.91 15.39 16.62
25 22.23 20.15 20.35 18.97 19.45
26 42.54 37.20 36.41 36.87 36.92
27 78.57 67.11 66.35 61.05 63.53
28 112.47 98.60 95.82 89.88 90.13
29 176.87 144.88 140.98 137.58 140.28
210 203.15 178.58 174.79 162.83 166.07
211 218.14 198.77 199.25 189.35 178.73
212 221.21 212.86 207.82 197.71 193.00
213 226.567 218.06 213.79 205.06 201.72

Section 2: Peak Raw Coding Performance

Section 2.1 Introduction

We measured the raw coding throughput of G-CRS. The related code and experimental results are provided in this section.

Section 2.2 Source Code

Source Code: PeakRawMeasure.zip

Please take a look at the README.md to see how to run the program

Section 2.3 Experimental Results
Results on Maxwell GTX 980
Unit for Throughput: GB/s, Unit for time:ms
m 2 4 8 16
Result m_2.txt m_4.txt m_8.txt m_16.txt
Results on Pascal Titan X
Unit for Throughput: GB/s, Unit for time:ms
m 2 4 8
Result m_2.txt m_4.txt m_8.txt m_16.txt

Section 3: Optimization Analysis

Section 3.1 Introduction

We evaluate the effectiveness of our optimization strategies by adding our optimization strategies one by one, the code is in the file named EffectivenessOfOpt.zip. We investigate the dominating factor that contributes the most to the performance improvement by removing one optimization strategy each time, the code is in the file named DominatedFactor.zip.

Section 3.2 Source Code

Source Code: EffectivenessOfOpt.zip

Source Code: DominatedFactor.zip

Please take a look at the README.md to see how to run the program

Section 3.3 Experimental Results for Effectiveness of our Optimization Strategies
Results of Encoding on Maxwell GTX 980, note that the unit for throughput is GB/s for each table
k=10, m=1
w 4 5 6 7 8
Base 21.00 16.64 16.88 16.61 16.79
Bitmatrix 108.76 106.49 108.55 103.54 111.07
EfficientAccess 144.28 142.29 142.73 138.00 137.65
Bandwidth 145.31 143.07 143.60 141.28 141.26
G-CRS 145.19 144.70 145.38 145.69 145.22
k=10, m=2
w 4 5 6 7 8
Base 16.38 15.40 15.32 14.60 14.66
Bitmatrix 52.45 51.26 52.75 50.08 54.10
EfficientAccess 99.48 86.37 83.40 74.42 74.01
Bandwidth 132.22 130.96 129.12 116.08 110.60
G-CRS 132.15 131.07 131.74 131.83 132.10
k=10, m=3
w 4 5 6 7 8
Base 14.78 14.11 14.01 13.24 12.95
Bitmatrix 34.50 33.18 33.97 31.80 34.81
EfficientAccess 65.25 56.92 55.04 49.15 48.77
Bandwidth 121.29 107.68 99.31 84.65 77.53
G-CRS 121.18 119.94 120.69 115.97 108.48
k=10, m=4
w 4 5 6 7 8
Base 13.30 12.55 11.99 11.11 8.01
Bitmatrix 25.81 24.26 24.87 23.04 25.96
EfficientAccess 49.00 42.78 41.25 36.79 36.27
Bandwidth 92.05 75.59 65.75 57.35 51.56
G-CRS 111.86 110.89 104.07 88.64 82.62
Results on Pascal Titan X, note that the unit for throughput is GB/s for each table
k=10, m=1
w 4 5 6 7 8
Base 27.30 23.24 26.13 25.75 26.19
Bitmatrix 247.81 226.05 219.92 198.12 250.64
EfficientAccess 275.92 268.05 262.76 259.54 277.43
Bandwidth 276.33 269.67 263.55 260.41 277.06
G-CRS 278.77 272.38 267.65 267.04 279.52
k=10, m=2
w 4 5 6 7 8
Base 25.87 24.60 24.81 24.26 24.39
Bitmatrix 142.74 129.84 129.95 119.73 134.80
EfficientAccess 214.25 189.66 184.63 173.71 174.02
Bandwidth 254.54 244.07 237.00 231.69 244.75
G-CRS 256.23 249.15 245.73 243.33 257.80
k=10, m=3
w 4 5 6 7 8
Base 21.00 16.64 16.88 16.61 16.79
Bitmatrix 93.20 83.97 85.40 77.84 86.44
EfficientAccess 147.76 132.20 128.17 116.88 115.22
Bandwidth 237.42 217.59 210.35 197.95 186.95
G-CRS 240.68 233.10 227.06 223.70 238.60
k=10, m=4
w 4 5 6 7 8
Base 17.87 17.06 19.35 19.72 13.17
Bitmatrix 69.89 62.69 63.26 57.95 65.93
EfficientAccess 112.99 100.39 96.80 88.05 85.91
Bandwidth 216.47 185.47 163.45 143.10 128.80
G-CRS 228.91 220.57 213.31 203.72 200.66
Section 3.4 Experimental Results for Dominating Factor
Results of Encoding on Maxwell GTX 980, note that the unit for throughput is GB/s for each table
k=10, m=1
w 4 5 6 7 8
-Bitmatrix 23.67 17.91 18.01 17.91 17.82
-EfficientAccess 103.92 100.83 104.09 98.36 110.58
-Bandwidth 144.71 144.29 145.17 143.94 144.74
-DivergenceOpt 145.31 143.07 143.60 141.28 141.26
G-CRS 145.19 144.70 145.38 145.69 145.22
k=10, m=2
w 4 5 6 7 8
-Bitmatrix 17.95 17.73 17.89 17.62 15.66
-EfficientAccess 104.23 100.26 100.49 95.90 109.58
-Bandwidth 127.15 111.13 118.24 86.37 111.93
-DivergenceOpt 132.22 130.96 129.12 116.08 110.60
G-CRS 132.15 131.07 131.74 131.83 132.10
k=10, m=3
w 4 5 6 7 8
-Bitmatrix 18.02 17.74 17.79 16.37 14.44
-EfficientAccess 107.23 97.64 91.84 79.49 80.61
-Bandwidth 94.29 72.48 79.17 55.57 74.04
-DivergenceOpt 121.29 107.68 99.31 84.65 77.53
G-CRS 121.18 119.94 120.69 115.97 108.48
k=10, m=4
w 4 5 6 7 8
-Bitmatrix 17.19 17.55 16.33 15.93 8.89
-EfficientAccess 102.72 98.35 52.33 54.46 69.59
-Bandwidth 73.09 53.24 59.21 40.42 55.02
-DivergenceOpt 92.05 75.59 65.75 57.35 51.56
G-CRS 111.86 110.89 104.07 88.64 82.62
Results on Pascal Titan X, note that the unit for throughput is GB/s for each table
k=10, m=1
w 4 5 6 7 8
-Bitmatrix 30.36 27.46 27.65 27.31 27.26
-EfficientAccess 239.29 227.47 217.89 194.79 249.79
-Bandwidth 280.78 263.64 265.05 256.51 278.27
-DivergenceOpt 276.33 269.67 263.55 260.41 277.06
G-CRS 278.77 272.38 267.65 267.04 279.52
k=10, m=2
w 4 5 6 7 8
-Bitmatrix 26.88 27.10 27.45 27.03 23.91
-EfficientAccess 253.88 210.52 203.05 183.86 241.61
-Bandwidth 244.14 204.11 216.51 186.12 224.33
-DivergenceOpt 254.54 244.07 237.00 231.69 244.75
G-CRS 256.23 249.15 245.73 243.33 257.80
k=10, m=3
w 4 5 6 7 8
-Bitmatrix 27.12 26.75 27.23 25.43 22.16
-EfficientAccess 229.26 176.00 172.87 146.08 170.87
-Bandwidth 199.13 155.94 169.03 132.75 169.40
-DivergenceOpt 237.42 217.59 210.35 197.95 186.95
G-CRS 240.68 233.10 227.06 223.70 238.60
k=10, m=4
w 4 5 6 7 8
-Bitmatrix 26.13 26.72 25.46 24.95 23.09
-EfficientAccess 222.01 188.70 110.48 114.58 168.92
-Bandwidth 166.10 121.27 134.42 98.53 130.52
-DivergenceOpt 216.47 185.47 163.45 143.10 128.80
G-CRS 228.91 220.57 213.31 203.72 200.66

Section 4: Overall Performance

Section 4.1 Introduction

We measured the coding throughput of pipelined G-CRS. The related code and experimental results are provided in this section.

Section 4.2 Source Code

Source Code: G-CRSPCIE.zip

Please take a look at the README.md to see how to run the program

Section 4.3 Experimental Results
Results on Maxwell GTX 980
Unit for Throughput: GB/s, Unit for time:ms
m 2 4 8 16
Result m_2.txt m_4.txt m_8.txt m_16.txt
Results on Pascal Titan X
Unit for Throughput: GB/s, Unit for time:ms
m 2 4 8
Result m_2.txt m_4.txt m_8.txt m_16.txt