ScienceDirect.com

[...]

The Neptune implementation of the EM-EMC algorithm can perform parallel calculations using either CPU (multi-core) or GPU (many-core) processors, making maximum use of parallelism in each case to reduce execution times. All simulations were performed with a Supermicro workstation, with dual Intel Xeon Gold 6136 CPUs (2 × 12 = 24 physical cores; 3.0–3.7 GHz clock rate) and an Nvidia Titan RTX GPU (4608 CUDA cores; 1.35–1.77 GHz clock rate). Single-precision floating-point calculations were used to maximize performance.

[...]

Execution time using the GPU implementation was 2.7 min (4608 cores), compared to 26.2 min using the parallel CPU implementation (24 cores).
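
To put those two timings in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted above; the variable names are mine, and the per-core comparison is a rough illustration only, since a CUDA core and a Xeon core are very different things.

```python
# Figures quoted from the excerpt above.
gpu_minutes = 2.7      # Titan RTX, 4608 CUDA cores
cpu_minutes = 26.2     # dual Xeon Gold 6136, 24 physical cores
gpu_cores = 4608
cpu_cores = 24

# Overall wall-clock speedup of the GPU run over the parallel CPU run.
speedup = cpu_minutes / gpu_minutes

# Rough "core-minutes" spent by each device on the same job.
# Assumption for illustration: every core is busy for the whole run.
gpu_core_minutes = gpu_minutes * gpu_cores
cpu_core_minutes = cpu_minutes * cpu_cores

# How many CUDA core-minutes were needed per CPU core-minute.
per_core_ratio = gpu_core_minutes / cpu_core_minutes

print(f"GPU speedup over the 24-core CPU run: {speedup:.1f}x")
print(f"CUDA core-minutes per CPU core-minute: {per_core_ratio:.1f}")
```

So the GPU finishes roughly 9.7 times faster overall, even though, on this crude measure, one CPU core gets through about as much work as 20 CUDA cores.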

[...]


Have fun!

Cheers,
Scott.