Considering the Impact of Directional Variance in Bandwidth and Latency on Optimal Data Partitions in Heterogeneous Systems
Computing power is expensive, in terms of equipment, time, and energy, and these costs limit the feasible expansion of computers. To maximize the size and complexity of problems we can solve, applications are optimized to use available computers more efficiently. Parallel matrix-matrix multiplication (MMM), a linear algebra kernel function, is one such task for which the computation algorithm has been highly optimized on many systems. In addition to being used widely in scientific computing, MMM is also used to solve other standard matrix-matrix operations. The partitioning strategy for MMM has been optimized for homogeneous platforms, where all processors and communication links are identical. However, finding the optimal partitioning strategy on heterogeneous systems, where there is some variation in processor or communication speed, has been identified as NP-complete for an arbitrary number of processors, and thus is still an open problem. In the case of a two-processor heterogeneous system, two partitioning strategies had previously been identified as the only potentially optimal strategies. This thesis models communication time in more detail to determine the impact of those details on which of the two partition shapes is optimal for 2 processors. Specifically, I create models for execution time for four common MMM algorithms that allow for different bandwidths and latencies in different directions of communication. Based on these models, I develop an algorithm that takes in problem size, system parameters, and the desired matrix multiply algorithm and determines the optimal partitioning strategy. Though my model agrees with simpler models in most cases, communication factors do impact some decisions made near the boundary zone of the simpler models.