Parallel Computing: Challenges and Memory Models of MPI and OpenMP

If a single computer is not powerful enough for numerical simulations, why not connect several computers to work together? With ample funding, more computers should boost the overall efficiency of the simulation linearly, right? It sounds great, but if only things were that simple.

Decades ago, computers had merely a single core. A distributed memory system could be implemented by connecting a series of computers to work on a task. In this setup, each core has its own independent memory and executes its tasks autonomously. As a result, core A cannot access the memory of core B[1].

However, cores executing the same program must inevitably communicate with one another. Just as colleagues need to update each other on their progress, cores must exchange information to work together effectively.

In a distributed memory model, communication between cores is managed through the Message Passing Interface (MPI)[2,3]. Much like a mail service, core A sends messages to core B, which receives them, allowing for seamless communication between the cores.
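The mail-service pattern looks like this in C (a minimal sketch, not a complete program for production use; it assumes an MPI implementation such as MPICH or Open MPI is installed, so it is compiled with mpicc and launched with mpirun -np 2):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which core am I? */

    if (rank == 0) {                        /* core A: send a message */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                 /* core B: receive it */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("core B received %d from core A\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

Note that core B only sees the value because core A explicitly mailed it; neither process can read the other's memory directly.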

If every advancement required constant battles with hardware, progress would be frustratingly slow. Therefore, in the summer of 1994, Message Passing Interface (MPI) 1.0 was released. Thanks to the collaborative efforts of leading hardware vendors, academia, and industry, this library, with bindings for Fortran and C, is not only easy to use but also highly portable and scalable. MPI is still widely used today, but its past success also brings challenges in the present.

For a considerable period after the introduction of the first computers, each processor was equipped with only a single core. Initially, enhancing computing performance was straightforward: increase the clock rate of the core, in other words, its operating frequency. For instance, the ENIAC had a clock rate of just 100 kHz, whereas today GHz clock rates are commonplace for general CPUs[4].

As single-core performance increased, it approached physical limits, leaving little room for further breakthroughs. As a result, engineers turned their focus to multi-core architectures to push the boundaries of computing performance. Today, even smartphones feature multi-core processors, not to mention the processors used in computers.

Packing multiple cores into a single processor is like relocating residents from individual estates into a single apartment building. With neighbors living so close together, if the limited garden space (memory) is divided into small, equal portions, each resident ends up with only a space as tiny as a potted plant.

In this context, the shared memory model (also called the shared address space model) offers an alternative to the distributed memory model: each core keeps its own local memory while also sharing the same Random Access Memory (RAM) with the other cores.

As a result, OpenMP 1.0 for Fortran was released in 1997. It helps programmers efficiently distribute tasks across the different cores within a single processor without requiring significant changes to the code[5,6,7]. While this sounds great, the magic of OpenMP is limited to a single processor; for systems with multiple processors, MPI is still needed for additional support.

Fig.1 : Distributed memory model.
Each core has its own independent memory, and the cores communicate with each other through a message-passing interface. Each core can send messages to other cores and receive messages from them.
Fig.2 : Shared memory model.
Each core has its own local memory and, of course, shares the common memory with the other cores.

MPI was introduced earlier than OpenMP. Can an MPI program be used on multi-core processors without any extra modification? The answer is yes. An MPI program can run effectively on multi-core processors because each MPI process has its own independent memory space, allocated from the overall memory, as shown in Fig. 3. The cost, however, is that the total memory demand increases linearly with the number of cores.

When a processor has 4 cores, spending money on memory can effortlessly boost computing performance without much hassle. However, as the number of cores in a processor increases, not only does the cost become exorbitant, but the amount of independent memory available to each core becomes a bottleneck. This is because the overall memory capacity of any system has a limit.

As the number of cores grows, the total memory demand increases linearly, while the overall memory capacity remains finite. As illustrated in Fig. 4, the linear rise in memory usage with the number of cores highlights the urgent need for solutions that utilize a shared memory model.

Fig.3 : An MPI program can run efficiently on multi-core processors. Simply divide the shared memory space among the cores, so each MPI process operates within its own independent memory space.
Fig.4 : Comparison of memory usage when parallelizing EIRENE-NGM programs using MPI and OpenMP. Credit: D.V. Borodin et al., "Fluid, kinetic and hybrid approaches for neutral and trace ion edge transport modelling in fusion devices", Nucl. Fusion 62 (2022) 086051.




  1. Grama, Ananth, et al. Introduction to Parallel Computing. Addison-Wesley, 2003.
  2. Walker, D. W. "Standards for Message-Passing in a Distributed Memory Environment." United States, 1992. https://www.osti.gov/servlets/purl/7104668.
  3. The MPI Forum. "MPI: A Message Passing Interface." In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing (Supercomputing '93), 878–883. Association for Computing Machinery, New York, NY, USA, 1993. https://doi.org/10.1145/169627.169855.
  4. Shurkin, Joel. Engines of the Mind: The Evolution of the Computer from Mainframes to Microprocessors. New York: Norton, 1996. ISBN 978-0-393-31471-7.
  5. Chandra, R., R. Menon, L. Dagum, D. Kohr, D. Maydan, and J. McDonald. Parallel Programming in OpenMP. Morgan Kaufmann, 2000. ISBN 1-55860-671-8.
  6. Quinn, Michael J. Parallel Programming in C with MPI and OpenMP. McGraw-Hill, 2004. ISBN 0-07-058201-7.
  7. OpenMP.org. https://www.openmp.org/
