Academia.edu

Shared memory multiprocessor system

586 papers
About this topic
A shared memory multiprocessor system is a computer architecture where multiple processors access a common memory space, allowing them to communicate and coordinate their operations efficiently. This system enables concurrent execution of processes, facilitating parallel computing and improving performance for applications that require high computational power.
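
As a minimal sketch of communication through common memory, the example below uses Python threads. All threads run inside one process, so they see a single address space and can stand in for processors that share memory. The names SharedCounter and run_workers are hypothetical. CPython's global interpreter lock keeps threads from executing Python bytecode truly in parallel, so the example shows the communication and synchronization model, not parallel speedup.

```python
import threading

class SharedCounter:
    """A value stored once in memory that every thread reads and writes."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # serializes the read-modify-write

    def increment(self):
        with self._lock:
            self.value += 1

def run_workers(num_threads, increments_per_thread):
    counter = SharedCounter()  # shared by all threads below

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished its updates
    return counter.value

print(run_workers(4, 10_000))  # prints 40000
```

Without the lock, the read-modify-write inside increment could interleave between threads and lose updates. The lock is the simplest form of the coordination that shared memory requires.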

Key research themes

1. How can persistent and low-latency shared memory be efficiently realized in distributed multiprocessor datacenter systems?

This theme investigates the integration of next-generation non-volatile memories (NVMs) into distributed shared memory (DSM) systems to provide a global persistent memory abstraction with low latency, reliability, and high availability in datacenter-scale multiprocessor environments. It matters because NVMs offer DRAM-like speeds combined with persistence and high density, which can significantly enhance large-scale application performance, persistence, and fault tolerance, but leveraging these benefits across distributed nodes requires novel system software and hardware designs.

Key finding: Introduced Distributed Shared Persistent Memory (DSPM) framework and implemented Hotpot, a kernel-level system providing a global persistent shared memory space accessible via native load/store instructions in distributed...
Key finding: Demonstrated system software techniques leveraging RDMA over InfiniBand for efficient remote memory access in multiprocessor computing systems, enabling memory data transfers without OS or application intervention on target...
Key finding: Developed the Choices OS architecture employing object-oriented design to implement a modular and extensible virtual memory and backing store system for shared memory and networked multiprocessors, enabling uniform and...
Key finding: Reviewed cache coherence protocols (SI, MI, MSI, MESI, MOSI, MOESI) in distributed multiprocessor environments highlighting their impact on maintaining data consistency, coherence overheads, and system performance, informing...
Key finding: Provided a comprehensive overview of shared memory multiprocessor architectures including Uniform Memory Access (UMA) design, cache coherence models, and system software layers, emphasizing architectural and software...
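
The MESI protocol named in the coherence-protocol review can be sketched as a small simulation of one cache line shared by several snooping caches. This is a simplified model under stated assumptions. Bus transactions complete atomically, and write-backs of Modified data are only noted in comments, not modeled. The class and method names are hypothetical. Textbooks differ on whether a write to a Shared line issues an upgrade (BusUpgr) or a full read-for-ownership (BusRdX). This sketch uses BusUpgr.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

class MESISystem:
    """MESI state of a single cache line across n snooping caches (simplified)."""

    def __init__(self, n_caches):
        self.states = [State.I] * n_caches
        self.bus_log = []  # bus transactions issued, for inspection

    def _others(self, cpu):
        return [c for c in range(len(self.states)) if c != cpu]

    def read(self, cpu):
        if self.states[cpu] != State.I:
            return  # read hit in M, E or S: no bus traffic
        self.bus_log.append(("BusRd", cpu))
        shared = False
        for c in self._others(cpu):
            if self.states[c] != State.I:
                # A Modified copy would be written back (flushed) at this point.
                self.states[c] = State.S
                shared = True
        # Exclusive only if no other cache holds a valid copy.
        self.states[cpu] = State.S if shared else State.E

    def write(self, cpu):
        st = self.states[cpu]
        if st == State.M:
            return  # already the only, dirty copy
        if st == State.E:
            self.states[cpu] = State.M  # silent upgrade, no bus traffic
            return
        # From S an upgrade suffices; from I a read-for-ownership is needed.
        # Either way, every other copy is invalidated.
        self.bus_log.append(("BusUpgr" if st == State.S else "BusRdX", cpu))
        for c in self._others(cpu):
            self.states[c] = State.I
        self.states[cpu] = State.M

mesi = MESISystem(2)
mesi.read(0)   # cache 0: I -> E (no other copy exists)
mesi.read(1)   # caches 0 and 1 both end in S
mesi.write(1)  # cache 1: S -> M via BusUpgr; cache 0 is invalidated
print([s.name for s in mesi.states])  # prints ['I', 'M']
```

The property the protocol maintains is that a line in M or E in one cache is Invalid in every other cache. That property is what allows a cache holding the line in E to write without any bus transaction.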

2. What software and runtime techniques effectively manage scheduling, load balancing, and parallelism in shared-memory multiprocessor systems?

This theme focuses on dynamic scheduling approaches, task parallelism exploitation, and runtime mechanisms that optimize workload distribution and parallel execution on shared-memory multiprocessors. Efficient scheduling is essential to fully leverage the hardware parallelism, improve load balance, and increase application throughput in multiprocessor systems.

Key finding: Presented Lazy Binary Splitting (LBS), an adaptive user-level scheduler that improves upon eager binary splitting by reducing the need for manual tuning of stop-splitting thresholds for nested parallel do-all loops, thereby...
Key finding: Proposed implicit transactional memory implemented using a multi-checkpoint mechanism allowing speculative execution beyond synchronization points without explicit software identification, which reduces serialization and...
Key finding: Developed a parallel interest matching algorithm for distributed virtual environments executed on shared-memory multiprocessors that distributes the workload of space-time event matching across multiple processors,...
Key finding: Proposed a parallel synchronous simulation algorithm for VHDL that increases parallelism by analyzing signal dependencies and relaxing synchronization barriers, enabling efficient execution on shared-memory multiprocessors...
Key finding: Created a diverse suite of seven parallel bioinformatics applications optimized with thread-level parallelism for shared memory multiprocessors, enabling evaluation of parallel programming techniques and workload...
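
Dynamic scheduling for load balancing can be illustrated with its simplest variant: chunked self-scheduling of a do-all loop, similar in spirit to OpenMP's schedule(dynamic, chunk). Idle workers atomically claim the next chunk of iterations from a shared counter, so faster workers naturally take more chunks. This is a minimal sketch with hypothetical names (dynamic_parallel_for, chunk). It is not the Lazy Binary Splitting scheduler mentioned in this theme, which adapts how work is split instead of using a fixed chunk size. Python threads show the scheduling logic here, not real speedup, because of the global interpreter lock.

```python
import threading

def dynamic_parallel_for(n_iters, body, n_workers=4, chunk=8):
    """Run body(i) for every i in range(n_iters) using chunked self-scheduling."""
    next_index = 0              # shared: first iteration not yet claimed
    lock = threading.Lock()     # protects next_index

    def worker():
        nonlocal next_index
        while True:
            with lock:          # claim the next chunk atomically
                start = next_index
                next_index += chunk
            if start >= n_iters:
                return          # no iterations left to claim
            for i in range(start, min(start + chunk, n_iters)):
                body(i)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

squares = [0] * 100

def square_into(i):
    squares[i] = i * i  # each index is claimed by exactly one worker

dynamic_parallel_for(100, square_into, n_workers=4, chunk=8)
print(sum(squares))  # prints 328350
```

A smaller chunk improves load balance but increases contention on the shared counter. A larger chunk reduces that overhead but can leave some workers idle near the end of the loop.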

3. How can software-level memory management policies and algorithms mitigate memory contention and improve scalability in shared memory multiprocessors?

This theme studies operating system and hardware memory management strategies, including page allocation, memory bank partitioning, and transactional memory buffering, to reduce interference, contention, and coherence overhead in shared memory multiprocessors. These approaches aim to enhance throughput and energy efficiency by optimizing access to shared DRAM banks and maintaining cache coherence.

Key finding: Introduced Bank-level Partition Mechanism (BPM), a software-based page-coloring scheme implemented in the OS kernel that partitions DRAM banks across cores to eliminate bank-level memory interference, improving average system...
Key finding: Analyzed how coherent buffering in private caches causes inefficiencies in lazy HTM by prematurely exposing speculative writes to coherence mechanisms; showed that employing non-coherent write buffers can mitigate overhead,...
Key finding: Developed a parallel iterative solver for large sparse linear systems based on multilevel incomplete LU preconditioners leveraging nested dissection and task parallelism, employing dynamic scheduling for load balancing,...
Key finding: Demonstrated a low-complexity implicit transactional memory design that leverages multi-checkpoint execution to support large speculative memory accesses with sequential consistency, reducing synchronization overhead and...
Key finding: Proposed a two-level programming model combining a high-level coordination language (Concurrent Collections) and a lower-level parallel language (Habanero Java) to support flexible task distribution and mutual exclusion on...
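
The bank-partitioning idea behind page coloring can be illustrated with a toy physical-frame allocator. The sketch assumes 4 KiB pages and a hypothetical DRAM address mapping in which physical address bits 13 to 15 select one of 8 banks. Because those bits lie above the page offset, the operating system controls a page's bank by choosing its physical frame. Real memory controllers often hash several address bits to form the bank index, so the mapping has to be determined per platform. All names are illustrative, and this is not BPM's kernel code.

```python
from collections import deque

PAGE_SHIFT = 12   # 4 KiB pages (assumed)
BANK_SHIFT = 13   # hypothetical: bank index taken from physical address bits 13..15
BANK_BITS = 3     # 8 banks (assumed)

def bank_of_frame(pfn):
    """Bank that a physical frame maps to under the assumed address mapping."""
    phys_addr = pfn << PAGE_SHIFT
    return (phys_addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)

class BankPartitionedAllocator:
    """Hands each core only frames whose bank lies in that core's bank set."""

    def __init__(self, num_frames, core_banks):
        # core_banks: dict mapping core id -> set of bank indices reserved for it
        self.core_banks = core_banks
        self.free = {b: deque() for b in range(1 << BANK_BITS)}  # per-bank free lists
        for pfn in range(num_frames):
            self.free[bank_of_frame(pfn)].append(pfn)

    def alloc(self, core):
        for bank in sorted(self.core_banks[core]):
            if self.free[bank]:
                return self.free[bank].popleft()
        raise MemoryError(f"no free frame in the banks of core {core}")

alloc = BankPartitionedAllocator(64, {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}})
print(alloc.alloc(0), alloc.alloc(1))  # prints 0 8
```

Because core 0 and core 1 never receive frames from the same bank, their private pages never compete for a bank. The cost is that each core can use only part of physical memory, here 32 of the 64 frames.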

All papers in Shared memory multiprocessor system

We use the Abstract State Machine methodology to give formal operational semantics for the Location Consistency memory model and cache protocol. With these formal models, we prove that the cache protocol satisfies the memory model, but in...
The approach of program-driven simulation of multiprocessors has generally been believed to be too slow to perform experiments and performance evaluations with realistic workloads. We show that the program-driven approach for...
It is well known that Time Warp may suffer from poor performance due to excessive rollbacks caused by overly optimistic execution. Here we present a simple flow control mechanism using only local information and GVT that limits the number...
Mechanisms for managing message buffers in Time Warp parallel simulations executing on cache-coherent shared-memory multiprocessors are studied. Two simple buffer management strategies called the sender pool and receiver pool mechanisms...
Logic programming has been used in a broad range of fields, from artificial intelligence applications to general purpose applications, with great success. Through its declarative semantics, by making use of logical conjunctions and...
We propose a new parallel Branch and Bound algorithm for the Quadratic Assignment Problem, which is a Combinatorial Optimization problem known to be very hard to solve exactly. An original method to distribute work to processors using the...
The Token protocol provides a new coherence framework for shared-memory multiprocessor systems. It avoids indirections of directory protocols for common cache-to-cache transfer misses, and achieves higher interconnect bandwidth and lower...
Direct volume rendering algorithms are too computationally expensive to offer interactive frame rates when rendering large 3D medical datasets on standard workstations. This article presents an image space parallelization of an image...
This paper presents an extension of the Latency Time (LT) scheduling algorithm for assigning tasks with arbitrary execution times on a multiprocessor with shared memory. The Extended Latency Time (ELT) algorithm adds to the priority...
In this paper we present a modification of the Dual Priority Scheduling Algorithm to work on shared memory multiprocessor systems improving the average-case schedulability. The proposal deals with global fixed-priority preemptive...
The choice of a communication paradigm, or protocol, is central to the design of a large-scale multiprocessor system. Unlike traditional multiprocessors, the FLASH machine uses a programmable node controller, called MAGIC, to implement all...
When supported in silicon, transactional memory (TM) promises to become a fast, simple and scalable parallel programming paradigm for future shared memory multiprocessor systems. Among the multitude of hardware TM design points and...
The concept of distributed systems has made it easier to communicate and share resources with other systems over a network. With the emergence of distributed systems, data security has become an increasing concern, and...
Multiple programming models are emerging to address an increased need for dynamic task parallelism in multicore shared-memory multiprocessors. This poster describes the main components of Rice University's Habanero Multicore Software...
This paper proposes a novel methodology to efficiently simulate shared-memory multiprocessors composed of hundreds of cores. The basic idea is to use thread-level parallelism in the software system and translate it into core-level...
Shared memory multiprocessors are returning to popularity thanks to the rapid spread of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers...
OpenMP relies heavily on barrier synchronization to coordinate the work of threads that are performing the computations in a parallel region. A good implementation of barriers is thus an important part of any implementation of this API....
This paper describes MTOOL, a software tool for analyzing performance losses in shared memory parallel programs. MTOOL augments a program with low overhead instrumentation which perturbs the program's execution as little as possible while...
In edge detection, classical derivative-based operators are sensitive to noise, which causes detection errors. The errors are even worse for omnidirectional images, due to geometric distortions caused by the used...
As processor performance continues to increase, greater demands are placed on the bus and memory systems of small-scale shared-memory multiprocessors. In this paper, we investigate how to reduce these demands by organizing groups of...
Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller, the coherence hardware and the network interface/router. In this work we exploit such...
Techniques that can cope with the large latency of memory accesses are essential for achieving high processor utilization in large-scale ... better performance than each one on its own. Overall, we show that using suitable combinations of the...
The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attention in the past several years, while...
Good cache memory performance is essential to achieving high CPU utilization in shared-memory multiprocessors. While the performance of caches is determined by both application and operating system (OS) references, most research has...
The memory consistency model supported by a multiprocessor architecture determines the amount of buffering and pipelining that may be used to hide or reduce the latency of memory accesses. Several different consistency models have been...
In this paper, we propose efficient parallel implementations of the auction/sequential shortest path and the ε-relaxation algorithms for solving the linear minimum cost flow problem. In the parallel auction algorithm, several augmenting...
Increased complexity of memory systems to ameliorate the gap between the speed of processors and memory has made it increasingly hard for compilers to optimize arbitrary code within a palatable amount of time. With the emergence of...
The goal of creating high-quality process systems for real-world applications leads to the need for an engineering approach to process system development. The development of process engineering as a distinct discipline can be greatly...
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the access to main memory to recover the sharing status of the...
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others) contrasts with the much slower...
Interconnect networks employing wormhole-switching play a critical role in shared memory multiprocessor systems-on-chip (MPSoC) designs, Multicomputer systems and System Area Networks. Virtual channels greatly improve the performance of...
In shared-memory multiprocessor systems it may be more efficient to schedule a task on one processor than on another. Due to the inevitability of idle processors in these environments, there exists an important tradeoff between keeping the...
This paper describes the implementation of RTI-Kit, a modular software package to realize runtime infrastructure (RTI) software for distributed simulations such as those for the High Level Architecture. RTI-Kit software spans a wide...
In order to reduce remote memory accesses on CC-NUMA multiprocessors, we present an interprocedural analysis to support static loop scheduling and data allocation. Given a parallelized program, the compiler constructs graphs which...
The scalability port (SP) is a point-to-point cache consistent interface to build scalable shared memory multiprocessors. The SP interface consists of three layers of abstraction: the physical layer, the link layer and the protocol layer....
Reconfigurable cache memory is important for improving cache performance and reducing energy consumption. In this paper, previous papers related to reconfigurable cache memory are reviewed and compared with our...
Scalable distributed shared-memory architectures rely on coherence controllers on each processing node to synthesize cache-coherent shared memory across the entire machine. The coherence controllers execute coherence protocol handlers...
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache coherent shared memory multiprocessors. A significant part of the occupancy is due to the latency of accessing...
In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today’s multiprocessors with several memory hierarchies has forced the...
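
The OpenMP abstract notes that a good barrier implementation is an important part of any OpenMP implementation. A centralized sense-reversing barrier, a standard textbook design, is a simple example. The last thread to arrive resets the counter and flips a shared sense flag, which releases every waiting thread. Each thread flips its own private sense on every episode, so the same counter and flag can be reused for the next barrier. Threads leaving one episode therefore cannot race with threads entering the next. On real hardware, waiting threads would typically spin on the shared flag. This Python sketch blocks on a threading.Condition instead, and the class and function names are hypothetical. It is not the barrier of any particular OpenMP runtime. Python's standard library already provides threading.Barrier for practical use.

```python
import threading

class SenseReversingBarrier:
    """Centralized barrier: the last arriving thread flips a shared sense flag."""

    def __init__(self, n_threads):
        self.n = n_threads
        self.count = n_threads          # threads still expected in this episode
        self.sense = False              # shared sense flag
        self.cond = threading.Condition()

    def wait(self, local_sense):
        """Block until all threads arrive; returns the caller's new private sense."""
        local_sense = not local_sense   # flip the private sense for this episode
        with self.cond:
            self.count -= 1
            if self.count == 0:
                self.count = self.n         # reset for the next episode
                self.sense = local_sense    # release all waiting threads
                self.cond.notify_all()
            else:
                while self.sense != local_sense:
                    self.cond.wait()
        return local_sense

def run_phases(n_threads, n_phases):
    barrier = SenseReversingBarrier(n_threads)
    log, log_lock = [], threading.Lock()

    def worker():
        sense = False  # each thread's private sense flag
        for phase in range(n_phases):
            with log_lock:
                log.append(phase)       # work for this phase
            sense = barrier.wait(sense) # nobody starts phase+1 until all finish phase

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

print(run_phases(3, 2))  # prints [0, 0, 0, 1, 1, 1]
```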