Main content

Deterministic, Weak-scaling Parallelism for Wirelength- and Timing-driven FPGA Placement, Suitable for Multicore and Manycore Architectures

Show full item record

Title: Deterministic, Weak-scaling Parallelism for Wirelength- and Timing-driven FPGA Placement, Suitable for Multicore and Manycore Architectures
Author: Fobel, Christian
Department: School of Computer Science
Program: Computer Science
Advisor: Grewal, GaryStacey, Deborah
Abstract: Field-Programmable Gate Arrays (FPGAs) enable rapid-prototyping of digital logic designs in-house, without the significant up-front expense of building custom fabrication facilities. However, as the number of resources on each new generation of FPGAs continues to grow rapidly, enormous pressure is placed on the development of algorithms to reduce hardware compilation times to maintain the competitive advantage of using FPGAs. Approximately half of the compilation time for FPGA designs is spent performing placement (which is NP-hard). Serial simulated annealing is typically used for placement in practice, where runtime unfortunately grows exponentially with circuit size to be placed. Therefore, development of parallel placement algorithms that can harness the increasingly abundant throughput of modern manycore architectures to improve runtimes remains a critical concern. Parallel FPGA placement methods in literature provide no theoretical basis for scalability, and reported runtime results suggest non-parallelizable work scaling with the size of the problem, preventing scaling. We propose a parallel FPGA placement methodology based on "completely parallelizable" patterns, leading to a weak-scaling isoefficiency function in Big-Theta(p log p), while maintaining determinism. For placement, this means deterministic results where linear speedups are expected as the number of parallel worker threads (p) increases as long as the problem size (i.e., netlist size) grows accordingly. Using parallel patterns, we also propose the first scalable algorithms for timing analysis, which are ideally suited for modern manycore architectures, such as GPUs. Our experimental results show that our proposed wirelength- and timing-driven placement tools achieve mean absolute runtime improvements of 19x and 31x, respectively, on a commodity GPU over a state-of-the-art academic placer (VPR). With respect to quality, our wirelength-driven tool improves solution quality by 5% over VPR, while our timing-driven placer improves critical-path delay by 20% compared to our proposed wirelength driven method. Our results also indicate increased parallel efficiency as the size of the problem grows. Since both the parallel worker count on modern commodity parallel architectures and the number of resources available on modern FPGAs are growing rapidly, weak-scaling is an ideal fit for parallel FPGA placement to provide sustainable performance for future FPGA designs and manycore architectures.
Date: 2015-04
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.

Files in this item

Files Size Format View Description
Fobel_Christian_201504_phd.pdf 3.701Mb PDF View/Open Main thesis document

This item appears in the following Collection(s)

Show full item record