A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2012; you can also visit the original URL.
The file type is application/pdf.
Teraflop FPGA Design
2011
2011 IEEE 20th Symposium on Computer Arithmetic
This article will review devices and methods for achieving consistent high performance system implementations in floating point. ...
FUSED DATAPATH MAPPING Fused datapath methodology uses rules to create functional clusters, where the normalization and denormalization are merged among multiple operators [1]. ...
INTRODUCTION Many operator libraries have been designed for FPGAs; a brief survey of these shows that the most commonly used operators (multiply and add/subtract) have similar areas, performance levels ...
doi:10.1109/arith.2011.32
dblp:conf/arith/Langhammer10
fatcat:tg6iwj6a65chvgrshoaydzgpnq
Tools and Techniques for Efficient High-Level System Design on FPGAs
[article]
2014
arXiv
pre-print
The combination of 7 floating-point precisions, fused-datapath support, custom operator support and automated folding allows exploring the best tradeoffs between accuracy, size and throughput. ...
In order for FPGAs to be successful outside traditional markets, tools which enable software programmers to achieve high levels of system performance while abstracting away the FPGA-specific details are ...
IMPLEMENTATIONS USING IEEE-754 OPERATOR ASSEMBLY AND THE PRESENTED FUSED DATAPATH TECHNIQUE ON STRATIXV, TARGETING SINGLE PRECISION AND A CUSTOM 35 BIT FRACTION FORMAT Type Precision Performance ...
arXiv:1408.4797v1
fatcat:uonfo5musfb2hh7t2o7dtic5oa
FPGA Implementation of Double Precision Floating Point Multiplier
2022
International Journal on Recent and Innovation Trends in Computing and Communication
High speed computation is the need of today's generation of Processors. ...
Especially in the field of signal processing, multiplication and division operations are widely used in many applications. ...
It needs a large variety of matrix processes, and also the ability to perform a series of matrix operations with the same structure. ...
doi:10.17762/ijritcc.v10i12.5896
fatcat:o6ijkmcvmzh4jhspy5fwvwhcwa
A mixed-precision fused multiply and add
2011
2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR)
The floating-point fused multiply and add, computing R=AB+C with a single rounding, is now an IEEE-754 standard operator. ...
Like the standard FMA operator, the proposed mixed-precision operator computes AB+C with a single rounding, and fully supports subnormals. ...
, complex operations, range reductions, multiple-precision operations, and others [8]. ...
doi:10.1109/acssc.2011.6189977
dblp:conf/acssc/BrunieDD11
fatcat:2yjz6ikeuvbfthr6etvvqz7leq
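The single-rounding property described in the entry above can be illustrated with a small reference model. This is only a sketch, not the operator from the paper: the function names `fma_single_rounding` and `multiply_then_add` are hypothetical, inputs are assumed to be finite double-precision values, and exact rational arithmetic stands in for the wide internal datapath of a hardware FMA.

```python
from fractions import Fraction


def fma_single_rounding(a: float, b: float, c: float) -> float:
    """Reference model (assumption, not the paper's design):
    form a*b + c exactly, then round once to double precision."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding here
    return float(exact)  # the only rounding step


def multiply_then_add(a: float, b: float, c: float) -> float:
    """Unfused version: the product is rounded, then the sum is rounded."""
    return a * b + c


# Operands chosen so the exact product 1 - 2**-54 is not a double:
# it rounds to 1.0 before the addition in the unfused version.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

fused = fma_single_rounding(a, b, c)
unfused = multiply_then_add(a, b, c)
print(fused, unfused)
```

With these operands the unfused result loses the low-order term entirely, while the single-rounding model keeps it.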
Latency Sensitive FMA Design
2011
2011 IEEE 20th Symposium on Computer Arithmetic
The implementation of merged floating-point multiply-add operations can be optimized in many ways. ...
The cascade design has the same area and energy budget as a traditional fused multiply-add FMA. ...
INTRODUCTION A high performance floating-point unit is a major component of modern CPU and GPU designs. ...
doi:10.1109/arith.2011.26
dblp:conf/arith/GalalH10
fatcat:zdzrod6iuzga7btd5kfsavhabu
From SODA to scotch: The evolution of a wireless baseband processor
2008
2008 41st IEEE/ACM International Symposium on Microarchitecture
Ardbeg's redesign process can be grouped into the following three major areas: optimizing the wide SIMD datapath, providing long instruction word (LIW) support for SIMD operations, and adding application-specific ...
Ardbeg also provides modest LIW support by allowing two SIMD operations to issue in the same cycle. ...
Acknowledgment We thank the anonymous referees for their useful comments and suggestions. ...
doi:10.1109/micro.2008.4771787
dblp:conf/micro/WohLSMMCBKRWF08
fatcat:2vlng2aqazfbnhz4yjvxwqv3za
High-Level Design Tools for Floating Point FPGAs
2015
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '15
In the matrix-matrix multiplication algorithm, shown in Figure 2, the multiplication is performed using blocks of data, where on ...
Matrix-Matrix Multiplication pseudo code. ...
doi:10.1145/2684746.2689079
dblp:conf/fpga/SinghPC15
fatcat:xk2qbx244fczffdbycnfaaedv4
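The snippet above mentions matrix-matrix multiplication performed on blocks of data; the paper's own pseudo code (its Figure 2) is not reproduced in this listing. As a generic illustration only, a blocked (tiled) multiplication can be sketched as follows. The function name `blocked_matmul` and the `block` parameter are assumptions made for this example.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows,
    by iterating over block x block tiles. Generic sketch, not the
    paper's algorithm."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of the result
        for j0 in range(0, m, block):      # tile column of the result
            for k0 in range(0, k, block):  # tile along the shared dimension
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(a, b))
```

Tiling does not change the result; it only reorders the accumulation so that each tile of operands can be reused while it is held in fast local storage.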
A Full-stack Accelerator Search Technique for Vision Applications
[article]
2021
arXiv
pre-print
, software scheduling, and compiler passes such as operation fusion and tensor padding. ...
When evaluated on EfficientNet, ResNet50v2, and OCR inference performance relative to a TPU-v3, designs generated by FAST optimized for single workloads can improve Perf/TDP (peak power) by over 6x in ...
PE systolic arrays perform a matrix-vector multiply each cycle. Vector and scalar PEs can be modeled by setting systolic array X and/or Y dims to 1. ...
arXiv:2105.12842v1
fatcat:mtunvjdcdrcr5pc5bpyfye7mea
A Fused Hybrid Floating-Point and Fixed-Point Dot-Product for FPGAs
[chapter]
2010
Lecture Notes in Computer Science
Results using a high-end Xilinx FPGA and an order 150 dot-product demonstrate that, for equivalent accuracy metrics, it is possible to utilize 3.8 times fewer resources, operate at 1.62 times faster clock ...
In this paper we present a dot-product implementation which operates using a hybrid floating-point and fixed-point number system. ...
This operation is also a building block in other fundamental algebraic operations such as matrix-by-vector, and matrix-by-matrix multiplications. ...
doi:10.1007/978-3-642-12133-3_16
fatcat:f4jbk43ygfaspa7rfdfke5kmrq
Customizing wide-SIMD architectures for H.264
2009
2009 International Symposium on Systems, Architectures, Modeling, and Simulation
operation support to increase the processing performance, and a fast programmable crossbar to support complex data permutation patterns. ...
Several customized features have been added to improve the processing performance and lower the power consumption. ...
Fused Operation Based on this analysis, we propose to fuse the frequently used instruction pairs. ...
doi:10.1109/icsamos.2009.5289229
dblp:conf/samos/SeoWMMVC09
fatcat:b7ur6xpzwfbuxe4ovjbimfr53i
Custom FPGA-based soft-processors for sparse graph acceleration
2015
2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
We develop the processor RTL using Vivado High-Level Synthesis and also provide an assembler and compilation flow to configure the processor instruction and data memories. ...
FPGA-based soft processors customized for operations on sparse graphs can deliver significant performance improvements over conventional organizations (ARMv7 CPUs) for bulk synchronous sparse graph algorithms ...
We use SpMV (streaming multiply-accumulate datapath) to quantify performance on our architecture. ...
doi:10.1109/asap.2015.7245698
dblp:conf/asap/Kapre15
fatcat:mqos2rxf4zdkxcsq2hf6q3xji4
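The entry above uses SpMV, described as a streaming multiply-accumulate datapath, as its benchmark. A minimal software sketch of that kernel is given below, assuming the common compressed sparse row (CSR) layout; the storage format and the names `spmv_csr`, `values`, `col_idx` and `row_ptr` are assumptions for illustration and are not taken from the paper.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A @ x with A stored in CSR form.
    Each row is one stream of multiply-accumulate operations."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for p in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return y


# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```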
Exploiting On-chip Memory Bandwidth in the VIRAM Compiler
[chapter]
2001
Lecture Notes in Computer Science
It combines vector processing with mixed logic and DRAM to achieve high performance with relatively low energy, area, and design complexity. ...
Many architectural ideas that appear to be useful from a hardware standpoint fail to achieve wide acceptance due to lack of compiler support. ...
We are also very grateful for the support provided by the Cray, Inc. compiler group in helping us use and modify their compiler. ...
doi:10.1007/3-540-44570-6_8
fatcat:yo6zqdknxfcabgvbf6ghsh5qye
A Customized Processor for Energy Efficient Scientific Computing
2012
IEEE transactions on computers
It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. ...
To combat these challenges, this paper presents the PEPSC architecture, an architecture customized for the domain of data parallel dense matrix style scientific applications where power efficiency is the ...
This research was supported by the US National Science Foundation grant CNS-0964478 and ARM Ltd. ...
doi:10.1109/tc.2012.144
fatcat:6wb7y7femfftlh5geqsqbm37wy
Modified Booth Recoder for Efficient Add-Multiply Operator
2015
IJARCCE
The fusion of the two operators results in the Fused Add-Multiply (FAM) operator. ...
It consists of a recoding table which is used to minimize the partial products of the multiplier. The adder and the multiplier operators of the unit are combined to form a single add-multiply unit. ...
Multipliers were introduced to perform the multiplication operation of the arithmetic circuits using add and shift operation. ...
doi:10.17148/ijarcce.2015.45115
fatcat:4wg7pbe2o5cz7ahuv6dyft24sm
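The entry above refers to a recoding table that reduces the number of partial products. The paper's fused add-multiply recoder is not reproduced here; the sketch below shows only the standard radix-4 modified Booth recoding of a two's-complement multiplier, which halves the partial-product count. The function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(y, bits):
    """Recode a signed integer y (two's complement, even width `bits`)
    into bits // 2 digits in {-2, -1, 0, 1, 2}, least significant first,
    such that y == sum(d * 4**i for i, d in enumerate(digits))."""
    if bits % 2 != 0:
        raise ValueError("bits must be even")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # digit = -2*y[i+1] + y[i] + y[i-1]
        prev = b1
    return digits


def booth_multiply(x, y, bits):
    """Multiply x by y using one partial product per radix-4 Booth digit."""
    digits = booth_radix4_digits(y, bits)
    return sum(d * (x << (2 * i)) for i, d in enumerate(digits))


print(booth_radix4_digits(7, 4))   # digits for 7 = -1 + 2*4
print(booth_multiply(13, -7, 8))
```

An 8-bit multiplier produces four recoded digits, so four partial products are summed instead of eight.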
AnySP
2009
SIGARCH Computer Architecture News
These three operating modes provide high throughput across varying application types. ...
The current generation of devices employs a combination of general-purpose processors, digital signal processors, and hardwired accelerators to provide giga-operations-per-second performance on milliWatt ...
We also thank the anonymous referees for their useful comments and suggestions. ...
doi:10.1145/1555815.1555773
fatcat:m3psv47xdbgvjcvfu2gzlqe5eu