ID |
Source text (Chinese) |
Translation (English) |
40586 |
针对忆阻器加速器系统中ADC单元和忆阻器阵列功耗占比过大等问题,设计了一种二的幂次量化算法以降低加速器系统中ADC的精度需求以及计算阵列中低阻值忆阻器器件个数,实现系统功耗的降低. |
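To address the excessive share of power consumed by the ADC units and ReRAM arrays in ReRAM accelerator systems, a power-of-two quantization algorithm is designed to lower the ADC resolution requirement and the number of low-resistance-state (LRS) ReRAM cells in the crossbars, thereby reducing system power consumption. |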
40587 |
实验结果表明:提出的神经网络模型压缩框架在忆阻器加速器部署网络时可取得17.2~30.7倍的能效提升以及4.3~9.3倍的加速比,模型的精度损失维持在1%左右. |
Experimental results show that the proposed model compression framework achieves a 17.2-30.7x improvement in energy efficiency and a 4.3-9.3x speedup compared with ReRAM-based accelerators for dense neural networks, with an accuracy loss of about 1%. |
40588 |
可重构阵列依靠数据流驱动带来的能效优势被作为加速器广泛运用在特定领域之中. |
Reconfigurable arrays are widely used as domain-specific accelerators because of the energy-efficiency advantages of dataflow-driven execution. |
40589 |
随着应用范围的增大,当应用中存在不同执行速率的区域时,采用传统的空间映射方案将整个数据流图进行直接映射会造成严重的性能损失. |
As the range of applications grows, directly mapping the entire dataflow graph with a traditional spatial mapping scheme causes severe performance loss when an application contains regions with different execution rates. |
40590 |
提出了一种基于数据流解耦的映射方法,通过在执行速率不同的区域之间加入解耦单元以解决执行速率失配的问题. |
A mapping method based on dataflow decoupling is proposed, which resolves the execution-rate mismatch by inserting decoupling units between regions with different execution rates. |
40591 |
同时利用一种分布式多阶段的布局算法,将解耦后的数据流图映射在分簇式的互连结构中. |
In addition, a distributed multi-stage placement algorithm is used to map the decoupled dataflow graph onto a clustered interconnect structure. |
40592 |
与传统映射方案相比,对数据流进行解耦后的映射方案在保持较高的互连布通率的基础上可以平均提升57.68%的执行性能,同时降低32%~43%的互连开销. |
Compared with the traditional mapping scheme, the mapping scheme with dataflow decoupling improves execution performance by 57.68% on average and reduces interconnect overhead by 32% to 43%, while maintaining high interconnect routability. |
40593 |
传统的卷积神经网络需要大量的运算单元和繁琐的数据存取,导致计算速度较慢,效率不高. |
Traditional convolutional neural networks require a large number of computing units and cumbersome data accesses, resulting in slow computation and low efficiency. |
40594 |
本文设计了全新的数据块结构以充分利用数据复用,大大减少数据读取次数,并且全面调用FPGA的并行运算资源 |
A new data block structure is designed to fully exploit data reuse, greatly reducing the number of data reads, and to make full use of the FPGA's parallel computing resources.
40595 |
,同时进行多个乘加操作,实现了高效并行卷积计算电路. |
In this way, multiple multiply-add operations are performed simultaneously, realizing an efficient parallel convolution circuit.