ID |
Source text (Chinese) |
Translation (English) |
40576 |
目标检测任务对于检测任务精度和实时性都有很高要求,YOLOv3-tiny网络在这两点有很好的表现. |
Object detection places high demands on both accuracy and real-time performance, and the YOLOv3-tiny network performs well on both counts. |
40577 |
但是其复杂的网络结构,使得实际应用需要从软件和硬件方面都进行针对性的优化. |
However, its complex network structure means that practical deployment requires targeted optimization in both software and hardware. |
40578 |
为了达到实时要求,综合使用三种优化技术: |
To meet real-time requirements, three optimization techniques are combined: |
40579 |
在软件层面,通过融合批归一层降低计算量,低位宽增大资源利用率; |
At the software level, batch normalization layers are fused to reduce computation, and a low bit width is used to increase resource utilization; |
40580 |
设计多维度并行FPGA计算核心匹配多个卷积层,提高整体吞吐率; |
A multi-dimensional parallel FPGA compute core is designed to match multiple convolutional layers, improving overall throughput; |
40581 |
细粒度层间流水和pingpong缓存设计,降低数据传输时间. |
Fine-grained inter-layer pipelining and a ping-pong buffer design reduce data transfer time. |
40582 |
在ZCU104型号的FPGA上,实现了418ⅹ418图片的21ms检测延时,超过同类加速器设计,并在DSP效率上有2.86倍或者8.81倍的提升. |
On a ZCU104 FPGA, the design achieves a detection latency of 21 ms for 418 × 418 images, outperforming comparable accelerator designs, and improves DSP efficiency by a factor of 2.86 or 8.81. |
40583 |
当前基于忆阻器的神经网络加速器存在的资源需求高、系统功耗大等问题, |
Current memristor-based neural network accelerators suffer from problems such as high resource demand and high system power consumption. |
40584 |
提出了一种包含剪枝及量化算法在内的神经网络模型压缩框架. |
A neural network model compression framework comprising pruning and quantization algorithms is proposed. |
40585 |
根据忆阻器阵列紧密耦合的特点,设计了一种忆阻器阵列感知的规则化增量剪枝算法,在保证模型准确度的条件下实现了硬件资源的节省; |
Based on the tightly coupled nature of memristor arrays, a memristor-array-aware structured incremental pruning algorithm is designed, which saves hardware resources while preserving model accuracy; |