ID 原文 译文
40576 目标检测任务对于检测任务精度和实时性都有很高要求,YOLOv3-tiny网络在这两点有很好的表现. Object detection tasks place high demands on both accuracy and real-time performance, and the YOLOv3-tiny network performs well on both.
40577 但是其复杂的网络结构,使得实际应用需要从软件和硬件方面都进行针对性的优化. However, its complex network structure means that practical applications require targeted optimization in both software and hardware.
40578 为了达到实时要求,综合使用三种优化技术: To meet the real-time requirements, three optimization techniques are used in combination:
40579 在软件层面,通过融合批归一层降低计算量,低位宽增大资源利用率; At the software level, batch normalization layers are fused to reduce the amount of computation, and low bit widths are used to increase resource utilization.
40580 设计多维度并行FPGA计算核心匹配多个卷积层,提高整体吞吐率; A multi-dimensional parallel FPGA computation core is designed to match multiple convolutional layers, improving the overall throughput.
40581 细粒度层间流水和pingpong缓存设计,降低数据传输时间. Fine-grained inter-layer pipelining and a ping-pong buffer design reduce the data transfer time.
40582 在ZCU104型号的FPGA上,实现了418ⅹ418图片的21ms检测延时,超过同类加速器设计,并在DSP效率上有2.86倍或者8.81倍的提升. On a ZCU104 FPGA, the design achieves a detection latency of 21 ms for 418 × 418 images, outperforming comparable accelerator designs, and improves DSP efficiency by a factor of 2.86 or 8.81.
40583 当前基于忆阻器的神经网络加速器存在的资源需求高、系统功耗大等问题, The current ReRAM-based NN accelerators have many problems such as high hardware resource demand and high power consumption.
40584 提出了一种包含剪枝及量化算法在内的神经网络模型压缩框架. An energy-efficient model compression framework consisting of pruning and quantization algorithms is proposed.
40585 根据忆阻器阵列紧密耦合的特点,设计了一种忆阻器阵列感知的规则化增量剪枝算法,在保证模型准确度的条件下实现了硬件资源的节省; According to the tightly coupled crossbar structure and unstructured sparsity, a crossbar-aware incremental structured pruning algorithm is designed to achieve higher sparsity and accuracy.