IT-BiCorpus-EN

ID	原文	译文
39196	在此基础上，集成多个卷积神经网络，搭建一个针对立体声音频录音的声音场景分类系统。	Features carrying binaural phase information were therefore extracted, and an ensemble of convolutional neural networks (CNNs) was adopted as the classifier.
39197	区别于现有声音场景分类系统只使用时频谱的幅度信息，本文所提出的方法保留了立体声音频的相位信息。	Compared with existing work, the ASC system proposed in this paper generates features with additional phase information and makes full use of the advantages of binaural audio.
39198	这使得声学特征中所包含的空间方位信息更丰富，立体声音频的优势得到发挥。	The evaluation results validate that the performance of our ASC system can be improved by taking the binaural phase information into account.
39199	实验结果证明保留立体声相位信息的声音场景分类系统具有更好的性能，在2019年IEEE音频和声学信号处理技术委员会举办的声音场景分类赛事中相比于基线系统的整体识别准确率提升了18.3%。	Our ASC system outperforms the baseline system provided by the 2019 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) by 18.3% in terms of classification accuracy.
39200	基于深度神经网络的低资源条件下关键词检索已经取得了很大的进展，但这些方法仍旧需要较多的参数才能保证模型的精度。	Resource-limited keyword spotting systems based on deep neural networks have made great progress in recent years, but these methods still need many parameters to achieve state-of-the-art performance. In this paper, we focus on the tradeoff between achieving high detection accuracy and keeping the model small.
39201	为了进一步减少模型的参数量，本文将Squeeze-and-Excitation网络和深度可分离卷积应用在关键词检索任务中。	We propose to apply a Squeeze-and-Excitation network and depthwise separable convolution to the keyword spotting task.
39202	首先利用Squeeze-and-Excitation网络对不同特征通道之间的相互依赖关系建模的能力进一步提升模型的精度，	Specifically, we first improve the model performance by explicitly modelling the interdependencies between the channels of convolutional features with a Squeeze-and-Excitation network.
39203	然后通过将标准卷积替换为深度可分离卷积来有效的减少模型所需要的参数。	Then, we replace the standard convolution with depthwise separable convolution, which greatly reduces the number of parameters the model requires.
39204	在谷歌语音命令数据集上的实验证明我们的模型可以在保证高精度的同时把参数量限制在一定的范围内。	We compared the proposed method with two convolutional neural network based models on the Google Speech Commands dataset. Experimental results show that the proposed method significantly outperforms the comparison methods in terms of detection accuracy and model size. For example, it achieves a detection accuracy of 96.16% with 75.5 K parameters.
39205	多数情况下，音频信号可以视为是由稳态成分和突变成分两种成分组成，稳态成分与突变成分在属性特征方面具有明显的差异，	In most cases, an audio signal can be regarded as composed of two kinds of components, steady-state components and transient components, and the two differ markedly in their characteristic features.