ID 原文 译文
53977 近年来,情感识别成为了人机交互领域的研究热点问题,而多模态维度情感识别能够检测出细微情感变化,得到了越来越多的关注多模态维度情感识别中需要考虑如何进行不同模态情感信息的有效融合。 In recent years, emotion recognition has become a hot research topic in the field of human-computer interaction, and multi-modal dimensional emotion recognition, which can detect subtle emotional changes, has attracted increasing attention. In multi-modal dimensional emotion recognition, it is necessary to consider how to effectively fuse emotional information from different modalities.
53978 针对特征层融合存在有效特征提取和模态同步的问题、决策层融合存在不同模态特征信息的关联问题,本文采用模型层融合策略,提出了基于多头注意力机制的多模态维度情感识别方法,分别构建音频模型、视频模型和多模态融合模型对信息流进行深层特征学习,最后放入双向长短时网络中得到最终情感预测值。 To address the problems of effective feature extraction and modality synchronization in feature-level fusion, and the problem of correlating feature information across modalities in decision-level fusion, this paper adopts a model-level fusion strategy and proposes a multi-modal dimensional emotion recognition method based on the multi-head attention mechanism. An audio model, a video model, and a multi-modal fusion model are constructed to learn deep features from the information streams, and the result is finally fed into a bidirectional long short-term memory (BiLSTM) network to obtain the final emotion prediction.
53979 所提方法相比于不同基线方法在激活度和愉悦度上均取得了最佳的性能,可以在高层维度对情感信息有效捕捉,进而更好的对音视频信息进行有效融合。 Compared with the different baseline methods, the proposed method achieves the best performance on both arousal and valence; it can effectively capture emotional information in a high-level dimension and thereby fuse audio and video information more effectively.
53980 为了进一步利用源文本数据来提高语音翻译的性能,本文提出了一种基于生成对抗网络的端到端语音翻译算法。 In order to further use the source text data to improve the performance of speech translation, this paper proposes an end-to-end speech translation algorithm based on a generative adversarial network.
53981 通过加入判别网络来判断语音特征序列和文本特征序列的真伪,从而引导生成模型来学习文本真实序列的分布,以使语音序列特征分布更加逼近文本特征序列的分布。 A discriminator network is added to judge the authenticity of the speech feature sequence and the text feature sequence, thereby guiding the generative model to learn the distribution of the real text sequences, so that the feature distribution of the speech sequence more closely approximates that of the text feature sequence.
53982 引入了Wasserstein GAN(WGAN)来计算语音特征序列和文本特征序列通过判别器的标量似然值的Earth-Mover(EM)距离,来解决语音特征序列和文本特征序列存在长度不一致的问题。 Wasserstein GAN (WGAN) is introduced to compute the Earth-Mover (EM) distance between the scalar likelihood values that the discriminator outputs for the speech feature sequences and the text feature sequences, which solves the problem that the speech feature sequences and text feature sequences have inconsistent lengths.
53983 整个模型遵从多任务学习和对抗学习的训练准则,本文在How2数据集上和MuSTC英中数据集上验证了本文提出算法的有效性, The entire model follows the training criteria of multi-task learning and adversarial learning. This paper verifies the effectiveness of the proposed algorithm on the How2 dataset and the MuST-C English-Chinese dataset.
53984 该方法可以显著提升翻译质量。 This method can significantly improve the translation quality.
53985 本文研究空域协方差矩阵初始化对复高斯混合模型下的分布式语音分离性能的影响。 This paper studies the impact of spatial covariance matrix (SCM) initialization on the performance of distributed speech separation under the complex Gaussian mixture model.
53986 在不同节点的接收信号向量条件独立性假设前提下,推导出一种逐节点迭代更新所有接收信号向量对应的空域协方差矩阵和后验概率等参数的方法; Under the assumption that the received signal vectors of different nodes are conditionally independent, a method is derived that iteratively updates, node by node, parameters such as the SCMs and posterior probabilities corresponding to all received signal vectors;