ID 原文 (Source) 译文 (Translation)
39246 由于水声信号的高度复杂性,基于特征工程的传统水下目标识别方法表现欠佳。 Traditional underwater target recognition methods based on feature engineering perform poorly owing to the high complexity of underwater acoustic signals.
39247 基于深度学习模型的水下目标识别方法可有效减少由于特征提取过程带来的水声信号信息损失,进而提高水下目标识别效果。 Underwater target recognition methods based on deep learning models can effectively reduce the information loss of underwater acoustic signals caused by the feature extraction process, thereby improving recognition performance.
39248 本文提出一种适用于水下目标识别场景的卷积神经网络结构,即在卷积模块化设计中引入卷积核为1的卷积层,更大程度地保留水声信号局部特征,且降低模型的复杂程度; In this paper, we propose a convolutional neural network (CNN) architecture suited to underwater target recognition: a convolutional layer with a kernel size of 1 is introduced into the modular convolution design, preserving the local features of underwater acoustic signals to a greater extent while reducing model complexity;
39249 同时,以全局平均池化层替代全连接层的方式构造基于特征图对应的特征向量主导分类结果的网络结构,使结果更具可解释性,且减少训练参数降低过拟合风险。 meanwhile, the fully connected layer is replaced with a global average pooling (GAP) layer, so that the classification result is driven by the feature vector corresponding to each feature map; this makes the results more interpretable and reduces the number of training parameters, lowering the risk of overfitting.
39250 实验结果表明该方法得到的水下目标识别准确率(91.7%)要优于基于传统卷积神经网络(69.8%)和基于高阶统计量特征的传统方法识别表现(85%)。 Experimental results show that the proposed method achieves an underwater target recognition accuracy of 91.7%, outperforming both a conventional CNN (69.8%) and a traditional method based on higher-order statistics (HOS) features (85%).
39251 这说明本文提出的模型能更好保留水声信号的时域结构,进而提高分类识别效果。 This indicates that the proposed model better preserves the time-domain structure of underwater acoustic signals, thereby improving classification performance.
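The architecture described in entries 39248–39249 (a kernel-size-1 convolution inside each convolution module, and a global average pooling head in place of a fully connected classifier, so that each class score is driven by its own feature map) can be sketched roughly as follows. This is a minimal PyTorch illustration; the channel counts, kernel sizes, and number of modules are assumptions for readability, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Convolution module that includes a kernel-size-1 conv layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=5, padding=2),  # assumed size
            nn.ReLU(),
            # kernel_size=1: mixes channels at each time step, retaining
            # local temporal structure while adding few parameters
            nn.Conv1d(out_ch, out_ch, kernel_size=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )

    def forward(self, x):
        return self.block(x)

class TargetCNN(nn.Module):
    """CNN whose classifier head is GAP instead of a fully connected layer."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(ConvModule(1, 32), ConvModule(32, 64))
        # one feature map per class; GAP reduces each map to a single score,
        # so each class decision is directly tied to its own feature map
        self.head = nn.Conv1d(64, num_classes, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool1d(1)

    def forward(self, x):               # x: (batch, 1, num_samples)
        x = self.head(self.features(x))
        return self.gap(x).squeeze(-1)  # (batch, num_classes) logits
```

With no fully connected layer, the trainable parameter count stays small, which is the overfitting argument made in entry 39249.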
39252 跨模态检索旨在通过以某一模态的数据为查询词,使人们能够得到与之相关的其他不同模态数据的检索结果的新型检索方法,这已成为多媒体和信息检索领域中一个有趣的研究问题。 Cross-modal retrieval aims to retrieve data in one modality by a query in another modality, and it has become an interesting research issue in the field of multimedia and information retrieval.
39253 但是,目前大多数的研究成果集中于文本到图像、文本到视频以及歌词到音频等跨模态相关任务上,而关于如何为特定的视频通过跨模态检索得到合适的音乐这一跨模态的相关研究却很有限。 However, most existing work focuses on tasks such as text-to-image, text-to-video, and lyrics-to-audio retrieval, while limited research has been conducted on the cross-modal retrieval of suitable music for a given video.
39254 此外,大多现有的关于视频和音频跨模态的研究依赖于元数据(例如关键字,标签或描述)。 Moreover, most existing research on audio-video cross-modal retrieval relies on metadata such as keywords, tags, or descriptions.
39255 本文介绍了一种基于音频和视频这两种模态数据内容的跨模态检索的方法,该方法以新型的双流处理网络为框架,并通过神经网络学习两模态数据在公共子空间的特征表达,以计算音频和视频数据之间的相似度。 This paper introduces a content-based cross-modal retrieval method for the audio and video modalities: built on a novel two-branch neural network, the method learns joint embeddings of the two modalities in a shared subspace, which are used to compute the similarity between audio and video data.
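The two-branch design described in entry 39255 (each modality projected by its own subnetwork into a common subspace where similarity is computed) might look roughly like the following sketch. The feature dimensions, layer sizes, and the use of cosine similarity are assumptions for illustration, not the paper's exact setup; inputs are taken to be precomputed per-clip audio and video feature vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchNet(nn.Module):
    """Two-branch network mapping audio and video features into a
    shared embedding space for cross-modal similarity."""
    def __init__(self, audio_dim=128, video_dim=512, embed_dim=256):
        super().__init__()
        self.audio_branch = nn.Sequential(
            nn.Linear(audio_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.video_branch = nn.Sequential(
            nn.Linear(video_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, audio_feat, video_feat):
        # L2-normalized embeddings make the dot product a cosine similarity
        a = F.normalize(self.audio_branch(audio_feat), dim=-1)
        v = F.normalize(self.video_branch(video_feat), dim=-1)
        return a @ v.T  # (n_audio, n_video) similarity matrix

# Usage: rank videos for each audio query (or vice versa) by similarity.
net = TwoBranchNet()
sim = net(torch.randn(4, 128), torch.randn(8, 512))  # 4 queries, 8 candidates
ranking = sim.argsort(dim=1, descending=True)
```

Training such a network typically uses a ranking or triplet loss over this similarity matrix, though the abstract does not specify the objective.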