IT-BiCorpus-EN

ID	Source (Chinese)	Translation (English)
38836	人体动作识别在人机交互、视频内容检索等领域有众多应用，是多媒体信息处理的重要研究方向。	Human action recognition has many applications in fields such as human-computer interaction and video content retrieval, and is an important research direction in multimedia information processing.
38837	现有的大多数基于双流网络进行动作识别的方法都是在双流上使用相同的卷积网络去处理RGB与光流数据，缺乏对多模态信息的利用，容易造成网络冗余和相似性动作误判问题。	Most existing action recognition methods based on two-stream networks use the same convolutional network on both streams to process RGB and optical flow data; they make little use of multimodal information, which easily leads to network redundancy and misjudgment of similar actions.
38838	近年来，深度视频也越来越多地用于动作识别，但是大多数方法只关注了深度视频中动作的空间信息，没有利用时间信息。	In recent years, depth video has also been increasingly used for action recognition, but most methods focus only on the spatial information of actions in depth video and do not exploit temporal information.
38839	为了解决这些问题，本文提出一种基于异构多流网络的多模态动作识别方法。	To solve these problems, this paper proposes a multimodal action recognition method based on a heterogeneous multi-stream network.
38840	该方法首先从深度视频中获取动作的时间特征表示，即深度光流数据，然后选择合适的异构网络来进行动作的时空特征提取与分类，最后对RGB数据、RGB中提取的光流、深度视频和深度光流识别结果进行多模态融合。	The method first obtains a temporal feature representation of the action from the depth video, namely depth optical flow data. It then selects suitable heterogeneous networks for spatiotemporal feature extraction and classification of actions. Finally, it performs multimodal fusion of the recognition results from the RGB data, the optical flow extracted from RGB, the depth video, and the depth optical flow.
38841	通过在国际通用的大型动作识别数据集NTU RGB+D上进行的实验表明，所提方法的识别性能要优于现有较先进方法的性能。	Experiments on NTU RGB+D, a large-scale action recognition dataset in wide international use, show that the recognition performance of the proposed method is better than that of existing advanced methods.
38842	连续手语识别的难点之一是手语数据中存在时空维度的冗余信息，以及手语数据与给定标签序列的对齐问题。	One of the difficulties in continuous sign language recognition is the presence of redundant information in the spatio-temporal dimensions of sign language data, together with the problem of aligning the sign language data with a given label sequence.
38843	因此，本文提出一种融合注意力机制和连接时序分类的连续手语识别模型，可以提取手语数据中彩色和深度视频片段的短期时空特征以及手部运动轨迹特征，	Therefore, this paper proposes a continuous sign language recognition model that combines an attention mechanism with connectionist temporal classification; it can extract short-term spatio-temporal features from the color and depth video clips in sign language data, as well as hand motion trajectory features.
38844	将三种模态的特征融合后使用空间注意力加权并按照时间顺序输入到双向长短期记忆网络中进行时序建模，以获取长期时空特征，	The features of the three modalities are fused, weighted by spatial attention, and fed in temporal order into a bidirectional long short-term memory network for temporal modeling, in order to obtain long-term spatio-temporal features.
38845	最后利用融合注意力机制和连接时序分类模型的解码网络以端到端的方式实现连续手语的准确识别。	Finally, a decoding network that combines the attention mechanism with the connectionist temporal classification model is used to achieve accurate recognition of continuous sign language in an end-to-end manner.