IT-BiCorpus-EN

ID	原文	译文
57828	在此基础上，引入了动态路由算法，通过迭代路由过程来确定主胶囊和数字胶囊之间的归属关系，进而得到一组数字胶囊，其中，每个数字胶囊可以学习识别目标行人的存在.	On this basis, a dynamic routing algorithm is introduced that uses an iterative routing process to determine the assignment between primary capsules and digit capsules. This yields a set of digit capsules, each of which can learn to recognize the presence of a target pedestrian.
57829	在具有挑战性的数据集上进行实验的结果表明，所提算法在性能上优于已有算法.	To highlight the advantages of the proposed algorithm, extensive experiments are conducted on a series of challenging datasets; the results show that the algorithm performs favorably against previous work.
57830	垃圾网页检测存在数据不平衡、特征空间维度较高的问题，为此，提出一种基于随机混合采样和遗传算法的集成分类算法.	Spam web page detection is often hampered by imbalanced data and a high-dimensional feature space. To address these two problems, an ensemble classification algorithm based on random hybrid sampling and a genetic algorithm is proposed.
57831	首先，使用随机混合采样技术，通过随机抽样，减少多数类样本数量，用少数类样本合成过采样技术方法生成少数类样本，获得多个平衡的训练数据子集;	First, multiple balanced training data subsets are obtained by reducing the number of majority-class samples through random sampling and generating minority-class samples with the synthetic minority over-sampling technique (SMOTE).
57832	然后使用改进的遗传算法对训练数据集进行降维，得到多个具有最优特征的训练数据子集;	Then, an improved genetic algorithm is used to reduce the dimensionality of the training data set, yielding multiple training data subsets with optimal features.
57833	使用极端梯度算法( XGBoost) 作为分类器，训练多个平衡数据子集，用简单投票法对多个分类器进行集成，得到新的分类器;	Extreme gradient boosting (XGBoost) is used as the classifier to train on the multiple balanced data subsets, and a new classifier is obtained by combining the multiple classifiers with simple voting.
57834	最后对测试集进行预测，得到最终预测结果.	Finally, the test set is predicted and the final prediction is obtained.
57835	实验结果表明，提出算法的分类结果与 XGBoost 的结果相比，准确率提高了约 19.	Experiments show that, compared with XGBoost, the proposed algorithm improves the accuracy by about 19.
57836	25% ，且减少了建立学习模型的时间，提高了分类性能，是一种较好的分类算法.	25%, reduces the time to build the learning model, and improves the classification performance.
57837	为便于厘清机器阅读理解任务的研究现状，按照答案来源，将机器阅读理解分为完形填空、片段选择、多项选择和答案生成 4 类.	To clarify the state of research on machine reading comprehension (MRC) tasks, they are divided into four types according to the source of the answer: cloze-style, span selection, multiple-choice, and answer generation.