ID | 原文 | 译文
56848 | 识别海量变量间潜在的复杂关联关系,判断不同形式关联关系的强弱,是大数据关联关系挖掘的重要任务之一. | One of the important tasks in big data association mining is to identify potentially complex associations among massive variables and to determine the strength of different forms of associations.
56849 | 然而,数据分布的不确定性、关联关系的多样性,使得基于分布假设的关联关系度量和基于数据驱动的非参数度量方法的适用性、准确性难以保证. | However, uncertain data distributions and diverse associations make it difficult to ensure the applicability and accuracy of measures based on distribution assumptions and data-driven non-parametric measurement methods.
56850 | 因此,设计一种对关联关系形式无偏的有效关联度量方法变得至关重要. | Therefore, an effective association measure that is unbiased relative to relationship types is urgently needed.
56851 | 本文从大数据背景下潜在关联关系应被公平排序的需求出发,回顾了目前关联度量的公理化条件,给出了大数据关联关系度量可能需满足的性质;讨论了两类基于邻域视角的度量方法存在的不足;提出了本文基于k-NN粒的关联度量方法,称为最大邻域系数. | In this article, starting from the requirement that potential relationships in big data be ranked fairly, we review the current axiomatic conditions of association metrics, provide some properties that association measures in big data may need to satisfy, discuss the limitations of two types of neighborhood-based association methods, and propose a new association measure based on k-NN granules, which we refer to as the maximum neighborhood coefficient.
56852 | 人造数据集和真实数据集实验从不同角度验证了本文所提方法的有效性和优越性. | Experiments using artificial and real datasets verify the effectiveness and superiority of the proposed method from different perspectives.
56853 | 最后指出了实验中发现的有趣现象和有待解决的理论问题,以引起对该领域更深入的思考和研究. | Finally, we point out interesting phenomena observed in the experiments and theoretical issues that remain to be solved, which we hope will motivate deeper thinking and research in this field.
56854 | 在基于深度网络的自然语言处理任务中,嵌入表示层用词向量刻画词的语义信息,可以有效地提升模型性能. | When deep learning is applied to natural language processing, a word embedding layer can improve task performance significantly due to the semantic information expressed in word vectors.
56855 | 词向量可以和当前任务一起端到端地进行学习,但是从模型参数数量的角度来看,词向量的训练很容易在小语料库上过拟合. | Word embeddings can be optimized end-to-end with the whole framework. However, considering the number of parameters in a word embedding layer, training the embeddings on a small corpus can easily lead to overfitting.
56856 | 为了解决这个问题,通常会使用在大语料库上预训练得到的词向量. | To solve this problem, pretrained embeddings obtained from a much larger corpus are commonly utilized to boost model performance.
56857 | 首先,本文总结了几种常见的复用预训练词向量的方法. | First, this paper summarizes several common methods for reusing pretrained word embeddings.