ID 原文 译文
2573 首先,将每篇文档作为一个序列库,利用 SPING(Sequential Patterns mIning with oNe-off and General gaps condition)算法获取词语之间的关系及其多种变化形式,并利用统计模式特征的方式描述候选关键词; Taking into account the one-off condition and general gaps, SPING (Sequential Patterns mIning with oNe-off and General gaps condition) can capture semantic relations between words and phrases more effectively. KEING therefore obtains effective candidate keyphrases and computes their statistical features.
2574 然后,通过朴素贝叶斯分类算法对大量带标记的训练数据进行训练,构造分类器;最后利用分类器从测试文档中识别出关键词。 Then a supervised machine learning method is trained on these features to construct a classification model, which is used to extract keyphrases.
2575 通过实验验证了 SPING 算法的完备性以及 KEING 算法的有效性。 Experimental results demonstrate that KEING can effectively extract high-quality keyphrases.
2576 针对谱聚类算法 self-tuning 的局部尺度参数 σi 会受噪音点影响,进而影响聚类结果,及其所使用的 K-means 算法的不稳定,对聚类结果的影响,提出两种完全自适应的谱聚类算法 SC-SD(Spectral Clustering based on Standard Deviation)和 SC-MD(Spectral Clustering based on Mean Distance), To prevent outliers from distorting the local scaling parameter σi of the self-tuning spectral clustering algorithm, and to avoid the unstable results of the K-means step it uses, two fully self-adaptive spectral clustering algorithms are proposed: SC-SD (Spectral Clustering based on Standard Deviation) and SC-MD (Spectral Clustering based on Mean Distance).
2577 分别定义样本 i 的标准差、样本 i 到其余样本的距离均值,为样本 i 的邻域半径,统计邻域内的样本数,以样本 i 的邻域标准差为其局部尺度参数,避免样本 i 的局部尺度参数受噪音点影响,进而影响聚类结果; They respectively define the standard deviation of point i, and the mean distance from point i to the other points, as its neighborhood radius, then count the number of points in the neighborhood and use the standard deviation of point i within the neighborhood as its local scaling parameter, so that outliers neither affect the local scaling parameter σi of point i nor distort the clustering results of self-tuning.
2578 以方差优化初始聚类中心的 SD-K-medoids 算法代替 K-means 算法,克服 K-means 算法的不稳定,发现数据的真实分布。 The SD-K-medoids algorithm is adopted in place of K-means in self-tuning to avoid the unstable clustering results of K-means and to recover the true clustering of a dataset.
2579 UCI 数据集和人工数据集实验测试表明,提出的 SC-SD 和 SC-MD 算法能得到更优聚类结果,不受噪音点影响,有很好的伸缩性。 The experimental results on UCI datasets and synthetic datasets demonstrate that SC-SD and SC-MD obtain better clustering results than the traditional spectral clustering algorithm NJW and the self-tuning spectral clustering algorithm, are robust to noise, and scale well.
2580 提出的 SC-SD 和 SC-MD 能完全自适应地发现数据集的真实分布信息,尤其 SC-MD 算法很适合较大规模数据集的聚类分析。 The proposed SC-SD and SC-MD can detect the clustering of a dataset without any given information, and SC-MD is well suited to clustering comparatively large datasets.
2581 目前,基于计算机视觉分析的图像场景分类技术已被广泛研究并应用在众多学科领域中。 Computer-vision-based scene classification technology has been widely developed and applied in different fields.
2582 本文从不同角度对近年来典型的场景分类技术进行了深入的探讨与比较。 In this paper, typical scene classification techniques are analyzed and compared from different perspectives.