ID |
原文 |
译文 |
56888 |
实验表明:(1)在Apache Calcite上,与一系列剪枝的启发式算法相比, RLO搜索计划的效率为它们的10~56倍,并且生成的计划能更快地执行(80%的加速);(2)与原生的Postgres相比, RLO搜索计划的效率是其14倍,并且在端到端的执行中达到12. |
Extensive experiments demonstrate that: (1) on Apache Calcite, RLO is 10×–56× faster in finding the execution plan and 80% faster in executing the plan than the state-of-the-art heuristics. |
56889 |
9%的加速. |
(2) Compared with the native Postgres implementation, RLO can be 14× faster in finding the execution plan and 12.9% faster in an end-to-end comparison. |
56890 |
针对"信息孤岛"中的关系数据融合问题,本文提出并实现了多源关系数据融合的基本框架(multi-source relational data fusion, MSF). |
Focusing on the problem of relational data fusion in environments with “isolated information islands”, this paper presents a multi-source relational data fusion (MSF) framework. |
56891 |
框架包含3个主要模块:模式匹配、实体对齐、实体融合. |
The framework consists of three components: schema matching, entity alignment, and entity fusion. |
56892 |
模式匹配面向多源关系数据的属性对齐问题,结合属性值的多维特征,提出基于匈牙利(Hungarian)算法的属性间对齐发现机制,实现了多源关系数据的快速模式匹配.实体对齐连接多源关系中的元组对,通过引入多样性取样策略和实体特征抽取方法,提升了实体对齐的效果. |
Based on the Hungarian algorithm, we propose an alignment discovery mechanism for attribute alignment among multi-source relational data. By extracting the multi-dimensional features of attribute values, we efficiently realize schema matching of multi-source relational data. |
56893 |
最后将对齐实体进行融合,为数据分析提供统一的数据视图. |
To link the tuple pairs from multi-source data, we introduced a diversity sampling strategy and an entity feature extraction approach. |
56894 |
为了验证MSF的效果和效率,实现了数据融合系统DataPuzzle,并在该系统上,结合真实公开的多领域数据,对提出的方法进行了验证. |
These can effectively improve the performance of entity alignment. Finally, linked entities are fused to provide a unified data view for data analysis. To verify the effectiveness and efficiency of the proposed methods, we implemented a fusion system called DataPuzzle, on which the methods are verified with real, public, multi-domain data. |
56895 |
结果表明,所提出的方法可以高效地实现数据融合,具有较高的查全率、查准率. |
Experimental results demonstrate that the proposed methods can fuse multi-source relational data efficiently with high recall and precision. |
56896 |
机器学习依赖大量样本的统计信息进行模型的训练,从而能对未知样本进行精准的预测. |
Although they achieve impressive performance in many real-world applications, machine learning methods require a huge number of training examples to obtain an effective model. |
56897 |
搜集样本及标记需要耗费大量的资源,因而如何基于少量样本(few-shot learning)进行模型的训练至关重要. |
Considering the effort of collecting labeled training data, few-shot learning, i.e., learning with a budgeted training set, is necessary and useful. |