ID 原文 译文
56728 用户应用被分配的内存空间不足,会在运行期间产生严重的垃圾回收(garbage collect, GC)开销,而分配过量的内存会导致平台资源的浪费. If a user application is allocated insufficient memory, significant garbage collection overhead will occur at runtime. In contrast, if the user application is provided with a memory space larger than it actually requires, memory resources will be wasted.
56729 因此平台中如何为用户应用配置合适的内存成为关键问题. Therefore, it is important to ensure that a user application is allocated an appropriate memory size.
56730 通过研究分析发现,平台上的多个应用会多次共同处理某个特定的数据集,且应用对数据的处理逻辑具有相似性,如图计算和机器学习应用; In a general case, multiple applications process one specific set of data repeatedly. Often, the processing logic of applications working on the same dataset is similar; for example, they can perform machine learning or graph computing tasks.
56731 大数据应用框架的算子API和用户自定义方法 (UDF)与数据处理逻辑有着密切的关系,继而影响运行时内存的使用. This study further reveals that the processing logic reflected in the operator application programming interfaces (APIs) of big data frameworks and in user-defined functions (UDFs) also affects memory usage at runtime.
56732 基于该发现,本文提出了一种预估新提交应用的合理内存阈值的方法. Based on this observation, this paper presents a method for estimating the optimal memory size for newly submitted applications.
56733 该方法利用程序分析与历史应用处理数据特征分析,基于kTree判断与历史应用的数据路径的相似性来预估新应用的合理内存阈值,并在Spark系统上实现该方法. The proposed approach was implemented on Spark. It uses program analysis and the data-processing characteristics of historical applications to estimate a proper memory threshold for a newly submitted application, based on the kTree-based similarity of the data path between the new application and a historical one.
56734 通过一系列实验评估预估的准确性和性能收益,实验结果表明本方法预估大数据应用的结果与真实合理内存阈值的误差比例低至4%,预估过程所产生的开销与应用运行时间相比可以忽略不计,平台上数据处理应用整体执行时间减少至56%. The results of the experiments conducted to evaluate the method’s estimation accuracy and performance gain demonstrated that the proposed method is able to (1) estimate the required memory threshold with an error as low as 4% relative to the actual proper memory threshold, (2) keep the overall time overhead of estimation negligible compared to the execution time of a submitted job, and (3) reduce the execution time of submitted applications by up to 56% compared to when the proposed method is not applied.
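56735 网络节点表示学习是网络数据分析挖掘中的一个基础问题,通过学习网络节点表示向量,可以更加精准地对网络节点进行表征. Network representation learning is a basic problem in network data analysis. By learning network representation vectors, network vertices can be represented more accurately.
56736 近年来,随着深度学习的发展,嵌入方法在网络节点表示学习方面得到了广泛应用. In recent years, with the development of deep learning, embedding methods have been widely used for network vertex representation learning.
56737 同时,网络数据在规模、模态等特征方面也有了很大的变化,研究重点从单网络分析挖掘逐渐演变至耦合网络分析挖掘. Meanwhile, network data have changed considerably in scale, modality, and other characteristics, and the research focus has gradually shifted from the analysis and mining of single networks to that of coupled networks.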