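The whole setup currently runs as a Flink on YARN HA deployment, using community Flink 1.7.2 and community Hadoop 2.8.5.
The Flink cluster has 5 servers in total, each with a 4-core CPU and 8 GB of memory.
The basic Flink configuration is:
jobmanager.heap.size: 2048m
taskmanager.heap.size: 2048m
taskmanager.numberOfTaskSlots: 4
Jobs are started in "run a job on Flink" mode, and each job currently has a parallelism of 1. The command looks like: flink run -d -m yarn-cluster ...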
After two jobs have been submitted successfully, the third job fails to start. Part of the startup log is shown below:
360 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1554100483755_0013
2019-04-04 16:24:23,389 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1554100483755_0013
2019-04-04 16:24:23,389 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
2019-04-04 16:24:23,390 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2019-04-04 16:25:23,625 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-04-04 16:25:23,876 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-04-04 16:25:24,127 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
2019-04-04 16:25:24,378 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster
I could not find any other trace information. In the YARN web console I can see that the container cannot be allocated. The page shows:
YarnApplicationState: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Diagnostics: [Thu Apr 04 16:33:49 +0800 2019] Application is added to the scheduler and is not yet activated. Queue's AM resource limit exceeded. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:4096, vCores:1>; User AM Resource Limit of the queue = <memory:4096, vCores:1>; Queue AM Resource Usage = <memory:4096, vCores:2>;
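1. Given the machine sizes and the parallelism settings above, and judging from the node view in the YARN console, a lot of memory and CPU is still unused. Why does this happen? Is some additional configuration needed?
2. Are there any points to watch when setting the basic options above (jobmanager.heap.size, taskmanager.heap.size, taskmanager.numberOfTaskSlots)? How are they usually set? I have noticed that in this launch mode every job gets its own JobManager and its own TaskManager.
*From the Flink mailing list archive compiled by volunteers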
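Hi, "Queue's AM resource limit exceeded" -> this looks like YARN limiting the resources that AMs (ApplicationMasters) may use, with an upper bound of 4096 MB of memory. You are presumably starting jobs in job mode, where every job launches its own AM, and each AM takes 2048 MB of memory? If so, then there is indeed only enough room to start two of them: two AMs already use 4096 MB (the "Queue AM Resource Usage" in the diagnostics), so the third 2048 MB request would exceed the queue's AM limit no matter how much memory and CPU the rest of the cluster still has free.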
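The arithmetic behind this answer can be sketched in a few lines of Python. This is only an illustrative model, not YARN's actual scheduler code. The 2048 and 4096 MB figures come from the diagnostics above. The function names are hypothetical, and the guess that the 4096 MB limit comes from 5 NodeManagers offering 8192 MB each, multiplied by the Capacity Scheduler's `yarn.scheduler.capacity.maximum-am-resource-percent` at its default of 0.1, is an assumption: the original post does not show the YARN configuration.

```python
def am_resource_limit_mb(queue_memory_mb: int, max_am_percent: float) -> int:
    """Simplified model: share of the queue's memory that AMs may occupy."""
    return round(queue_memory_mb * max_am_percent)


def max_concurrent_ams(am_limit_mb: int, am_request_mb: int) -> int:
    """How many AMs of a given size fit under the AM memory limit."""
    return am_limit_mb // am_request_mb


def can_activate(am_usage_mb: int, am_request_mb: int, am_limit_mb: int) -> bool:
    """True if one more AM still fits under the limit (memory only)."""
    return am_usage_mb + am_request_mb <= am_limit_mb


# Values taken from the YARN diagnostics in the question.
AM_REQUEST_MB = 2048   # AM Resource Request
AM_LIMIT_MB = 4096     # Queue Resource Limit for AM
AM_USAGE_MB = 4096     # Queue AM Resource Usage (two running jobs)

# ASSUMPTION: 5 nodes x 8192 MB offered to YARN, AM percent left at 0.1.
assumed_limit = am_resource_limit_mb(5 * 8192, 0.1)

print("assumed AM limit (MB):", assumed_limit)
print("AMs that fit:", max_concurrent_ams(AM_LIMIT_MB, AM_REQUEST_MB))
print("third job can start:", can_activate(AM_USAGE_MB, AM_REQUEST_MB, AM_LIMIT_MB))
```

Under this model there are two likely ways to fit more jobs: raise `yarn.scheduler.capacity.maximum-am-resource-percent` in `capacity-scheduler.xml`, or request a smaller JobManager per job (for example a smaller `jobmanager.heap.size`, or the `-yjm` option of `flink run`). Which one is appropriate depends on the actual queue setup, which should be checked first.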