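I started a yarn-session: bin/yarn-session.sh -jm 1g -tm 4g -s 4 -qu root.flink -nm fsql-cli 2>&1 &
Then I submitted a SQL statement through sql-client:
Its main logic joins a Kafka table with a Hive dimension table and writes the aggregated result to MySQL.
While it runs, the job frequently ends up in the SUCCEEDED state, sometimes after only a few hours and sometimes after several dozen hours, as shown here: https://s1.ax1x.com/2020/06/29/Nf2dIA.png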
The log contains an exception reported at INFO level. The log from around 15:34, when the job ended, is as follows:
2020-06-29 14:53:20,260 INFO org.apache.flink.api.common.io.LocatableInputSplitAssigner - Assigning remote split to host uhadoop-op3raf-core12
2020-06-29 14:53:22,845 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: HiveTableSource(vid, q70) TablePath: dw.video_pic_title_q70, PartitionPruned: false, PartitionNums: null (1/1) (68c24aa59c898cefbb20fbc929ddbafd) switched from RUNNING to FINISHED.
2020-06-29 15:34:52,982 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting YarnSessionClusterEntrypoint down with application status SUCCEEDED. Diagnostics null.
2020-06-29 15:34:52,984 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Shutting down rest endpoint.
2020-06-29 15:34:53,072 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Removing cache directory /tmp/flink-web-cdb67193-05ee-4a83-b957-9b7a9d85c23f/flink-web-ui
2020-06-29 15:34:53,073 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://uhadoop-op3raf-core1:44664 lost leadership
2020-06-29 15:34:53,074 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Shut down complete.
2020-06-29 15:34:53,074 INFO org.apache.flink.yarn.YarnResourceManager - Shut down cluster because application is in SUCCEEDED, diagnostics null.
2020-06-29 15:34:53,076 INFO org.apache.flink.yarn.YarnResourceManager - Unregister application from the YARN Resource Manager with final status SUCCEEDED.
2020-06-29 15:34:53,088 INFO org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl - Waiting for application to be successfully unregistered.
2020-06-29 15:34:53,306 INFO org.apache.flink.runtime.entrypoint.component.DispatcherResourceManagerComponent - Closing components.
2020-06-29 15:34:53,308 INFO org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess - Stopping SessionDispatcherLeaderProcess.
2020-06-29 15:34:53,309 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping dispatcher akka.tcp://flink@uhadoop-op3raf-core1:38817/user/dispatcher.
2020-06-29 15:34:53,310 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Stopping all currently running jobs of dispatcher akka.tcp://flink@uhadoop-op3raf-core1:38817/user/dispatcher.
2020-06-29 15:34:53,311 INFO org.apache.flink.runtime.jobmaster.JobMaster - Stopping the JobMaster for job default: insert into rt_app.app_video_cover_abtest_test ...
2020-06-29 15:34:53,322 INFO org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl - Interrupted while waiting for queue
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
2020-06-29 15:34:53,324 INFO org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy - Opening proxy : uhadoop-op3raf-core12:23333
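To make the timeline in this log easier to read, here is a minimal Python sketch that extracts task state transitions and the cluster shutdown line. The function name `summarize`, the `SAMPLE` list, and the regular expressions are my own assumptions based only on the log format shown above; this is not part of any Flink tooling.

```python
import re
from datetime import datetime

# Assumed line layout: "<date> <time>,<millis> <LEVEL> <logger> - <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)
# Message fragments taken from the log excerpt in this post.
TRANSITION_RE = re.compile(r"switched from (?P<old>\w+) to (?P<new>\w+)")
SHUTDOWN_RE = re.compile(
    r"Shutting \S+ down with application status (?P<status>\w+)"
)


def summarize(lines):
    """Return (timestamp, kind, detail) tuples for the interesting lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. stack-trace lines without a timestamp
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        msg = match.group("msg")
        transition = TRANSITION_RE.search(msg)
        if transition:
            detail = f"{transition.group('old')} -> {transition.group('new')}"
            events.append((ts, "transition", detail))
        shutdown = SHUTDOWN_RE.search(msg)
        if shutdown:
            events.append((ts, "shutdown", shutdown.group("status")))
    return events


# Two lines copied from the excerpt above, plus one that should be ignored.
SAMPLE = [
    "2020-06-29 14:53:22,845 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - "
    "Source: HiveTableSource(vid, q70) TablePath: dw.video_pic_title_q70, "
    "PartitionPruned: false, PartitionNums: null (1/1) "
    "(68c24aa59c898cefbb20fbc929ddbafd) switched from RUNNING to FINISHED.",
    "2020-06-29 15:34:52,982 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - "
    "Shutting YarnSessionClusterEntrypoint down with application status SUCCEEDED. "
    "Diagnostics null.",
    "2020-06-29 15:34:52,984 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - "
    "Shutting down rest endpoint.",
]

if __name__ == "__main__":
    for ts, kind, detail in summarize(SAMPLE):
        print(ts.isoformat(), kind, detail)
```

Run against the full JobManager log, this kind of filter would list every task that switched state before the session cluster shut down, which may help narrow down which operator finishing precedes the SUCCEEDED status.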
Thanks, everyone!
*From the Flink mailing list archive compiled by volunteers