
PAI general-purpose model training fails after following the official documentation?

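I created the model by following this official document, then labeled the data and started training as the steps describe, but the training failed: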

https://help.aliyun.com/document_detail/125688.html?spm=a2c0j.103967.1350600..b24f32b2Ae9xRQ

Screenshots: [image.png] [image.png]

Error log:

2020-12-17 15:13:07 INFO Current task status:RUNNING
2020-12-17 15:13:07 INFO Start execute shell on node sh-base-biz-gateway20.cloud.et1.
2020-12-17 15:13:07 INFO Current working dir /home/admin/alisatasknode/taskinfo/20201217/phoenix/15/13/06/lcjnuwjd3tu74vs8u73mkjmd
2020-12-17 15:13:07 INFO Full Command ..
2020-12-17 15:13:07 INFO -------------------------
2020-12-17 15:13:07 INFO /opt/taobao/tbdpapp/paiwrapper/paiservice.sh /home/admin/alisatasknode/taskinfo//20201217/phoenix/15/13/06/lcjnuwjd3tu74vs8u73mkjmd//P1004 1276057991250096 PROD 1004 http://pai-autolearning-cn-shanghai.data.aliyun.com/ fromPaiweb
2020-12-17 15:13:07 INFO -------------------------
2020-12-17 15:13:07 INFO List of passing environment ..
2020-12-17 15:13:07 INFO -------------------------
2020-12-17 15:13:07 INFO SKYNET_ENDPOINT=http://service.odps.aliyun.com/api:
2020-12-17 15:13:07 INFO SKYNET_PTYPE=1002:
2020-12-17 15:13:07 INFO SKYNET_ACTIONID=1:
2020-12-17 15:13:07 INFO SKYNET_RERUN_MODE=1:
2020-12-17 15:13:07 INFO SKYNET_FLOW_PARAVALUE=group:adidas:
2020-12-17 15:13:07 INFO SKYNET_ONDUTY=1276057991250096:
2020-12-17 15:13:07 INFO SKYNET_SYSTEMID=:
2020-12-17 15:13:07 INFO CALC_ENGINE_IDENTIFIER=pai_autodl:
2020-12-17 15:13:07 INFO SKYNET_SOURCEID=700007802971:
2020-12-17 15:13:07 INFO SKYNET_PARAVALUE=1276057991250096 PROD 1004 http://pai-autolearning-cn-shanghai.data.aliyun.com/ fromPaiweb:
2020-12-17 15:13:07 INFO SKYNET_TASKID=707020024517:
2020-12-17 15:13:07 INFO SKYNET_TENANT_ID=336494838098562:
2020-12-17 15:13:07 INFO SKYNET_ID=-1:
2020-12-17 15:13:07 INFO SKYNET_JOBID=700166464480:
2020-12-17 15:13:07 INFO SKYNET_NODENAME=al_sh_1004_v20201217151304533:
2020-12-17 15:13:07 INFO SKYNET_CYCTYPE=0:
2020-12-17 15:13:07 INFO SKYNET_TASK_INPUT={}:
2020-12-17 15:13:07 INFO SKYNET_TIMEZONE=GMT+8:
2020-12-17 15:13:07 INFO SKYNET_EXENAME=:
2020-12-17 15:13:07 INFO IS_NEW_SCHEDULE=true:
2020-12-17 15:13:07 INFO SKYNET_DAGTYPE=4:
2020-12-17 15:13:07 INFO SKYNET_SOURCENAME=group_336494838098562_dev:
2020-12-17 15:13:07 INFO SKYNET_SYSTEM_ENV=prod:
2020-12-17 15:13:07 INFO SKYNET_GMTDATE=20201217:
2020-12-17 15:13:07 INFO SKYNET_ENVTYPE=1:
2020-12-17 15:13:07 INFO SKYNET_BIZDATE=20201216:
2020-12-17 15:13:07 INFO SKYNET_CYCTIME=20201217000000:
2020-12-17 15:13:07 INFO SKYNET_FAILOVER_HANDLER=1:
2020-12-17 15:13:07 INFO SKYNET_CONNECTION=***************:
2020-12-17 15:13:07 INFO SKYNET_DAG_INPUT={}:
2020-12-17 15:13:07 INFO SKYNET_ONDUTY_WORKNO=1276057991250096:
2020-12-17 15:13:07 INFO SKYNET_APP_ID=96215:
2020-12-17 15:13:07 INFO SKYNET_APPNAME=pai_autodl:
2020-12-17 15:13:07 INFO SKYNET_PRIORITY=1:
2020-12-17 15:13:07 INFO KILL_SIGNAL=SIGKILL:
2020-12-17 15:13:07 INFO SKYNET_RERUN_TIME=0:
2020-12-17 15:13:07 INFO SKYNET_REGION=cn-shanghai:
2020-12-17 15:13:07 INFO TASK_PLUGIN_NAME=pai_ml:
2020-12-17 15:13:07 INFO ALISA_TASK_ID=T3_1755553468:
2020-12-17 15:13:07 INFO ALISA_TASK_EXEC_TARGET=group_336494838098562_dev:
2020-12-17 15:13:07 INFO ALISA_TASK_PRIORITY=1:
2020-12-17 15:13:07 INFO --- Invoking Shell command line now ---
2020-12-17 15:13:07 INFO =================================================================
LOGBACK: No context given for ch.qos.logback.classic.encoder.PatternLayoutEncoder@d8355a8
JobId: 1004-707020024517, Worker: null, JCS version: basein, max parallelism: 30
Execution Plan:
____Nodes:
________ #1[odpscmd]
________ #2[odpscmd]
____Dependencies:
________[1] -> [2]
[1] start subJob: #1[odpscmd]
[1] Start OdpsCmdHandler:jobId=1004-707020024517
[1] local log file = /home/admin/alisatasknode/taskinfo//20201217/phoenix/15/13/06/lcjnuwjd3tu74vs8u73mkjmd//T3_1755553468_jcs.log
[1] user accessId :TMP.3Kiny5TtEqjCEkXxGXTWgsiZfYUh6oABE2GB9nMQX7zMZKJZF4aT4yZT6WLSx3s6Y2wwHZwPTiqoG4H9nFefkhvt5uPGi7
[1] execute endpoint : http://service.odps.aliyun.com/api
[1] instance priority : 8
[1] OK
[1] ID = 20201217071313946gzd8z592
[1] Odps Instance Id = 20201217071313946gzd8z592
[1] OdpsInstanceId: 20201217071313946gzd8z592 callback success
[1] Sub Instance ID = 20201217071323863gjfawx7_1586ba6d_9cc9_49ff_90cd_a777f2937f6c
[1] train: 2020-12-17 15:13:25 TensorflowTask_job:0/0/0[0%]
[1] train: 2020-12-17 15:13:31 TensorflowTask_job:0/0/0[0%]
[1] train: 2020-12-17 15:13:37 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:13:43 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:13:48 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:13:54 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:13:59 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:04 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:10 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:15 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:21 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:26 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:31 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:37 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:42 TensorflowTask_job:1/0/1[0%]
[1] train: 2020-12-17 15:14:47 TensorflowTask_job:1/0/1[0%]
[1] OK
[1] Execute Odpscmd Success!
[1] run subJob: #1[odpscmd] successfully!
[2] start subJob: #2[odpscmd]
[2] local log file = /home/admin/alisatasknode/taskinfo//20201217/phoenix/15/13/06/lcjnuwjd3tu74vs8u73mkjmd//T3_1755553468_jcs.log
[2] Submit AutoDL job -- Created: job_id: 1309
[2] AutoDL job failed
[2] ERROR: run subJob: #2[odpscmd] failed!
Run job failed, time taken: 161s
2020-12-17 15:15:52 INFO =================================================================
2020-12-17 15:15:52 INFO Exit code of the Shell command 1
2020-12-17 15:15:52 INFO --- Invocation of Shell command completed ---
2020-12-17 15:15:52 ERROR Shell run failed!
2020-12-17 15:15:52 ERROR Current task status: ERROR
2020-12-17 15:15:52 INFO Cost time is: 162.742s
/home/admin/alisatasknode/taskinfo//20201217/phoenix/15/13/06/lcjnuwjd3tu74vs8u73mkjmd/T3_1755553468.log-END-EOF
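One way to read the log above: the execution plan has two subjobs that run in order (`[1] -> [2]`). Subjob #1 runs the `TensorflowTask_job` training step and ends with `Execute Odpscmd Success!`. The failure is in subjob #2, which submits AutoDL job 1309 and then reports only `AutoDL job failed`, with no further reason in this log. The minimal Python sketch below pulls out each subjob's final status and the AutoDL job_id, assuming the log line formats shown above. The names `summarize` and `LOG_EXCERPT` and the regular expressions are hypothetical illustrations, not part of any PAI tool.

```python
import re

# Sample excerpt copied from the job log above.
LOG_EXCERPT = """\
[1] run subJob: #1[odpscmd] successfully!
[2] start subJob: #2[odpscmd]
[2] Submit AutoDL job -- Created: job_id: 1309
[2] AutoDL job failed
[2] ERROR: run subJob: #2[odpscmd] failed!
"""

# Assumed formats: "run subJob: #<n>[<type>] successfully!|failed!"
# and "Submit AutoDL job -- Created: job_id: <id>".
SUBJOB_RE = re.compile(r"run subJob: #(\d+)\[(\w+)\] (successfully|failed)!")
AUTODL_RE = re.compile(r"Submit AutoDL job -- Created: job_id: (\d+)")


def summarize(log_text):
    """Return ({subjob_number: 'ok' or 'failed'}, AutoDL job_id or None)."""
    statuses = {}
    for m in SUBJOB_RE.finditer(log_text):
        statuses[int(m.group(1))] = "ok" if m.group(3) == "successfully" else "failed"
    job = AUTODL_RE.search(log_text)
    return statuses, (int(job.group(1)) if job else None)


if __name__ == "__main__":
    statuses, autodl_id = summarize(LOG_EXCERPT)
    for n, s in sorted(statuses.items()):
        print(f"subJob #{n}: {s}")
    print(f"AutoDL job_id: {autodl_id}")
```

This only locates the failing stage (subjob #2, AutoDL job 1309); it cannot show why that job failed, since the log above does not include the reason.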
Asked by 游客cj2gubf7fbnqg on 2020-12-18 11:57:06 (0 answers)