In Alibaba Cloud E-MapReduce, accessing OSS from Spark via JindoSDK fails with a 400 request error. What could be the cause? spark-3.4.2, jindosdk-6.3.0. Thanks. Is it a version problem?
24/02/20 13:40:57 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
24/02/20 13:40:57 INFO FsStats: cmd=mkdir, src=oss://dataplatform/hudi_test10/_temporary/0, dst=null, size=0, parameter=null, time-in-ms=21, version=6.3.0
Traceback (most recent call last):
File "/data/software/src/cluster_test.py", line 85, in
df.write.format("csv").mode("append").save(basePath)
File "/data/software/spark-3.4.2-bin-hadoop3/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1398, in save
File "/data/software/spark-3.4.2-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1323, in call
File "/data/software/spark-3.4.2-bin-hadoop3/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 169, in deco
File "/data/software/spark-3.4.2-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o58.save.
: java.io.IOException: ErrorCode : 6400, ErrorMessage : [RequestId]: 65D43B698EC94332378FC136 [HostId]: ... [ErrorMessage]: [E1010]HTTP/1.1 400 Bad Request: <?xml version="1.0" encoding="UTF-8"?><Error><Code>BadRequest</Code><Message>Your browser sent a request that this server could not understand.</Message><RequestId>65D43B698EC94332378FC136</RequestId><HostId>localhost</HostId></Error> [ErrorCode]: 1010 [RequestId]: 65D43B698EC94332378FC136
at com.aliyun.jindodata.call.JindoMkdirCall.execute(JindoMkdirCall.java:57)
at com.aliyun.jindodata.common.JindoHadoopSystem.mkdirs(JindoHadoopSystem.java:700)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2388)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:356)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:188)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:269)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
A regular OSS endpoint looks like oss-cn-shanghai-internal.aliyuncs.com, and an OSS-HDFS endpoint looks like cn-shanghai.oss-dls.aliyuncs.com.
First, the endpoint must not end with a trailing slash. Also, I have never seen a domain of the form xxx.ops.xxx.cn — is that an internal domain at your company? If so, the likely cause is that in JindoSDK 6.3.0 we upgraded the OSS signing algorithm to signer v4, which the OSS Python SDK you are using probably does not support yet either.
After the signature upgrade, an fs.oss.region (e.g. cn-shanghai) must be specified. When it is not configured but fs.oss.endpoint is one of the standard domains above, we extract the region from the domain automatically, as the sketch below illustrates.
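To illustrate why auto-detection fails for a non-standard domain (this is only a sketch of the idea, not JindoSDK's actual parsing code; the function name and patterns are assumptions):

import re
from typing import Optional

def region_from_endpoint(endpoint: str) -> Optional[str]:
    """Hypothetical sketch: derive the region from a standard OSS endpoint."""
    # Classic OSS: oss-cn-shanghai[-internal].aliyuncs.com -> cn-shanghai
    m = re.match(r"oss-([a-z0-9-]+?)(?:-internal)?\.aliyuncs\.com$", endpoint)
    if m:
        return m.group(1)
    # OSS-HDFS: cn-shanghai.oss-dls.aliyuncs.com -> cn-shanghai
    m = re.match(r"([a-z0-9-]+)\.oss-dls\.aliyuncs\.com$", endpoint)
    if m:
        return m.group(1)
    # A non-standard domain like xxx.ops.xxx.cn matches neither pattern,
    # so signer v4 is left without a region unless fs.oss.region is set.
    return None

print(region_from_endpoint("oss-cn-shanghai-internal.aliyuncs.com"))  # cn-shanghai
print(region_from_endpoint("example.ops.example.cn"))                 # None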
With a domain like yours, you have to set fs.oss.region explicitly in the configuration, or fall back to the old signing algorithm by setting fs.oss.signer.version to 1; a sketch of both options follows.
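As a minimal PySpark sketch of the two workarounds (the endpoint and region values are placeholders to adapt; the fs.oss.* keys are the ones named above):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("oss-config-test")
    # Custom endpoint, no trailing slash (placeholder domain).
    .config("spark.hadoop.fs.oss.endpoint", "oss.example.internal")
    # Option 1: tell the v4 signer which region to sign requests for.
    .config("spark.hadoop.fs.oss.region", "cn-shanghai")
    # Option 2 (alternative): stay on the v1 signing algorithm instead.
    # .config("spark.hadoop.fs.oss.signer.version", "1")
    .getOrCreate()
)

The same fs.oss.* keys can also go into core-site.xml without the spark.hadoop. prefix.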
Once that is configured, it is still worth verifying with a hadoop fs command first (for example hadoop fs -ls oss://<bucket>/) before rerunning the Spark job; a matching smoke test is sketched below. This answer was compiled from the DingTalk group "JindoData 用户交流群".
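A minimal PySpark smoke test mirroring the failing write (it reuses the spark session configured above; the target path is a hypothetical subdirectory of the bucket from the log):

# Tiny append that exercises the same mkdir/commit path as the failing job.
df = spark.createDataFrame([(1, "ok")], ["id", "status"])
df.write.format("csv").mode("append").save("oss://dataplatform/smoke_test/")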