val schema = df.schema
val x = df.flatMap(r =>
  (0 until schema.length).map { idx =>
    ((idx, r.get(idx)), 1L)
  }
)
This produces the error:
java.lang.ClassNotFoundException: scala.Any
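One approach is to cast all columns to String. Note that I changed r.get(idx) in the code to r.getString(idx): r.get(idx) returns Any, for which Spark has no built-in encoder, whereas r.getString(idx) returns a String. The following works.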
scala> val df = Seq(("ServiceCent4","AP-1-IOO-PPP","241.206.155.172","06-12-18:17:42:34",162,53,1544098354885L)).toDF("COL1","COL2","COL3","EventTime","COL4","COL5","COL6")
df: org.apache.spark.sql.DataFrame = [COL1: string, COL2: string ... 5 more fields]
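scala> df.show(1,false)
+------------+------------+---------------+-----------------+----+----+-------------+
|COL1        |COL2        |COL3           |EventTime        |COL4|COL5|COL6         |
+------------+------------+---------------+-----------------+----+----+-------------+
|ServiceCent4|AP-1-IOO-PPP|241.206.155.172|06-12-18:17:42:34|162 |53  |1544098354885|
+------------+------------+---------------+-----------------+----+----+-------------+
only showing top 1 row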
scala> df.printSchema
root
|-- COL1: string (nullable = true)
|-- COL2: string (nullable = true)
|-- COL3: string (nullable = true)
|-- EventTime: string (nullable = true)
|-- COL4: integer (nullable = false)
|-- COL5: integer (nullable = false)
|-- COL6: long (nullable = false)
scala> val schema = df.schema
schema: org.apache.spark.sql.types.StructType = StructType(StructField(COL1,StringType,true), StructField(COL2,StringType,true), StructField(COL3,StringType,true), StructField(EventTime,StringType,true), StructField(COL4,IntegerType,false), StructField(COL5,IntegerType,false), StructField(COL6,LongType,false))
scala> val df2 = df.columns.foldLeft(df){ (acc,r) => acc.withColumn(r,col(r).cast("string")) }
df2: org.apache.spark.sql.DataFrame = [COL1: string, COL2: string ... 5 more fields]
scala> df2.printSchema
root
|-- COL1: string (nullable = true)
|-- COL2: string (nullable = true)
|-- COL3: string (nullable = true)
|-- EventTime: string (nullable = true)
|-- COL4: string (nullable = false)
|-- COL5: string (nullable = false)
|-- COL6: string (nullable = false)
scala> val x = df2.flatMap(r => (0 until schema.length).map { idx => ((idx, r.getString(idx)), 1l) } )
x: org.apache.spark.sql.Dataset[((Int, String), Long)] = [_1: struct<_1: int, _2: string>, _2: bigint]
scala> x.show(5,false)
+---------------------+---+
|_1                   |_2 |
+---------------------+---+
|[0,ServiceCent4]     |1  |
|[1,AP-1-IOO-PPP]     |1  |
|[2,241.206.155.172]  |1  |
|[3,06-12-18:17:42:34]|1  |
|[4,162]              |1  |
+---------------------+---+
only showing top 5 rows
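
For use outside the shell, the same steps can be sketched as a small standalone program. This is only an illustration under some assumptions: the object name ColumnValuePairs, the local[*] master, and the val names asStrings, width and pairs are hypothetical, and the cast is done with a single select rather than the foldLeft over withColumn used above (both should give an all-string schema).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical object name; not part of the original answer.
object ColumnValuePairs {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val spark = SparkSession.builder()
      .appName("ColumnValuePairs")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same sample row as in the shell session above.
    val df = Seq(
      ("ServiceCent4", "AP-1-IOO-PPP", "241.206.155.172", "06-12-18:17:42:34", 162, 53, 1544098354885L)
    ).toDF("COL1", "COL2", "COL3", "EventTime", "COL4", "COL5", "COL6")

    // Cast every column to string in one projection, keeping the column names.
    val asStrings = df.select(df.columns.map(c => col(c).cast("string").as(c)): _*)

    // Capture only the column count so the closure stays small.
    val width = asStrings.columns.length

    // Every column is a string now, so getString is safe and the result
    // type ((Int, String), Long) has an encoder from spark.implicits.
    val pairs = asStrings.flatMap { r =>
      (0 until width).map(idx => ((idx, r.getString(idx)), 1L))
    }

    pairs.show(false)
    spark.stop()
  }
}
```

Note that getString returns null for a null cell, so a DataFrame with nullable columns may need extra handling before the pairs are grouped or counted.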