I have the following working code:
def replaceNone(row):
    myList = []
    row_len = len(row)
    for i in range(0, row_len):
        if row[i] is None:
            myList.append("")
        else:
            myList.append(row[i])
    return myList

rdd_out = rdd_in.map(lambda row : replaceNone(row))
Here, each row is a Row object (from pyspark.sql import Row).
However, it is rather lengthy and ugly. Is it possible to avoid defining the replaceNone function by writing everything directly in the lambda? Or, at least, to simplify replaceNone()? Thanks!
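
For illustration, here is a minimal sketch of one possible simplification. It assumes each row can be iterated like a tuple (pyspark.sql.Row subclasses tuple). A list comprehension with a conditional expression performs the same replacement in a single expression. The name replace_none and the sample tuple are hypothetical stand-ins, not part of the original code.

```python
def replace_none(row):
    # Build a new list, substituting "" for every field that is None.
    return ["" if value is None else value for value in row]


# A plain tuple stands in for a pyspark.sql.Row here (assumption for the demo).
sample = ("a", None, 3)
result = replace_none(sample)

# The same expression should also fit directly in the lambda, e.g.:
# rdd_out = rdd_in.map(lambda row: ["" if v is None else v for v in row])
```

Testing with "is None" rather than truthiness leaves values such as 0 or False untouched, matching the behavior of the original loop.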