Wednesday, June 15, 2016

pyspark: keep a function in the lambda expression


I have the following working code:

# Replace every None field in a Row with an empty string.
def replaceNone(row):
  myList = []
  row_len = len(row)
  for i in range(0, row_len):
    if row[i] is None:
      myList.append("")
    else:
      myList.append(row[i])
  return myList

rdd_out = rdd_in.map(lambda row: replaceNone(row))

Here row is a pyspark.sql.Row (from pyspark.sql import Row).

However, it is rather lengthy and ugly. Is it possible to avoid defining the replaceNone function by writing everything directly in the lambda expression? Or at least to simplify replaceNone()? For context, the kind of one-liner I'm hoping for would look something like the untested sketch below. Thanks!
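
# Untested sketch: Row objects are iterable, so a list comprehension
# can do the None-to-"" replacement inline, with no named helper.
rdd_out = rdd_in.map(lambda row: ["" if x is None else x for x in row])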

