Saturday, 18 June 2016

Pandas Dataframe: Splitting a list of predictions into dataframe columns


I am using a scikit-learn model (KDTree) to make predictions.

Originally, I only wanted the top 3 predictions. Given a fitted model (clf), a list of labels (labels), and test data (test), the following code does that:

from collections import Counter
import pandas as pd
_, neighbors = clf.query(select_columns(test), k=30)
top3 = [Counter([labels[idx] for idx in neighborSet]).most_common(3) for neighborSet in neighbors]
predict = [[x for x, _ in idx] for idx in top3]
preds = pd.DataFrame()
# Fall back to the top label when a row has fewer than three distinct labels
preds['predict1'] = [x[0] for x in predict]
preds['predict2'] = [x[1] if len(x) > 1 else x[0] for x in predict]
preds['predict3'] = [x[2] if len(x) > 2 else x[0] for x in predict]

I am trying to generalize this to n predictions. What is the pythonic way to split n predictions into n columns without naming each prediction column explicitly?

Ideally, I would like to write something like this (the last line is pseudocode, not valid Python):

_,neighbors = clf.query(select_columns(test),k=30)
topn = [Counter([labels[idx] for idx in neighborSet]).most_common(n) for neighborSet in neighbors]
predict = [[x for x,_ in  idx] for idx in topn]
preds = pd.DataFrame()
preds['predict'+str(i)] for i in range(n) = [x[i] if len(x) > i else x[0] for x in predict]

I am looking for a way to efficiently split up a list of items into separate columns in a dataframe.
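
For illustration, one possible way to do this is to build one padded row per test sample and pass the rows to the DataFrame constructor together with generated column names. This is only a sketch under assumptions: top_n_frame is a hypothetical helper name, the small labels and neighbors lists are made-up stand-ins for the real labels and the indices returned by clf.query, and the columns are numbered from 1 to match the three-column version above.

```python
from collections import Counter

import pandas as pd


def top_n_frame(neighbors, labels, n):
    """Return a DataFrame with columns predict1..predictn (hypothetical helper)."""
    rows = []
    for neighbor_set in neighbors:
        counts = Counter(labels[idx] for idx in neighbor_set)
        # Labels of this row's neighbours, most frequent first.
        ranked = [label for label, _ in counts.most_common(n)]
        # Pad short rows with the top label, mirroring the x[0] fallback above.
        ranked += [ranked[0]] * (n - len(ranked))
        rows.append(ranked)
    columns = ['predict' + str(i + 1) for i in range(n)]
    return pd.DataFrame(rows, columns=columns)


# Made-up stand-in data in place of the real labels and clf.query(...) output.
labels = ['a', 'b', 'a', 'c', 'b', 'a']
neighbors = [[0, 2, 5, 1, 4, 3], [0, 2, 5], [1, 4, 3]]
preds = top_n_frame(neighbors, labels, n=3)
print(preds)
```

If missing values are acceptable in place of the top-label fallback, passing the unpadded lists straight to pd.DataFrame should also work, since pandas fills the short rows with missing values.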

