I am using a scikit-learn model (a KD-tree) to make predictions.
Originally, I only wanted the top 3 predictions, which I compute with the following code, given a fitted model (clf), a list of labels (labels), and test data (test):
from collections import Counter
import pandas as pd
_, neighbors = clf.query(select_columns(test), k=30)
top3 = [Counter([labels[idx] for idx in neighborSet]).most_common(3) for neighborSet in neighbors]
predict = [[x for x, _ in idx] for idx in top3]
preds = pd.DataFrame()
# Fall back to the top label when a row has fewer than 3 distinct labels
preds['predict1'] = [x[0] for x in predict]
preds['predict2'] = [x[1] if len(x) > 1 else x[0] for x in predict]
preds['predict3'] = [x[2] if len(x) > 2 else x[0] for x in predict]
I am trying to generalize this to n predictions. What is the Pythonic way to split n predictions into n columns without writing out each column name explicitly?
Ideally, I would like to do something like this (the last line is pseudocode, not valid Python):
_,neighbors = clf.query(select_columns(test),k=30)
topn = [Counter([labels[idx] for idx in neighborSet]).most_common(n) for neighborSet in neighbors]
predict = [[x for x,_ in idx] for idx in topn]
preds = pd.DataFrame()
preds['predict'+str(i)] for i in range(n) = [x[i] if len(x) > i else x[0] for x in predict]
I am looking for a way to efficiently split up a list of items into separate columns in a dataframe.
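One possible approach, offered as a minimal sketch rather than a definitive answer: build a dict of columns with a comprehension over range(n) and pass it to pd.DataFrame in one step. The helper name predictions_to_frame and the sample data are hypothetical; the sketch assumes predict is a list of non-empty label lists, as produced by the code above, and keeps the same fallback to the top label.

```python
import pandas as pd

def predictions_to_frame(predict, n):
    """Spread each row's ranked labels over columns predict1..predictn.

    Rows with fewer than n labels are padded with their top label (x[0]),
    mirroring the fallback used in the 3-column version.
    """
    columns = {
        f"predict{i + 1}": [x[i] if len(x) > i else x[0] for x in predict]
        for i in range(n)
    }
    # Dicts preserve insertion order, so the columns come out as predict1..predictn
    return pd.DataFrame(columns)

# Hypothetical sample standing in for the `predict` list built from the KD-tree query
sample = [["a", "b", "c"], ["b"], ["c", "a"]]
preds = predictions_to_frame(sample, 3)
print(preds)
```

Building all columns first and constructing the DataFrame once avoids assigning to the frame n separate times, and the column names follow from n instead of being typed out.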