I have a pandas DataFrame like the one below:
               words  group_id
0  set([a, c, b, d])         1
1        set([a, b])         2
2  set([h, e, g, f])         3
I need to merge rows into one group whenever at least one word in a row's set of words also appears in another row's set, and update group_id accordingly. The expected result is:
               words  group_id
0  set([a, c, b, d])         1
1        set([a, b])         1
2  set([h, e, g, f])         3
This is what I tried:
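import numpy as np
from collections import Counter

# Count how often each word occurs across all rows
word_frequency = Counter()
for val in df['words'].values:
    word_frequency.update(val)
to_return = np.array(word_frequency.most_common())
count = 1
df['group_id'] = np.zeros(len(df)) * np.nan
# Attempt to tag the rows containing each word, most frequent word first
for val in to_return:
    df['group_id'] = df[['group_id','words']].apply(lambda x: count if (val in x) else np.nan)
    count += 1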
How can I do that?
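One possible way to approach this, as a minimal sketch rather than a definitive answer: treat every word as a node, link all words that occur in the same row, and give every row in the same connected component the same id. The sketch below uses a small union-find (disjoint-set) structure for that. The names `merge_overlapping_groups`, `find` and `union` are hypothetical helpers introduced here, and two things are assumed: every set in `words` is non-empty, and a merged group keeps the smallest `group_id` among its rows, which matches the expected output above.

```python
import pandas as pd


def merge_overlapping_groups(df):
    """Return a copy of df where rows sharing any word share one group_id."""
    parent = {}  # union-find forest: word -> parent word

    def find(word):
        # Locate the representative word of the component containing `word`.
        parent.setdefault(word, word)
        root = word
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited word directly at the root.
        while parent[word] != root:
            nxt = parent[word]
            parent[word] = root
            word = nxt
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link all words that appear together in a row.
    for words in df["words"]:
        words = list(words)
        for other in words[1:]:
            union(words[0], other)

    # Every word of a row has the same root, so any one word identifies the row.
    # Assumption: each set is non-empty.
    roots = df["words"].map(lambda ws: find(next(iter(ws))))

    out = df.copy()
    # Assumption: a merged group keeps the smallest existing group_id.
    out["group_id"] = out.groupby(roots)["group_id"].transform("min")
    return out


df = pd.DataFrame({
    "words": [{"a", "c", "b", "d"}, {"a", "b"}, {"h", "e", "g", "f"}],
    "group_id": [1, 2, 3],
})
result = merge_overlapping_groups(df)
print(result["group_id"].tolist())
```

Because the linking is transitive, rows that only overlap indirectly (row A shares a word with row B, and row B shares a different word with row C) also end up in the same group. If that is not wanted, a different rule would be needed.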