Monday, 27 June 2016

Drastic reduction in speed when list formation is moved out of a function


I wrote some code that performs an operation on certain sentences of a file, using the elements of two lists, keywords and keywords2. It is as follows -

import os

keywords = ['a', 'b']
keywords2 = ['c', 'd mvb']

def foo(sentence, k2):
    # BETA: collect the keywords that occur in this sentence
    gs_list = []                     #####
    for k in keywords:               #####
        if k in sentence:            #####
            gs_list.append(k)        #####

    for k in gs_list:
        if (k in sentence) and (k2 in sentence):
            print('a match')
    return 4

for path, dirs, files in os.walk(r'F:M.Techfor assigning clselectedrandom 100'):
    for name in files:
        # os.walk yields bare file names, so join each with its directory
        with open(os.path.join(path, name)) as f:
            sentences = f.readlines()
        for sentence in sentences:
            if sentence.startswith('!series_title'):
                for k2 in keywords2:
                    foo(sentence, k2)

I have marked the part of the code in question with #####. This piece (let's call it BETA) builds a list of the keywords that occur in the selected sentence, so that later operations only have to consider those keywords.
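For reference, BETA could also be written as a list comprehension. This is only a minimal sketch, assuming the same sample keywords as above; the helper name matching_keywords and the sample sentence are made up for illustration and are not part of my actual code:

```python
keywords = ['a', 'b']  # same sample values as in the question


def matching_keywords(sentence, candidates):
    """Return the candidates that occur as substrings of sentence."""
    return [k for k in candidates if k in sentence]


# Hypothetical sample line; real lines come from the files being read.
gs_list = matching_keywords('!series_title a quick test', keywords)
```

It does the same substring test as the marked loop, just in one expression.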

This code takes approximately 47 seconds to process 100 files. I was looking for a way to speed it up. There are ~50 elements in keywords2, so BETA effectively runs 50 times per sentence because it sits inside the function foo, even though all it needs is the list keywords and the sentence. Both of these are already available in the main loop, so I moved this part there -

import os

keywords = ['a', 'b']
keywords2 = ['c', 'd mvb']

def foo(sentence, k2):
    # gs_list is the module-level list built in the main loop below
    for k in gs_list:
        if (k in sentence) and (k2 in sentence):
            print('a match')
    return 4

for path, dirs, files in os.walk(r'F:M.Techfor assigning clselectedrandom 100'):
    for name in files:
        # os.walk yields bare file names, so join each with its directory
        with open(os.path.join(path, name)) as f:
            sentences = f.readlines()
        for sentence in sentences:
            if sentence.startswith('!series_title'):

                # BETA, now run once per sentence
                gs_list = []                     #####
                for k in keywords:               #####
                    if k in sentence:            #####
                        gs_list.append(k)        #####

                for k2 in keywords2:
                    foo(sentence, k2)

My thinking was that the list would now be built only once per sentence instead of 50 times as before, which should make the code faster. But this version actually took 89 seconds to go through the same 100 files.
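One way to check whether the difference comes from the keyword logic itself, rather than from reading the files, might be to time both variants on in-memory data with timeit. The following is only a sketch under assumptions: the sentences list is a made-up stand-in for the real file contents, the function names are hypothetical, the matches are counted instead of printed so that console output does not dominate the timing, and the prebuilt list is passed as an argument instead of being read as a global:

```python
import timeit

keywords = ['a', 'b']
keywords2 = ['c', 'd mvb']

# Hypothetical in-memory stand-in for the lines read from the files.
sentences = ['!series_title a b c d mvb', '!series_title nothing here'] * 1000


def count_inside(sentence, k2):
    # Variant 1: the keyword list is rebuilt on every call (BETA inside).
    gs_list = [k for k in keywords if k in sentence]
    return sum(1 for k in gs_list if k2 in sentence)


def count_with_list(sentence, k2, gs_list):
    # Variant 2: the keyword list is built by the caller and passed in.
    return sum(1 for k in gs_list if k2 in sentence)


def run_inside():
    total = 0
    for s in sentences:
        for k2 in keywords2:
            total += count_inside(s, k2)
    return total


def run_hoisted():
    total = 0
    for s in sentences:
        gs_list = [k for k in keywords if k in s]  # once per sentence
        for k2 in keywords2:
            total += count_with_list(s, k2, gs_list)
    return total


# Each call runs the whole loop 5 times and returns the elapsed seconds.
t_inside = timeit.timeit(run_inside, number=5)
t_hoisted = timeit.timeit(run_hoisted, number=5)
```

Comparing t_inside and t_hoisted over a few repeated runs would show how much of the 47 vs 89 seconds can be attributed to the loops themselves; if the two numbers are close, the gap more likely comes from file access or from run-to-run variation.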

I am unable to understand why this can take more time than the previous code. Any ideas?

