I have multiple gzipped log files, each containing 10,000+ lines of info. I need a way to quickly parse each log file for the relevant information and then display stats based on the information gathered from all of them. I currently walk the directory tree with os.walk(), open each .gz file with gzip.open(), and run its contents through a primitive parser.
import gzip
import os

def parse(logfile):
    # Pull the fields we care about out of each relevant line.
    for line in logfile:
        if "REPORT" in line:
            info = line.split()
            username = info[2]
            area = info[4]
            # Put info into dicts/lists etc.
        elif "ERROR" in line:
            info = line.split()
            ...

def main(args):
    argdir = args[1]
    # Walk the directory tree and parse every gzipped log file found.
    for currdir, subdirs, files in os.walk(argdir):
        for filename in files:
            with gzip.open(os.path.join(currdir, filename), "rt") as log:
                parse(log)
    # Create a report at the end: createreport()
Is there any way to optimize this process for each file? It currently takes ~28 seconds per file on my machine, and every little optimization counts. I've tried PyPy, and for some reason it takes about twice as long to process a file.
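For reference, one variation I'm considering is parsing the files in parallel with multiprocessing, on the assumption that each file can be parsed independently and the per-file results merged afterwards. The parse_file function and the Counter-based merging below are hypothetical stand-ins for my real parse()/createreport() code, just to sketch the idea:

import gzip
import os
import sys
from collections import Counter
from multiprocessing import Pool

def parse_file(path):
    # Hypothetical stand-in for my real parse(): returns per-file counts
    # instead of mutating shared state, so the results can be merged later.
    counts = Counter()
    with gzip.open(path, "rt") as log:
        for line in log:
            if "REPORT" in line:
                counts["report"] += 1
            elif "ERROR" in line:
                counts["error"] += 1
    return counts

def main(argdir):
    # Collect every .gz file under the given directory.
    paths = [
        os.path.join(currdir, filename)
        for currdir, _, files in os.walk(argdir)
        for filename in files
        if filename.endswith(".gz")
    ]
    totals = Counter()
    # One worker per CPU core; each worker decompresses and parses whole files.
    with Pool() as pool:
        for counts in pool.imap_unordered(parse_file, paths):
            totals.update(counts)  # merge per-file results
    print(totals)

if __name__ == "__main__":
    main(sys.argv[1])

This assumes the work is CPU-bound enough for extra processes to help; I haven't measured whether the speedup holds up against the decompression and disk I/O cost on my data.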