I have a Celery task that I preload with some data (~4 GB) when the worker starts, using a bootstep and an extra --init argument. The worker is started with concurrency 1:
celery worker -A config --config config.celery_settings.development -Q nlp --concurrency=1 --init nlp
Under Mac OS X, everything seems to work fine: there are two processes, one using little RAM (the consumer) and one holding the data (the worker).
> ps aux | grep -i celery
xxx 91879 0.0 0.1 7457408 13284 s003 S+ 4:48PM 0:00.02 /Users/xxx/.envs/paper/bin/python3.4 /Users/xxx/.envs/paper/bin/celery worker -A config --config config.celery_settings.development -Q nlp --concurrency=1 --init nlp -l DEBUG
xxx 91229 0.0 29.9 7459456 5012028 s003 S+ 4:48PM 0:18.79 /Users/xxx/.envs/paper/bin/python3.4 /Users/xxx/.envs/paper/bin/celery worker -A config --config config.celery_settings.development -Q nlp --concurrency=1 --init nlp -l DEBUG
xxx 99368 0.0 0.0 2446104 820 s002 S+ 9:06AM 0:00.00 grep -i celery
But when I start the same worker on my AWS production environment (Ubuntu 14.04), the two processes use roughly the same amount of RAM:
> ps aux | grep -i celery
ubuntu 19361 0.0 34.7 6273624 5714528 pts/0 Sl+ 17:04 0:59 /home/ubuntu/.virtualenvs/production/bin/python3 /home/ubuntu/.virtualenvs/production/bin/celery worker -A config --config config.celery_settings.development -Q nlp --concurrency=1 --init nlp
ubuntu 19485 0.0 34.6 6273624 5702252 pts/0 Sl+ 17:05 0:00 /home/ubuntu/.virtualenvs/production/bin/python3 /home/ubuntu/.virtualenvs/production/bin/celery worker -A config --config config.celery_settings.development -Q nlp --concurrency=1 --init nlp
ubuntu 19490 0.0 0.0 10460 940 pts/1 S+ 17:05 0:00 grep --color=auto -i celery
Any clues as to what could cause this memory mismatch between the two operating systems?
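For what it's worth, ps reports RSS, which counts shared copy-on-write pages in full for every process, so the numbers above may overstate real usage after a fork. The helper below is a hypothetical sketch (not part of my code) that sums the proportional set size (Pss) from /proc/<pid>/smaps instead, which splits each shared page among the processes sharing it; it is Linux-only:

```python
import os

def pss_kb(pid):
    """Sum the proportional set size (Pss) of a process, in kB.

    Pss divides each shared page among the processes that share it, so
    it gives a fairer picture than the RSS column of ps when a forked
    worker shares copy-on-write memory with its parent.  Linux-only:
    returns None where /proc/<pid>/smaps does not exist (e.g. OS X).
    """
    try:
        with open('/proc/{0}/smaps'.format(pid)) as fh:
            return sum(int(line.split()[1])
                       for line in fh
                       if line.startswith('Pss:'))
    except IOError:
        return None

print(pss_kb(os.getpid()))
```

If the Ubuntu worker's Pss is much lower than its RSS, the pages are shared, not duplicated.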
Below is the code I am using (the parts I think are relevant); this is Python 3.4 with Celery 3.1.18.
The bootstep is defined in config/celery.py:
celery_app.user_options['worker'].add(
    Option('--init', dest='init', default=None,
           help='Load data at task instantiation')
)

class InitArgs(bootsteps.Step):
    """Bootstep to warm up task dispatchers of type Model, PaperEngine or
    ThreadEngine with data."""

    def __init__(self, worker, init, **options):
        for k, task in worker.app.tasks.items():
            if task.__name__.startswith('{0}_dispatcher'.format(init)):
                task.load()

celery_app.steps['worker'].add(InitArgs)
The task requiring the data load is defined in one of my Django apps, in tasks.py:
class EmbedPaperTask(Task):
    abstract = True
    model_name = None
    _model = None

    def __init__(self, *args, **kwargs):
        self.model_name = kwargs.get('model_name')

    def load(self):
        self._model = Model.objects.load(name=self.model_name)
        return self._model

    @property
    def model(self):
        if self._model is None:
            return self.load()
        return self._model

@app.task(base=EmbedPaperTask, bind=True, name=task_name, model_name=model_name)
def nlp_dispatcher(self, *args, **kwargs):
    return self.model.tasks(*args, **kwargs)
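The lazy-load pattern in EmbedPaperTask (cache the heavy object on first property access) can be sketched independently of Celery and Django. Everything below is a hypothetical illustration, with a lambda standing in for the Model.objects.load(...) call:

```python
def make_lazy_loader(loader):
    """Return an object that builds a heavy value on first access only.

    'loader' is any zero-argument callable; in the real task it would
    be something like Model.objects.load(name=...).  This is a sketch,
    not the actual Celery/Django code above.
    """
    class _Holder(object):
        _value = None

        @property
        def value(self):
            if self._value is None:     # load once...
                self._value = loader()
            return self._value          # ...then reuse the cached object

    return _Holder()

holder = make_lazy_loader(lambda: list(range(3)))
print(holder.value)  # -> [0, 1, 2]; the loader runs here, later accesses hit the cache
```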
NB: I hope this is clear enough; I am new to this kind of thing.