TroyGrosfield.com

Django, Celerybeat and Celery with MongoDB as the Broker

by Troy Grosfield
November 22nd, 2011
Category: Developer

Enhance your user experience by sparing your users long waits for certain actions to complete.  There are times when you want to send an email to many people or do other processor-intensive work that you don’t want your user to wait on.  In cases like these, it’s smart to set up tasks that run in the background, which you can kick off so your user can continue browsing your site and performing actions without long wait times.

Below I’ll show you how to set up Celery and Celerybeat using MongoDB as the broker. I’m running Django 1.3 with a MongoDB backend.  This assumes you already have Django running with MongoDB properly configured.

Celery

Celery is a distributed task queue that will assist with such background tasks.  Celery executes tasks synchronously or asynchronously with a set of worker nodes, or processes, that run in the background listening and waiting until they are called upon to perform tasks.

Install

Use pip to install Celery and django-celery to get started:

pip install celery
pip install django-celery

This installs Celery, django-celery and their dependencies:

celery 2.4.1
anyjson 0.3.1 # Dependency for celery
kombu 1.4.3 # Dependency for celery
amqplib 1.0.2 # Dependency for celery
django-celery 2.4.1
django-picklefield 0.1.9 # Dependency for django-celery

Verify everything was installed correctly by opening the Python shell and importing celery:

$ python
>>> import celery
>>>

If you don’t see any errors then Celery was successfully installed.
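The same check can be scripted for several packages at once. The sketch below is only an illustration: missing_modules is a hypothetical helper name, and it assumes django-celery is importable as djcelery (the name used in INSTALLED_APPS).

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or one of its own imports is missing).
            missing.append(name)
    return missing
```

Calling missing_modules(["celery", "djcelery"]) should return an empty list when both packages are installed.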

Settings.py

Once you have Celery installed, update your Django settings.py file with the djcelery app:

INSTALLED_APPS += ('djcelery',)

and include the Celery configuration using MongoDB as the broker:

CELERY_RESULT_BACKEND = "mongodb"
CELERY_MONGODB_BACKEND_SETTINGS = {
    "host": "127.0.0.1",
    "port": 27017,
    "database": "celery",
    "taskmeta_collection": "my_taskmeta" # Collection name to use for task output
}

BROKER_BACKEND = "mongodb"
BROKER_HOST = "localhost"
BROKER_PORT = 27017
BROKER_USER = ""
BROKER_PASSWORD = ""
BROKER_VHOST = "celery"

# Find and register all celery tasks.  Your tasks need to be in a
# tasks.py file to be picked up.
CELERY_IMPORTS = ('my_tasks.tasks', )
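If you would rather not hard-code the broker host and credentials, one option is to read them from environment variables. This is a minimal sketch, not part of Celery itself: mongo_broker_settings and the CELERY_MONGO_* variable names are made up for illustration, and the defaults simply mirror the values shown above.

```python
import os


def mongo_broker_settings(env=None):
    """Build the BROKER_* settings for a MongoDB broker from environment variables.

    The CELERY_MONGO_* names are hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    return {
        "BROKER_BACKEND": "mongodb",
        "BROKER_HOST": env.get("CELERY_MONGO_HOST", "localhost"),
        "BROKER_PORT": int(env.get("CELERY_MONGO_PORT", "27017")),
        "BROKER_USER": env.get("CELERY_MONGO_USER", ""),
        "BROKER_PASSWORD": env.get("CELERY_MONGO_PASSWORD", ""),
        "BROKER_VHOST": env.get("CELERY_MONGO_DB", "celery"),
    }
```

In settings.py the result could be applied with globals().update(mongo_broker_settings()), which defines the same six BROKER_* names as the literal assignments.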

You should now be able to start Celery through your project’s manage.py file:

$ python manage.py celeryd

Note: If you see Task Not Registered error messages, you likely need to add the tasks.py file to the CELERY_IMPORTS setting.

Writing Your First Task

Now let’s create a task and test that everything is working properly.  Create the following my_tasks app in your project:

- my_project
|- my_tasks
 |- __init__.py
 |- tasks.py
|- __init__.py
|- settings.py

Make sure my_tasks is in your INSTALLED_APPS and that your tasks.py module is registered in CELERY_IMPORTS in settings.py:

INSTALLED_APPS += ('my_tasks',)
CELERY_IMPORTS = ('my_tasks.tasks',)

Inside your tasks.py file, create your first task:

from celery.decorators import task

@task()
def add(x, y):
    return x + y

Celery registers tasks when it starts, so we need to restart Celery to register the new task. Once you restart Celery, open a new terminal and test that the task works. Open the Python shell from the project root and try to run the task we just created:

$ python manage.py shell
>>> from my_tasks import tasks
>>> result = tasks.add.delay(5,5)
>>> result.ready()
True
>>> result.successful()
True

If the output of result.ready() is True then your task setup was successful!
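Keep in mind that result.ready() can return False if you call it before a worker has finished the task. A minimal sketch of a polling helper follows; wait_until_ready is a hypothetical name, and it relies only on the ready() method used in the shell session.

```python
import time


def wait_until_ready(result, timeout=10.0, interval=0.1):
    """Poll result.ready() until it returns True or `timeout` seconds pass.

    Returns True if the result became ready in time, False otherwise.
    """
    deadline = time.time() + timeout
    while True:
        if result.ready():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

In the shell this would be used as wait_until_ready(tasks.add.delay(5, 5)). Celery's result object also has a get(timeout=...) method that blocks until the task finishes and returns its return value, which is usually the simpler choice.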

Celery init.d

Once you have your project configured, you’ll want to be able to easily start, stop and restart Celery processes in the background. Celery provides a very useful default init.d script to assist you with this, which can be found at:

https://github.com/ask/celery/blob/master/contrib/generic-init.d/celeryd

Copy that file as is and place it in /etc/init.d/celeryd

Celery Defaults

Next, you want to create a file that has your Celery settings. It will be located at /etc/default/celeryd. This is where the Celery init.d script will look for your default settings.

# Name of nodes to start, here we have a single node
#CELERYD_NODES="w1"
# Or use as many nodes as you like.  Here we have 4 worker nodes.
# Use 1 worker per server core.
CELERYD_NODES="w1 w2 w3 w4"

# Where to chdir at start.
CELERYD_CHDIR="/path/to/your/project"

# Log level.  This will be deprecated in CELERY 3.0. Can be one of
# DEBUG, INFO, WARNING, ERROR or CRITICAL.
CELERYD_LOG_LEVEL="INFO"

# Celery location for virtualenv
CELERYD="python $CELERYD_CHDIR/manage.py celeryd --loglevel=$CELERYD_LOG_LEVEL"

# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"

# Task hard time limit in seconds. The worker processing the task
# will be killed and replaced with a new one when this is exceeded.
# 86400 = 24 hours
CELERYD_TASK_TIME_LIMIT=86400

# Extra arguments to celeryd
CELERYD_OPTS="--concurrency=8"

# Name of the celery config module.
CELERY_CONFIG_MODULE="celeryconfig"

# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/var/log/celeryd_%n.log"
CELERYD_PID_FILE="/var/run/celeryd_%n.pid"

# Workers should run as an unprivileged user. Don't run celery as root!
CELERYD_USER="celery"
CELERYD_GROUP=""

# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="settings"
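One thing to watch in this file is that the node list and the --concurrency option multiply. Assuming celeryd_multi passes CELERYD_OPTS to every node, four nodes with --concurrency=8 would start roughly 32 pool processes. The helper names below are hypothetical; the sketch only illustrates the arithmetic and the "w1 w2 ..." naming convention.

```python
def node_names(count, prefix="w"):
    """Return a space-separated node list such as "w1 w2 w3 w4"."""
    return " ".join("%s%d" % (prefix, i) for i in range(1, count + 1))


def total_pool_processes(nodes, concurrency):
    """Pool processes started if every node in `nodes` runs `concurrency` of them."""
    return len(nodes.split()) * concurrency
```

For example, node_names(4) gives the value used for CELERYD_NODES in this file, and total_pool_processes(node_names(4), 8) is 32, so you may want to lower one of the two numbers on a small server.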

Once you have those two files in place you should be able to start Celery using the init.d script:

$ /etc/init.d/celeryd start
celeryd-multi v2.2.6
> Starting nodes...
	> w1.some-comp-name: OK

Celerybeat

Celerybeat is a background scheduler process that lets you run cron-like periodic tasks.

Celerybeat init.d

Celery also provides a generic init.d script for Celerybeat, which makes starting, stopping and restarting Celerybeat very easy:

https://github.com/ask/celery/blob/master/contrib/generic-init.d/celerybeat

Copy and place that file as is in /etc/init.d/celerybeat

Celerybeat Settings

Celery and Celerybeat default settings can live in the same file if you like. Open the Celery settings file we created earlier:

$ vi /etc/default/celeryd

and append the following Celerybeat settings to the bottom of the file:

# Where to chdir at start.
CELERYBEAT_CHDIR="/path/to/your/project"

# Path to celerybeat
CELERYBEAT="$CELERYBEAT_CHDIR/manage.py celerybeat"

# Extra arguments to celerybeat.  This is a file that will get
# created for scheduled tasks.  It's generated automatically
# when Celerybeat starts.
CELERYBEAT_OPTS="--schedule=/var/run/celerybeat-schedule"

# Log level. Can be one of DEBUG, INFO, WARNING, ERROR or CRITICAL.
CELERYBEAT_LOG_LEVEL="INFO"

# Log file locations
CELERYBEAT_LOGFILE="/var/log/celerybeat.log"
CELERYBEAT_PIDFILE="/var/run/celerybeat.pid"

# Celerybeat should run as an unprivileged user. Don't run as root!
CELERYBEAT_USER="celery"
CELERYBEAT_GROUP=""
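These settings only control how the Celerybeat process runs; the periodic tasks themselves still have to be declared. One way, assuming the default file-based scheduler implied by the --schedule option, is Celery's CELERYBEAT_SCHEDULE setting in settings.py. The entry name and the 30-second interval are illustrative, and the task name assumes the add task from my_tasks/tasks.py is registered under its default name.

```python
from datetime import timedelta

# Illustrative schedule: run the example add task every 30 seconds.
CELERYBEAT_SCHEDULE = {
    "add-every-30-seconds": {
        # Assumed default task name: <module path>.<function name>
        "task": "my_tasks.tasks.add",
        "schedule": timedelta(seconds=30),
        "args": (5, 5),
    },
}
```

Celerybeat only sends the task on schedule; a Celery worker still has to be running to execute it.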

Don’t Run Celery as the Root User

Celery recommends not running Celery or Celerybeat as the root user. One way to check which user is running Celery is to issue the following command in a terminal:

$ ps aux | grep celery

This will show all running processes that have to do with Celery.  The first portion of the output will show the user running the process.  If you see something like:

root ... /usr/bin/python /path/to/your/project/manage.py celeryd --loglevel=INFO --concurrency=8 -n w1.some_comp_ip --logfile=/var/log/celeryd_w1.log --pidfile=/var/run/celeryd_w1.pid

then root is running the process. Create a new unprivileged user (named celery here, matching CELERYD_USER and CELERYBEAT_USER in /etc/default/celeryd) to run these processes:

$ adduser --system --no-create-home --disabled-login --disabled-password --group celery
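After restarting Celery and Celerybeat under the new user, the ps check can be repeated by hand or scripted. The following is a rough sketch with hypothetical function names; it assumes the common 11-column ps aux layout in which the first column is the user and the last is the command line.

```python
import subprocess


def users_running(ps_output, needle="celery"):
    """Return the set of users owning processes whose command contains `needle`."""
    users = set()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # USER PID ... TIME COMMAND
        if len(fields) == 11 and needle in fields[10]:
            users.add(fields[0])
    return users


def current_ps_output():
    """Run `ps aux` and return its output as text."""
    return subprocess.check_output(["ps", "aux"]).decode("utf-8", "replace")
```

If users_running(current_ps_output()) still contains root, the workers were not restarted under the new user. Note that the checking script itself can show up in the result if its own command line contains the word celery.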

That’s it! You should now have Celery worker(s) running in the background ready to run tasks from your application!

2 Comments

  1. Troy Grosfield, February 24th, 2012 at 10:52 am

    Got it. Thanks Nick!

  2. Nick Perkins, November 30th, 2011 at 1:43 pm

    Nice article, but I think there is a typo in the last sentence:
    did you mean “not” or “now”, when you said:
    “You should not have Celery worker(s) running in the background ready to run tasks from your application!”