
Today Bart and I spent two hours debugging the Python Google App Engine SDK, trying to figure out why all our scheduled tasks had started to fail on the development server.

It is no secret that the public Rogerthat cloud is running on Google’s App Engine infrastructure.

To make our messaging service reliable, we rely heavily on the deferred library, which is built on the push task queue mechanism. As a result, we kick off quite a large number of tasks.

At some point during testing on the development server, all task executions started to fail, flooding the logs as follows:

WARNING  2012-03-27 12:40:36,447 taskqueue_stub.py:1936] Task task4 failed to execute. This task will retry in 0.016 seconds
WARNING  2012-03-27 12:40:36,455 taskqueue_stub.py:1936] Task task3 failed to execute. This task will retry in 0.032 seconds
WARNING  2012-03-27 12:40:36,468 taskqueue_stub.py:1936] Task task4 failed to execute. This task will retry in 0.032 seconds
WARNING  2012-03-27 12:40:36,491 taskqueue_stub.py:1936] Task task3 failed to execute. This task will retry in 0.064 seconds
WARNING  2012-03-27 12:40:36,502 taskqueue_stub.py:1936] Task task4 failed to execute. This task will retry in 0.064 seconds
WARNING  2012-03-27 12:40:36,558 taskqueue_stub.py:1936] Task task3 failed to execute. This task will retry in 0.128 seconds
WARNING  2012-03-27 12:40:36,570 taskqueue_stub.py:1936] Task task4 failed to execute. This task will retry in 0.128 seconds
WARNING  2012-03-27 12:40:36,688 taskqueue_stub.py:1936] Task task3 failed to execute. This task will retry in 0.256 seconds
WARNING  2012-03-27 12:40:36,701 taskqueue_stub.py:1936] Task task4 failed to execute. This task will retry in 0.256 seconds

As you can see, there are no stack traces, only retry delays.
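
The only pattern visible in those warnings is the retry delay, which doubles after every failed attempt of a task. Here is a minimal sketch of that schedule, assuming a plain doubling backoff that starts at the 0.016 seconds seen in the log. The function name and its parameters are ours for illustration and are not part of the SDK.

```python
def retry_delays(first_delay=0.016, attempts=5):
    """Return the delay before each retry, doubling after every failure.

    first_delay is taken from the first log line above; the doubling rule is
    inferred from the log output, not read from the SDK source.
    """
    return [first_delay * (2 ** n) for n in range(attempts)]


if __name__ == "__main__":
    for attempt, delay in enumerate(retry_delays(), start=1):
        print("retry %d after %.3f seconds" % (attempt, delay))
```

Because the first delays are only a few milliseconds, a failing task produces a burst of warnings within the first second, which is why the log fills up so quickly.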

First we searched the internet for a solution, but apart from many other people asking for help, we did not find the fix we needed. So there was no other option than to debug the Google App Engine development server stack ourselves.

After a while we found that the development server had stopped requesting the task URLs at the correct location. For some reason it resolved its own hostname and tried to reach the RequestHandler via that hostname instead of via the IP address on which it was serving our app. That did not work, because the hostname resolves locally to 127.0.0.1 rather than to the IP address on which the development server is listening. On top of that, it also lost the port we had configured for the dev server.
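
A quick way to check for this situation on your own machine is to look at what the local hostname resolves to. The sketch below uses only the Python standard library. The helper names are ours, and nothing here comes from the App Engine SDK.

```python
import ipaddress
import socket


def is_loopback(ip):
    """Return True if ip is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(ip).is_loopback


def resolve_own_hostname():
    """Return (hostname, ip); ip is None if the hostname does not resolve."""
    hostname = socket.gethostname()
    try:
        return hostname, socket.gethostbyname(hostname)
    except socket.error:
        return hostname, None


if __name__ == "__main__":
    hostname, ip = resolve_own_hostname()
    if ip is None:
        print("%s does not resolve" % hostname)
    elif is_loopback(ip):
        print("%s resolves to loopback address %s" % (hostname, ip))
    else:
        print("%s resolves to %s" % (hostname, ip))
```

If the hostname resolves to a loopback address while the development server is bound to a different interface, requests sent to the hostname will not reach the app.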

We fixed it by adding one extra line to taskqueue_stub.py, as the following diff shows:

root@dev:/root/testing# git diff /root/testing/google_appengine/google/appengine/api/taskqueue/taskqueue_stub_original.py /root/testing/google_appengine/google/appengine/api/taskqueue/taskqueue_stub.py
diff --git a/root/testing/google_appengine/google/appengine/api/taskqueue/taskqueue_stub_original.py b/root/testing/google_appengine/google/appengine/api/taskqueue/taskqueue_stub.py
index df3b6a7..305167a 100644
--- a/root/testing/google_appengine/google/appengine/api/taskqueue/taskqueue_stub_original.py
+++ b/root/testing/google_appengine/google/appengine/api/taskqueue/taskqueue_stub.py
@@ -1826,6 +1826,7 @@ class _TaskExecutor(object):
 
 
       connection_host, = header_dict.get('host', [self._default_host])
+      connection_host = self._default_host
       if connection_host is None:
         logging.error('Could not determine where to send the task "%s" '
                       '(Url: "%s") in queue "%s". Treating as an error.',
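
To see what the added line changes, here is a simplified stand-alone illustration of the host selection. It is not the SDK code: the function, the header values, and the address 192.168.1.10:8080 are made up for the example, and only the tuple-unpacking line mirrors the diff above.

```python
def pick_connection_host(header_dict, default_host, patched):
    """Mimic the host selection from the diff, with and without the fix."""
    # Original behaviour: prefer the task's Host header over the default.
    connection_host, = header_dict.get('host', [default_host])
    if patched:
        # The added line: always fall back to the default host, which we
        # assume is the address and port the dev server is listening on.
        connection_host = default_host
    return connection_host


if __name__ == "__main__":
    headers = {'host': ['devbox']}      # hypothetical task headers
    default = '192.168.1.10:8080'       # hypothetical dev server address
    print(pick_connection_host(headers, default, patched=False))
    print(pick_connection_host(headers, default, patched=True))
```

With the override in place the Host header is ignored entirely, so treat this as a workaround for a local development setup, not as a general fix.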
