Exploring Python Using GDB
Evan Broder on June 13, 2012 in Engineering
People tend to have a narrow view of the problems they can solve using GDB. Many think that GDB is just for debugging segfaults or that it's only useful with C or C++ programs. In reality, GDB is an impressively general and powerful tool. When you know how to use it, you can debug just about anything, including Python, Ruby, and other dynamic languages. It's not just for inspection either—GDB can also be used to modify a program's behavior while it's running.
When we ran our Capture The Flag contest, a lot of people asked us about introductions to that kind of low-level work. GDB can be a great way to get started. In order to demonstrate some of GDB's flexibility, and show some of the steps involved in practical GDB work, we've put together a brief example of debugging Python with GDB.
Imagine you're building a web app in Django. The standard cycle for building one of these apps is to edit some code, hit an error, fix it, restart the server, and refresh in the browser. It's a little tedious. Wouldn't it be cool if you could hit the error, fix the code while the request is still pending, and then have the request complete successfully?
As it happens, the Seaside framework supports exactly this. Using one of Stripe's example projects, let's take a look at how we could pull it off in Python using GDB:
GDB Demo Screencast
Pretty cool, right? Though a little contrived, this example demonstrates many helpful techniques for making effective real-world use of GDB. I'll walk through what we did in a little more detail, and explain some of the GDB tricks as we go.
For the sake of brevity, I'll show the commands I type but elide some of the output they generate. I'm working on Ubuntu 12.04 with GDB 7.4. The manipulation should still work on other platforms, but you probably won't get automatic pretty-printing of Python types. You can get the same representation by hand by running p PyString_AsString(PyObject_Repr(obj)) in GDB.
Getting Set Up
First, let's start the monospace-django server with --noreload so that Django's autoreloading doesn't get in the way of our GDB-based reloading. We'll also use the python2.7-dbg interpreter, which will ensure that less of the program's state is optimized away.
$ git clone http://github.com/stripe/monospace-django
$ cd monospace-django/
$ virtualenv --no-site-packages env
# Swap the virtualenv's interpreter for the debug build
$ cp /usr/bin/python2.7-dbg env/bin/python
$ source env/bin/activate
(env)$ pip install -r requirements.txt
(env)$ python monospace/manage.py syncdb
(env)$ python monospace/manage.py runserver --noreload

# In a second terminal, attach GDB to the running server
$ sudo gdb -p $(pgrep -f monospace/manage.py)
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
[...]
Attaching to process 946
Reading symbols from /home/evan/monospace-django/env/bin/python...done.
(gdb) symbol-file /usr/bin/python2.7-dbg
Load new symbol table from "/usr/bin/python2.7-dbg"? (y or n) y
Reading symbols from /usr/bin/python2.7-dbg...done.
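If you want to confirm that the virtualenv's python really is the debug build, you can ask the interpreter itself. This is a minimal sketch that is not part of the original walkthrough: it relies on sys.gettotalrefcount existing only in interpreters compiled with Py_DEBUG, and falls back on the Py_DEBUG build variable reported by sysconfig (which may be None on some platforms). The file name check_debug.py used below is hypothetical.

```python
import sys
import sysconfig


def is_debug_build():
    """Return True if the running interpreter was compiled with Py_DEBUG."""
    # sys.gettotalrefcount is only defined in Py_DEBUG builds such as python2.7-dbg.
    if hasattr(sys, "gettotalrefcount"):
        return True
    # Fall back on the recorded build configuration; may be None on some platforms.
    return bool(sysconfig.get_config_var("Py_DEBUG"))


if __name__ == "__main__":
    print("debug build" if is_debug_build() else "release build")
```

Saved as check_debug.py and run with env/bin/python after the cp step, it should report a debug build; a stock release interpreter would report a release build.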
As of version 7.0, GDB can be scripted in Python, and you can even register your own code to pretty-print C types. Python comes with its own hooks that pretty-print Python types (such as PyObject *) and understand the Python stack. These hooks are loaded automatically if you have the python2.7-dbg package installed on Ubuntu.
Whatever you're debugging, check whether relevant GDB scripts are available; useful helpers have been written for many dynamic languages.
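To give a feel for what such hooks look like, here is a minimal sketch of a pretty-printer written against GDB's Python API. The C type (struct point with int fields x and y) and the file name point_printer.py are hypothetical; the protocol itself, a lookup function appended to gdb.pretty_printers that returns an object with a to_string method or None, is GDB's. The gdb import is guarded so the file can also be loaded outside GDB, where only the plain-Python logic runs.

```python
# Sketch of a GDB pretty-printer for a hypothetical C type:
#     struct point { int x; int y; };
# Load it inside GDB with:  (gdb) source point_printer.py


class PointPrinter(object):
    """Formats a struct point value for GDB's print command."""

    def __init__(self, val):
        # val is a gdb.Value; indexing it by field name yields that field.
        self.val = val

    def to_string(self):
        return "point(x=%d, y=%d)" % (int(self.val["x"]), int(self.val["y"]))


def lookup_point(val):
    """GDB calls this for each value it prints; return a printer or None."""
    # strip_typedefs() lets typedef'd aliases of struct point match too.
    if str(val.type.strip_typedefs()) == "struct point":
        return PointPrinter(val)
    return None


try:
    import gdb  # only importable inside a running GDB
except ImportError:
    gdb = None

if gdb is not None:
    gdb.pretty_printers.append(lookup_point)
```

Returning None for unrecognized types matters: GDB tries each registered lookup function in turn and falls back to its default formatting when none of them claims the value.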
Catching the Error
The Python interpreter creates a PyFrameObject every time it starts executing a Python stack frame. From that frame object, we can get the name of the function being executed. It's stored as a Python object, so we can convert it to a C string using PyString_AsString, and then stop the interpreter only if it begins executing a function called handle_uncaught_exception.
The obvious way to catch this is with a breakpoint on PyEval_EvalFrameEx, the function that executes each frame. Executing Python code allocates a lot of frames, though, so rather than tediously continuing through hundreds of false positives, we can set a conditional breakpoint that breaks only on the frame we care about:
(gdb) b PyEval_EvalFrameEx if strcmp(PyString_AsString(f->f_code->co_name), "handle_uncaught_exception") == 0
Breakpoint 1 at 0x519d64: file ../Python/ceval.c, line 688.
(gdb) c
Continuing.
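Breakpoint conditions can be pretty complex, but it's worth noting that conditional breakpoints on frequently called functions (like PyEval_EvalFrameEx) can slow the program down significantly, because GDB has to stop the process and evaluate the condition every time the breakpoint is hit.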
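For comparison, the same "stop when a function with this name starts" idea can be expressed in pure Python with sys.settrace. This is a swapped-in, in-process technique rather than what GDB does: it has to be installed by the program itself, whereas GDB attaches to a server that's already running. The handler below is a hypothetical stand-in for Django's, and the trace hook, like the conditional breakpoint, runs on every call and slows the program down.

```python
import sys


def break_on(func_name, callback):
    """Call callback(frame) whenever a function named func_name starts.

    A rough in-process analogue of a GDB breakpoint on PyEval_EvalFrameEx
    conditioned on f->f_code->co_name.
    """
    def tracer(frame, event, arg):
        # The global trace function sees a "call" event for every new frame,
        # much as PyEval_EvalFrameEx is entered for every frame in C.
        if event == "call" and frame.f_code.co_name == func_name:
            callback(frame)
        # Returning None skips line-by-line tracing inside the frame.
        return None

    sys.settrace(tracer)


def handle_uncaught_exception(request):
    # Hypothetical stand-in for Django's handler of the same name.
    return "500 Internal Server Error for %s" % request


hits = []
break_on("handle_uncaught_exception", lambda frame: hits.append(dict(frame.f_locals)))
handle_uncaught_exception("/charge")
sys.settrace(None)  # remove the hook, roughly like disabling the breakpoint
```

At the "call" event the frame's locals contain only the function's arguments, so the callback records the request that triggered the error handler.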
Generating the Initial Return Value
Okay, let's see if we can actually fix things during the next request. We resubmit the form. Once again, GDB halts when the app starts generating the internal server error response. While we investigate more, let's disable the breakpoint in order to keep things fast.
What we really want to do here is to let the app finish generating its original return value (the error response) and then to replace that with our own (the correct response). We find the stack frame where get_response is being evaluated. Once we've jumped to that frame with the up or frame command, we can use the finish command to wait until the currently selected stack frame finishes executing and returns.
Breakpoint 1, PyEval_EvalFrameEx (f= Frame 0x3534110, for file [...]/django/core/handlers/base.py, line 186, in handle_uncaught_exception [...], throwflag=0) at ../Python/ceval.c:688
688 ../Python/ceval.c: No such file or directory.
(gdb) disable 1
(gdb) frame 3
#3 0x0000000000521276 in PyEval_EvalFrameEx (f= Frame 0x31ac000, for file [...]/django/core/handlers/base.py, line 169, in get_response [...], throwflag=0) at ../Python/ceval.c:2666
2666 in ../Python/ceval.c
(gdb) finish
Run till exit from #3 0x0000000000521276 in PyEval_EvalFrameEx (f= Frame 0x31ac000, for file [...]/django/core/handlers/base.py, line 169, in get_response [...], throwflag=0) at ../Python/ceval.c:2666
0x0000000000526871 in fast_function (func=<function at remote 0x26e96f0>, pp_stack=0x7fffb296e4b0, n=2, na=2, nk=0) at ../Python/ceval.c:4107
4107 in ../Python/ceval.c
Value returned is $1 = <HttpResponseServerError[...] at remote 0x3474680>
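Selecting an outer frame and reading its variables has a direct Python-level counterpart: every frame object links to its caller through f_back and exposes its variables through f_locals. The sketch below illustrates that model and is not the technique used in the screencast; sys._getframe is CPython-specific, and get_response and handle_uncaught_exception are simplified hypothetical stand-ins for the Django methods.

```python
import sys


def find_frame(func_name, start=None):
    """Return the innermost frame running func_name, searching outward.

    Starts at start (default: the caller's frame) and follows f_back links,
    roughly what up / frame N do when you pick a frame in GDB.
    """
    frame = start if start is not None else sys._getframe(1)
    while frame is not None:
        if frame.f_code.co_name == func_name:
            return frame
        frame = frame.f_back
    return None


def get_response(request):
    # Hypothetical stand-in: the outer frame we want to inspect.
    return handle_uncaught_exception(request)


def handle_uncaught_exception(request):
    # From the inner frame, locate get_response's frame and read a local,
    # much like PyDict_GetItemString(f->f_locals, "request") in GDB.
    outer = find_frame("get_response")
    return outer.f_locals["request"] if outer is not None else None
```

Calling get_response("req-1") returns the request string fetched from the outer frame's locals, and searching for a name that isn't on the stack returns None.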
Patching the Code
Now that we've gotten the interpreter into the state we want, we can use Python's internals to modify the running state of the application. GDB allows you to make fairly complicated dynamic function calls into the program, and we'll make heavy use of that here.
We use PyImport_ReloadModule, the C equivalent of Python's reload function, to reimport the code. We also have to reload the monospace.urls module so that it picks up the new code in monospace.views.
One handy trick, which we use to invoke git in the video and curl here, is that you can run shell commands from within GDB with the shell command.
(gdb) shell curl -s -L https://gist.github.com/raw/2897961/ | patch -p1
patching file monospace/views.py
(gdb) p PyImport_ReloadModule(PyImport_AddModule("monospace.views"))
$2 = <module at remote 0x31d4b58>
(gdb) p PyImport_ReloadModule(PyImport_AddModule("monospace.urls"))
$3 = <module at remote 0x31d45a8>
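Why reload monospace.urls as well? A module that imported a function by name keeps a reference to the old function object even after the defining module is reloaded. The sketch below demonstrates this in Python 3, where importlib.reload plays the role of Python 2's reload and of PyImport_ReloadModule. The module names demo_views and demo_urls are hypothetical stand-ins written to a temporary directory, and modeling the URLconf as holding a direct reference to a view function is an assumption about how monospace.urls is written.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-ins for monospace.views and monospace.urls, written to a
# temporary directory so the example is self-contained.
sys.dont_write_bytecode = True  # no .pyc caches, so reload always reads source
tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)


def write_module(name, source):
    with open(os.path.join(tmpdir, name + ".py"), "w") as f:
        f.write(source)
    importlib.invalidate_caches()


write_module("demo_views", "def index():\n    return 'broken'\n")
# Like a URLconf, demo_urls holds a direct reference to the view function.
write_module("demo_urls", "from demo_views import index\nurlpatterns = [index]\n")

import demo_views
import demo_urls

# "Patch" the view on disk, then reload only the views module.
write_module("demo_views", "def index():\n    return 'fixed'\n")
importlib.reload(demo_views)
before = demo_urls.urlpatterns[0]()  # still calls the old function object

# Reloading demo_urls re-runs its import and picks up the new function.
importlib.reload(demo_urls)
after = demo_urls.urlpatterns[0]()

print(demo_views.index(), before, after)
```

Reloading replaces the module's attributes in place, but any other module that copied a reference keeps the old object until it is reloaded too, which is why the order of the two reloads matters.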
We've now patched and reloaded the code. Next, let's generate a new response: we find self and request among the local variables in this stack frame, then fetch self's get_response method and call it.
(gdb) p $self = PyDict_GetItemString(f->f_locals, "self")
$4 = <WSGIHandler([...]) at remote 0x311c610>
(gdb) set $request = PyDict_GetItemString(f->f_locals, "request")
(gdb) set $get_response = PyObject_GetAttrString($self, "get_response")
(gdb) set $args = Py_BuildValue("(O)", $request)
(gdb) p PyObject_Call($get_response, $args, 0)
$5 = <HttpResponse([...]) at remote 0x31b9fb0>
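Each of those C API calls has a one-line Python equivalent, which can make sessions like this easier to read. The mapping below is a sketch: WSGIHandler here is a hypothetical stand-in class rather than Django's, and the request is just a string.

```python
class WSGIHandler(object):
    """Hypothetical stand-in for Django's WSGIHandler."""

    def get_response(self, request):
        return "200 OK for %s" % request


# PyDict_GetItemString(f->f_locals, "self") and (..., "request"):
# in the real session these come from the frame's local variables.
frame_locals = {"self": WSGIHandler(), "request": "/charge"}
self_obj = frame_locals["self"]
request = frame_locals["request"]

# PyObject_GetAttrString($self, "get_response") yields a bound method.
get_response = getattr(self_obj, "get_response")

# Py_BuildValue("(O)", $request) builds a one-element argument tuple.
args = (request,)

# PyObject_Call($get_response, $args, 0): positional args, no keyword dict.
response = get_response(*args)
```

The trailing 0 in the GDB call is a NULL keyword-argument dictionary, which corresponds to calling the method with positional arguments only.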
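In the snippet above, we use GDB's set command to assign values to convenience variables, GDB's own scratch variables whose names start with $. The p $self = ... line does the same thing but also prints the result.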
Alright, we now have a new response. Remember that we stopped the program right where the original get_response method returned. The C function that evaluates a Python frame, PyEval_EvalFrameEx, returns the Python function's return value as a PyObject *, and because finish stops immediately after that call returns, the caller hasn't used the result yet. So, to replace the return value, we just have to store the new response in the register that holds C return values ($rax on 64-bit x86) and then allow execution to continue.
GDB records every value you print in its value history and lets you refer to each one by number. In this case, we want $5:
(gdb) set $rax = $5
(gdb) c
Continuing.
And, like magic, our web request finishes successfully.
GDB is a powerful precision tool. Even if you spend most of your time writing code in a much higher-level language, it can be extremely useful to have it available when you need to investigate subtle bugs or complex issues in running applications.