Exploring Python Using GDB
People tend to have a narrow view of the problems they can solve using GDB. Many think that GDB is just for debugging segfaults or that it's only useful with C or C++ programs. In reality, GDB is an impressively general and powerful tool. When you know how to use it, you can debug just about anything, including Python, Ruby, and other dynamic languages. It's not just for inspection either—GDB can also be used to modify a program's behavior while it's running.
When we ran our Capture The Flag contest, a lot of people asked us about introductions to that kind of low-level work. GDB can be a great way to get started. In order to demonstrate some of GDB's flexibility, and show some of the steps involved in practical GDB work, we've put together a brief example of debugging Python with GDB.
Imagine you're building a web app in Django. The standard cycle for building one of these apps is to edit some code, hit an error, fix it, restart the server, and refresh in the browser. It's a little tedious. Wouldn't it be cool if you could hit the error, fix the code while the request is still pending, and then have the request complete successfully?
As it happens, the Seaside framework supports exactly this. Using one of Stripe's example projects, let's take a look at how we could pull it off in Python using GDB:
Pretty cool, right? Though a little contrived, this example demonstrates many helpful techniques for making effective real-world use of GDB. I'll walk through what we did in a little more detail, and explain some of the GDB tricks as we go.
For the sake of brevity, I'll show the commands I type, but elide some of the output they generate. I'm working on Ubuntu 12.04 with GDB 7.4. The manipulation should still work on other platforms, but you probably won't get automatic pretty-printing of Python types. You can produce the same output by hand by running p PyString_AsString(PyObject_Repr(obj)) in GDB.
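That expression uses the Python 2 C API, which matches the interpreter used in this walkthrough. To give a feel for what the two calls do, here's a minimal sketch that makes the equivalent C API calls from inside a running interpreter through ctypes. It assumes CPython 3, where PyUnicode_AsUTF8 stands in for PyString_AsString (which no longer exists there); the helper name c_repr is made up for this example.

```python
import ctypes

api = ctypes.pythonapi  # handle to the C API of the running CPython interpreter

# PyObject *PyObject_Repr(PyObject *o): returns a new reference to repr(o).
api.PyObject_Repr.restype = ctypes.py_object
api.PyObject_Repr.argtypes = [ctypes.py_object]

# const char *PyUnicode_AsUTF8(PyObject *unicode): used here as the Python 3
# stand-in (an assumption of this sketch) for Python 2's PyString_AsString.
api.PyUnicode_AsUTF8.restype = ctypes.c_char_p
api.PyUnicode_AsUTF8.argtypes = [ctypes.py_object]


def c_repr(obj):
    """Hypothetical helper: C-level equivalent of repr(obj), returned as bytes."""
    repr_object = api.PyObject_Repr(obj)
    # ctypes copies the C string into a bytes object while repr_object is alive.
    return api.PyUnicode_AsUTF8(repr_object)


sample_repr = c_repr([1, "two"])
print(sample_repr)
```

In GDB the same idea applies, except that the calls are evaluated inside the stopped process rather than in your own interpreter.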
Getting Set Up
First, let's start the monospace-django server with --noreload so that Django's autoreloading doesn't get in the way of our GDB-based reloading. We'll also use the python2.7-dbg interpreter, which will ensure that less of the program's state is optimized away.
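As a rough sketch of that setup, the helper below assembles the two command lines involved. The manage.py path, the placeholder PID, and attaching with gdb -p are assumptions for illustration; the text above only specifies --noreload and python2.7-dbg.

```python
import shlex


def server_argv(interpreter="python2.7-dbg", manage_py="manage.py"):
    """Command line for the development server with autoreload turned off."""
    return [interpreter, manage_py, "runserver", "--noreload"]


def attach_argv(pid):
    """Command line that attaches GDB to an already running process."""
    return ["gdb", "-p", str(pid)]


def as_shell(argv):
    """Render an argv list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


server_command = as_shell(server_argv())
attach_command = as_shell(attach_argv(12345))  # 12345 is a placeholder PID
print(server_command)
print(attach_command)
```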
As of version 7.0 of GDB, it's possible to automatically script GDB's behavior, and even register your own code to pretty-print C types. Python comes with its own hooks which can pretty-print Python types (such as PyObject *) and understand the Python stack. These hooks are loaded automatically if you have the python2.7-dbg package installed on Ubuntu.
Whatever you're debugging, you should look to see if there are relevant GDB scripts available—useful helpers have been created for many dynamic languages.
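To make the idea of registering your own pretty-printer concrete, here's a minimal sketch using GDB's Python API. The C type (struct point) and all class names are hypothetical, and the gdb module only exists inside GDB's embedded interpreter, so the sketch falls back to small stand-in objects when run elsewhere.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


class PointPrinter:
    """Pretty-printer for a hypothetical C type: struct point { int x; int y; }."""

    def __init__(self, val):
        self.val = val

    def to_string(self):
        # Inside GDB, val is a gdb.Value and struct fields are read by name.
        return "point(x=%d, y=%d)" % (int(self.val["x"]), int(self.val["y"]))


def lookup_point(val):
    """Return a printer for values of type struct point, or None otherwise."""
    if str(val.type.strip_typedefs()) == "struct point":
        return PointPrinter(val)
    return None


if gdb is not None:
    # GDB consults each registered lookup function when printing a value.
    gdb.pretty_printers.append(lookup_point)


# Stand-ins so the printer logic can be exercised outside GDB.
class FakeType:
    def __init__(self, name):
        self.name = name

    def strip_typedefs(self):
        return self

    def __str__(self):
        return self.name


class FakeValue(dict):
    def __init__(self, type_name, **fields):
        super().__init__(fields)
        self.type = FakeType(type_name)


printed = lookup_point(FakeValue("struct point", x=3, y=4)).to_string()
print(printed)
```

The hooks that ship with Python are far more elaborate than this, but they plug into GDB in the same general way.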
Catching the Error
The Python interpreter creates a PyFrameObject every time it starts executing a Python stack frame. From that frame object, we can get the name of the function being executed. It's stored as a Python object, so we can convert it to a C string using PyString_AsString, and then stop the interpreter only if it begins executing a function called
The obvious way to catch this would be by creating a GDB breakpoint. A lot of frames are allocated in the process of executing Python code, though. Rather than tediously continue through hundreds of false positives, we can set a conditional breakpoint that'll break on only the frame we care about:
Breakpoint conditions can be pretty complex, but it's worth noting that conditional breakpoints on locations that are hit often (like PyEval_EvalFrameEx) can slow the program down significantly, because GDB has to stop the process and evaluate the condition on every hit.
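Here's a sketch of how such a condition could be assembled. It assumes the frame argument of PyEval_EvalFrameEx is named f, as in the CPython 2.7 source, and that debug symbols are available; "my_view" is a placeholder function name and the helper names are invented for this example.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


def frame_name_condition(function_name):
    """Condition that holds when the frame being entered runs function_name."""
    return 'strcmp(PyString_AsString(f->f_code->co_name), "%s") == 0' % function_name


def conditional_break_command(function_name, location="PyEval_EvalFrameEx"):
    """Full GDB command: break at location only when the condition is true."""
    return "break %s if %s" % (location, frame_name_condition(function_name))


def install_breakpoint(function_name):
    """Set the breakpoint when running inside GDB; otherwise just return the command."""
    command = conditional_break_command(function_name)
    if gdb is not None:
        gdb.execute(command)
    return command


command = conditional_break_command("my_view")  # "my_view" is a placeholder
print(command)
```

The printed command can also be typed at the GDB prompt directly.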
Generating the Initial Return Value
Okay, let's see if we can actually fix things during the next request. We resubmit the form. Once again, GDB halts when the app starts generating the internal server error response. While we investigate more, let's disable the breakpoint in order to keep things fast.
What we really want to do here is to let the app finish generating its original return value (the error response) and then to replace that with our own (the correct response). We find the stack frame where get_response is being evaluated. Once we've jumped to that frame with the frame command, we can use the finish command to wait until the currently selected stack frame finishes executing and returns.
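Finding the right frame by eye in a long backtrace gets tedious, so here's a small sketch that picks the frame number out of backtrace text and lists the commands to run next. The sample backtrace is made up and heavily abbreviated, the breakpoint number 1 is an assumption, and the helper names are hypothetical.

```python
import re

FRAME_LINE = re.compile(r"^#(\d+)\s")


def find_frame(backtrace_text, needle):
    """Return the number of the first frame whose text contains needle, else None."""
    current = None
    for line in backtrace_text.splitlines():
        match = FRAME_LINE.match(line)
        if match:
            current = int(match.group(1))
        # Long frames may wrap, so continuation lines count toward the current frame.
        if current is not None and needle in line:
            return current
    return None


def finish_commands(frame_number, breakpoint_number=1):
    """GDB commands: disable the breakpoint, select the frame, run until it returns."""
    return ["disable %d" % breakpoint_number, "frame %d" % frame_number, "finish"]


# Made-up, abbreviated backtrace for illustration only.
SAMPLE = """\
#0  PyEval_EvalFrameEx (f=Frame 0x1, for file a.py, line 12, in inner_function ())
#1  PyEval_EvalFrameEx (f=Frame 0x2, for file b.py, line 34,
    in get_response ())
#2  PyEval_EvalFrameEx (f=Frame 0x3, for file c.py, line 56, in __call__ ())
"""

frame_number = find_frame(SAMPLE, "in get_response")
commands = finish_commands(frame_number)
print(commands)
```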
Patching the Code
Now that we've gotten the interpreter into the state we want, we can use Python's internals to modify the running state of the application. GDB allows you to make fairly complicated dynamic function invocations, and we'll use lots of that here.
We use the C equivalent of the Python reload function to reimport the code. We also have to reload the monospace.urls module so that it picks up the new code in
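To see why the second reload is needed, here's a self-contained sketch in plain Python rather than GDB. It uses importlib.reload, the Python 3 counterpart of the reload function, and two throwaway modules (demo_views and demo_urls, both invented for this example) that mimic a URL module importing a view.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # always recompile from source; avoids stale .pyc files


def write_module(directory, name, source):
    """Write source to directory/name.py, replacing any previous version."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as handle:
        handle.write(source)


workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)

write_module(workdir, "demo_views", "def handler():\n    return 'old'\n")
write_module(workdir, "demo_urls", "from demo_views import handler\n")
importlib.invalidate_caches()

demo_views = importlib.import_module("demo_views")
demo_urls = importlib.import_module("demo_urls")
before = demo_urls.handler()

# "Patch" the code on disk, then reload only the module that changed.
write_module(workdir, "demo_views", "def handler():\n    return 'new'\n")
importlib.reload(demo_views)
after_first_reload = demo_urls.handler()  # demo_urls still holds the old function

# Reloading the importing module re-runs its import and picks up the new function.
importlib.reload(demo_urls)
after_second_reload = demo_urls.handler()

print(before, after_first_reload, after_second_reload)
```

Reloading a module rebinds names inside that module only, so any other module that imported a function by name keeps pointing at the old object until it is reloaded too.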
One handy trick, which we use to invoke git in the video and curl here, is that you can run shell commands from within GDB.
We've now patched and reloaded the code. Next, let's generate a new response by finding request in the local variables of this stack frame, then fetching and calling its
In the above snippet, we use GDB's set command to assign values to variables.
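As a rough analogue of that sequence, the sketch below performs the same three steps (look up a local by name, fetch an attribute, call it) through the C API from within Python via ctypes. It assumes CPython 3; the FakeRequest class, its describe method, and the frame_locals dictionary are all hypothetical stand-ins, and the particular C functions are my choice rather than necessarily the ones used in the session.

```python
import ctypes

api = ctypes.pythonapi  # handle to the C API of the running CPython interpreter

# PyObject *PyMapping_GetItemString(PyObject *o, const char *key): new reference.
api.PyMapping_GetItemString.restype = ctypes.py_object
api.PyMapping_GetItemString.argtypes = [ctypes.py_object, ctypes.c_char_p]

# PyObject *PyObject_GetAttrString(PyObject *o, const char *name): new reference.
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]

# PyObject *PyObject_CallObject(PyObject *callable, PyObject *args): new reference.
api.PyObject_CallObject.restype = ctypes.py_object
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]


class FakeRequest:
    """Hypothetical stand-in for the request object found in the frame."""

    def describe(self):
        return "response for /demo"


frame_locals = {"request": FakeRequest()}  # stand-in for the frame's locals

# Each step stores its result in a variable, much like GDB's set $name = ...
request = api.PyMapping_GetItemString(frame_locals, b"request")
method = api.PyObject_GetAttrString(request, b"describe")
result = api.PyObject_CallObject(method, ())  # empty tuple: no extra arguments
print(result)
```

In GDB, each intermediate result would live in a convenience variable (a name starting with $) instead of a Python variable.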
Alright, we now have a new response. Remember that we stopped the program right where the original get_response method returned. The C function that evaluates a Python frame returns a pointer to the Python return value, so the C return value and the Python return value are the same thing. To replace that return value on x86, we just have to store the new return value in the return register ($rax on 64-bit x86) and then allow execution to continue.
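The reason a single register is enough is that a Python object is passed around at the C level as one pointer. Here's a small sketch, assuming CPython (where id() is the object's address), that asks a C API function for its raw return value and shows it is just the address of the object; the response object is a stand-in.

```python
import ctypes

api = ctypes.pythonapi  # handle to the C API of the running CPython interpreter

# PyObject *PyTuple_GetItem(PyObject *p, Py_ssize_t pos) returns a borrowed reference.
# Declaring the result as void* exposes the raw pointer instead of a Python object.
api.PyTuple_GetItem.restype = ctypes.c_void_p
api.PyTuple_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]

response = object()  # stand-in for a response object
container = (response,)

raw_pointer = api.PyTuple_GetItem(container, 0)
pointer_bits = ctypes.sizeof(ctypes.c_void_p) * 8  # width of a pointer on this machine

# In CPython, id() is the object's address, so it matches the C return value.
same_address = raw_pointer == id(response)

# Turning the address back into an object yields the very same response.
recovered = ctypes.cast(raw_pointer, ctypes.py_object).value

print(hex(raw_pointer), pointer_bits, same_address, recovered is response)
```

Overwriting the register with the address of a different object therefore swaps which Python object the caller receives.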
GDB allows you to refer by number to the value returned by each command you evaluate. In this case, we want
And, like magic, our web request finishes successfully.
GDB is a powerful precision tool. Even if you spend most of your time writing code in a much higher-level language, it can be extremely useful to have it available when you need to investigate subtle bugs or complex issues in running applications.