Hacking Python Applications: Introduction
An introduction to hacking closed-source Python applications by injecting code into the running process.
Lately I've been getting into reverse engineering. For some reason I just really love to make things do what they shouldn't. I started reversing games as an entry point. It's good fun, the feedback is instantaneous, and you learn a whole lot without feeling like you've been learning at all.
Before this I had some experience with hacking native games. I learned the basics: finding what I wanted to mess with, reading from and writing to addresses, patching bytes to alter program flow, calling functions, and DLL injection. I could make simple internal and external hacks for native games.
So I have a cycle I've been following where I'll pick a new game and practice reversing it and writing my cheats. The most recent game I chose was Rift Wizard. At first I wanted to try save editing because there is an option to resume a previous run. So I started a new run, and then exited the game. I navigated to the game's directory to search for the save data. This is what I was greeted by.
Yes, this is a pygame game! The first good quality one I've ever seen on Steam. So the save editing idea is out the window. Can I edit the source? Unfortunately, not easily. The game is bundled with PyInstaller, which, if you don't know, creates an executable that can be distributed and run without the user installing Python or the script's third-party libraries.
Unpacking is easy enough, but I ran into trouble decompiling the resulting pyc files. The version of Python that Rift Wizard is written in is incompatible with uncompyle6 at the moment. That is where my research into this option stopped. However, another idea popped into my head.
I know that Python is an interpreted language. I had a rough understanding that the Python interpreter takes bytecode and converts it into native code. I wondered if it was possible to pass my own bytecode directly into the interpreter. I have no idea how many levels of wrong this idea was, but I do know that a Google search for it pointed me in the right direction.
I found a project on GitHub called pyrasite. It injects code into running Python processes. Bingo! Now I just have to use it, right? Once again, unfortunately not. The project has been abandoned for five years. Both the Linux and Windows versions are broken. Other people have maintained the Linux version, since it seems to be a wrapper around a few GDB commands. However, nobody has maintained the Windows version.
Even broken, the source proved extremely useful and surprisingly tiny. It allocated space in the target process's memory and wrote two things there. First was the code string to be run. Next was a function that loads the library containing the Python C API functions and calls three of them to execute the code string. It then created a remote thread which called the function it had written to the target. Now that I understood it, I just needed to patch it.
This approach proved to be quite frustrating. The code was quite strange. It had an array containing all of the Python version names that could possibly be the name of the library (five years ago anyway), and it just attempted the injection with ALL of them. On top of that, it manually mapped its own code into the target process, something I have no experience with. For the life of me I could not get it to work properly, so I wrote an injector the way I knew how.
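To make the "three calls" idea concrete, here is a minimal sketch that runs a code string through the C API from inside an ordinary Python process using ctypes. The post does not name the three functions, so I am assuming they are PyGILState_Ensure, PyRun_SimpleString, and PyGILState_Release. The helper name run_code_string is made up for illustration, and this runs in the current process rather than a remote one.

```python
import ctypes

# ctypes.pythonapi is a handle to the Python library of *this* process.
# An injected payload would instead load the target's pythonXY.dll by name.
api = ctypes.pythonapi

# Assumption: these are the three C API calls involved; the post does not name them.
api.PyGILState_Ensure.restype = ctypes.c_int
api.PyGILState_Ensure.argtypes = []
api.PyRun_SimpleString.restype = ctypes.c_int
api.PyRun_SimpleString.argtypes = [ctypes.c_char_p]
api.PyGILState_Release.restype = None
api.PyGILState_Release.argtypes = [ctypes.c_int]


def run_code_string(code: str) -> int:
    """Hypothetical helper: execute `code` in __main__ via the C API.

    Returns 0 on success and -1 if the code raised an exception.
    """
    state = api.PyGILState_Ensure()      # take the GIL for this thread
    try:
        return api.PyRun_SimpleString(code.encode("utf-8"))
    finally:
        api.PyGILState_Release(state)    # always give the GIL back
```

Taking the GIL first matters in the injected case, because the remote thread is not one the interpreter created and has no thread state of its own.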
I started by creating a process scanner which searches all system processes for modules whose names contain 'python' followed by a number. It would then output the base module, Python version, and PID to the console. This eliminated the need to locate a process PID manually, but most importantly it let me parse the module name I needed instead of guessing.
Next I wrote a simple DLL injector which creates a remote thread that calls the WinAPI function LoadLibraryA() with the path of my DLL. When the DLL is loaded, it uses named pipes to communicate with the injector. It waits for the path to a code file and the name of the Python version module to be passed through, then it reads the code at that path and executes it within the target. Now I had it in the bag.
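The parsing half of that scanner can be sketched in a few lines. This is only an illustration, not the scanner itself: it assumes the module paths have already been enumerated by some other means, and the function name parse_python_module and the pythonXY.dll naming pattern are my assumptions.

```python
import ntpath
import re
from typing import Optional, Tuple

# Assumed naming scheme: python<major><minor>.dll, e.g. python38.dll or python310.dll.
_PYTHON_DLL = re.compile(r"python(\d)(\d+)\.dll", re.IGNORECASE)


def parse_python_module(module_path: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (major, minor) if the path is a Python DLL."""
    name = ntpath.basename(module_path)   # handles Windows paths on any OS
    match = _PYTHON_DLL.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Requiring at least two digits means the small python3.dll shim is skipped, so only the versioned library is reported.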
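For readers who have not seen the LoadLibraryA trick before, here is a rough sketch of it using ctypes. It is not my actual injector, and it leaves out the named pipe part entirely. The function name inject_dll is made up, and it assumes the injector and the target have the same bitness, so that kernel32.dll (and therefore LoadLibraryA) sits at the same address in both.

```python
import ctypes
import sys

PROCESS_ALL_ACCESS = 0x1F0FFF
MEM_COMMIT, MEM_RESERVE = 0x1000, 0x2000
PAGE_READWRITE = 0x04
INFINITE = 0xFFFFFFFF


def inject_dll(pid: int, dll_path: str) -> None:
    """Hypothetical helper: make process `pid` call LoadLibraryA(dll_path)."""
    if sys.platform != "win32":
        raise OSError("this sketch relies on the Windows API")
    from ctypes import wintypes as w

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare signatures so handles and pointers are not truncated on 64-bit.
    k32.OpenProcess.argtypes = [w.DWORD, w.BOOL, w.DWORD]
    k32.OpenProcess.restype = w.HANDLE
    k32.VirtualAllocEx.argtypes = [w.HANDLE, w.LPVOID, ctypes.c_size_t, w.DWORD, w.DWORD]
    k32.VirtualAllocEx.restype = w.LPVOID
    k32.WriteProcessMemory.argtypes = [w.HANDLE, w.LPVOID, w.LPCVOID, ctypes.c_size_t,
                                       ctypes.POINTER(ctypes.c_size_t)]
    k32.WriteProcessMemory.restype = w.BOOL
    k32.GetModuleHandleW.argtypes = [w.LPCWSTR]
    k32.GetModuleHandleW.restype = w.HMODULE
    k32.GetProcAddress.argtypes = [w.HMODULE, w.LPCSTR]
    k32.GetProcAddress.restype = ctypes.c_void_p
    k32.CreateRemoteThread.argtypes = [w.HANDLE, w.LPVOID, ctypes.c_size_t, w.LPVOID,
                                       w.LPVOID, w.DWORD, w.LPDWORD]
    k32.CreateRemoteThread.restype = w.HANDLE
    k32.WaitForSingleObject.argtypes = [w.HANDLE, w.DWORD]
    k32.CloseHandle.argtypes = [w.HANDLE]

    def fail() -> OSError:
        return ctypes.WinError(ctypes.get_last_error())

    path = dll_path.encode("mbcs") + b"\0"   # ANSI, NUL-terminated for LoadLibraryA
    process = k32.OpenProcess(PROCESS_ALL_ACCESS, False, pid)
    if not process:
        raise fail()
    try:
        # 1. Reserve room for the path string inside the target.
        remote = k32.VirtualAllocEx(process, None, len(path),
                                    MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)
        if not remote:
            raise fail()
        # 2. Copy the path across.
        if not k32.WriteProcessMemory(process, remote, path, len(path), None):
            raise fail()
        # 3. Start a thread in the target whose entry point is LoadLibraryA
        #    and whose single argument is the remote path string.
        load_library = k32.GetProcAddress(k32.GetModuleHandleW("kernel32.dll"),
                                          b"LoadLibraryA")
        thread = k32.CreateRemoteThread(process, None, 0, load_library, remote, 0, None)
        if not thread:
            raise fail()
        k32.WaitForSingleObject(thread, INFINITE)
        k32.CloseHandle(thread)
    finally:
        k32.CloseHandle(process)
```

LoadLibraryA works as a thread entry point because it takes exactly one pointer-sized argument, which is the same shape a thread start routine has.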
But what exactly can we do now? Anything.
If you made it this far, thanks for sticking around. It's the first time I've ever written for a blog. If you are still interested, in the following posts I'll be explaining the payloads I've created to ease the reversing process. Then I'll actually be hacking Rift Wizard. If you want to give this a shot yourself, I've open-sourced all my work.
I am a software developer currently in college. My strongest language is Python. I am interested in pretty much all things development, but lately I am particularly keen on reverse engineering.