From C, with inline assembly, to shellcode
Friday, August 11, 2023
One of the most important concepts in malware development is Position-Independent Code (PIC). PIC, sometimes referred to interchangeably as shellcode, is a set of assembly instructions that can execute without being loaded by Windows as a complete executable. You’re probably most familiar with shellcode generated by msfvenom that looks something like the screenshot below.
However, because msfvenom payloads are so heavily signatured, the convenience of generating them quickly is severely limited: antivirus products detect them in pretty much any environment.
There are a few ways to generate custom shellcode. The most common is to write your shellcode in raw assembly, which makes the development process technically difficult but provides very granular control over the behavior of your shellcode. This has been done for a very long time, so there are many good examples online of how to get started.
Alternatively, shellcode can be carefully crafted in C to provide a simpler interface for developers, as discussed in mattifestation’s original writeup linked at the bottom of this article. This method requires fairly extensive customization of compiler and linker options, and it gives less fine-grained control over the output shellcode.
In 2021, hasherezade published a method of developing shellcode that combined the two approaches above. Through some intermediate manipulations, the MSVC compiler could be used to generate shellcode from C in a process reminiscent of mattifestation’s method but with increased control. The downside of this method is that shellcode extraction relies on a third-party tool or script. Additionally, the shellcode this method produces is fairly large, almost 2 KB in my testing.
In this writeup, we’ll demonstrate a novel method of shellcode development that’s similar to hasherezade’s method. Instead of using MSVC, we’ll use MinGW to compile some position-independent C code with optimizations and extract our shellcode at run time. Our method differs from existing methods because our compiler options will be very simple, and the output shellcode will be comparable in size to shellcode generated by msfvenom.
The first thing we need to understand to get working shellcode is that the code we develop cannot reference any WinAPI functions directly, at least not at first. In a typical C/C++ implementation, you would see something like this:
Thankfully, there’s a pretty robust set of public research available on implementing custom versions of GetProcAddress and GetModuleHandle. But it’s important that when we implement these in our code, we tell the compiler to inline our code as much as possible. Inlining matters because it keeps all of our shellcode in a single contiguous snippet, rather than spreading it across several separate function calls.
Your custom GetProcAddress and GetModuleHandle should look something like this:
There’s one more code snippet that’s worth talking about at this point. Inside of our custom GetModuleHandle, there’s a call to “mem_cmp”, which is a custom implementation of the C runtime (CRT) function “memcmp”. We use it in lieu of the CRT’s “strcmp” or “memcmp” so that our code does not depend on an external library call and can retain position independence.
This is the implementation that I used:
Once again, note the “inline” attributes in the declaration.
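Since the original listing is not reproduced here, the following is a minimal sketch of what such a helper could look like, not the author's exact code. The `always_inline` attribute is a GCC/MinGW extension, and the byte-wise loop is only an assumption about the implementation; the one requirement taken from the text is that the function is inlined and calls nothing from the CRT.

```c
#include <stddef.h>

/* Hypothetical stand-in for the article's mem_cmp: a byte-wise comparison
 * that makes no library calls. always_inline asks GCC/MinGW to expand the
 * body at every call site instead of emitting a separate function. */
static inline __attribute__((always_inline))
int mem_cmp(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i]) {
            /* Same sign convention as memcmp: negative, zero, or positive. */
            return (int)pa[i] - (int)pb[i];
        }
    }
    return 0;
}
```

A call such as `mem_cmp(name, target, len) == 0` then behaves like the CRT comparison for equal-length buffers, without introducing an import.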
Now that we’ve created the necessary code snippets, let's talk about what the primary code logic ought to look like.
Line 8 contains the typedef of WinExec, which we’ll be using to execute calc.exe.
Lines 10-12 contain the function prototypes for the functions we wrote in the previous section.
Lines 16-17 have pvStart/EndAddress which we’ll be using later to extract our shellcode.
Line 19 is an inline assembly label “StartAddress” which we can later use to get the starting address of our shellcode. Lines 22-24 align the stack and allocate stack space. As with previous C-to-shellcode methods, this is a necessary step that gives our shellcode stack space for its variables.
Lines 27-31 are our local variables that store the strings we’ll need to find WinExec and pop calc.
Line 34 creates the function pointer to WinExec.
Line 35 calls WinExec with our calc.exe argument.
Conventional approaches to shellcode development in C would stop somewhere around here. But by using inline assembly, we can take our shellcode development a step further and extract our shellcode at run time. Let us look at one way to do that.
Line 38 resets the stack pointer to its value prior to our shellcode execution on line 24.
Line 39 creates an inline assembly label “EndAddress” that marks the end of our shellcode block.
Lines 41-43 extract the start address of our shellcode and store it in the pvStartAddress variable.
Lines 45-47 extract the end address of our shellcode and store it in the pvEndAddress variable.
Lines 52-61 print the contents of our program between pvStartAddress and pvEndAddress.
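The printing step is essentially a hex dump of a byte range. The original listing is not reproduced here, so the following is a hedged sketch only: `format_range` is a hypothetical helper name, and the `\x..` output format is an assumption chosen because it can be pasted into a C array. The caller is assumed to pass pointers equivalent to pvStartAddress and pvEndAddress.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats the bytes in [start, end) as a C-style
 * "\x.." string into out. Returns the number of bytes formatted, or 0 if
 * the range is empty or inverted, or if out is too small
 * (4 characters per byte plus a terminating NUL are required). */
static size_t format_range(const unsigned char *start,
                           const unsigned char *end,
                           char *out, size_t out_size)
{
    if (out_size > 0) {
        out[0] = '\0';              /* leave a valid empty string on failure */
    }
    if (end <= start) {
        return 0;
    }

    size_t len = (size_t)(end - start);
    if (out_size < len * 4 + 1) {
        return 0;
    }

    for (size_t i = 0; i < len; i++) {
        /* Each call writes exactly "\xNN" plus a NUL that the next
         * iteration overwrites. */
        snprintf(out + i * 4, 5, "\\x%02x", start[i]);
    }
    return len;
}
```

The return value doubles as the payload size in bytes, which makes it easy to compare one build against another.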
If we compile our program without any optimizations, we’ll see the following:
At first glance it looks like it worked, but there are a couple of things that don’t add up. Firstly, this payload is huge. Secondly, our payload fails if we test it inside a standard dropper template.
If we run implant.exe, with our custom shellcode embedded, inside a debugger, we see that there’s a call to memcpy (placed by the compiler) inside of what’s supposed to be our position-independent code!
We can fix this issue by compiling our code with the -O option. Now if we execute our program we’ll see the following:
We can test our code above by putting our shellcode in a generic dropper template.
Note that our shellcode is only 286 bytes in size, just 10 bytes larger than the 276-byte msfvenom payload that we saw at the beginning of this article.
In this writeup, we discussed a methodology to generate shellcode in a streamlined manner without over-complicated compiler options. The method we discussed relies on using MinGW, custom GetProcAddress and GetModuleHandle implementations, and a few optimizations to generate our shellcode. It’s important to note that this method relies on using the “-O” or “-O1” options in MinGW. Other optimization levels apply other transformations that break our shellcode’s position independence in favor of other optimization goals in the output .exe.
This method provides us with an easy-to-use development workflow for custom shellcode generation. And because we’re using the MinGW compiler, we retain the ability to achieve granular control of our shellcode. With many benefits and only a few drawbacks, this method is my second preferred method of custom shellcode generation (after writing raw assembly ofc).