Intro to x86 Assembly for Windows: concepts and tools

With MASM32 installed (hopefully without problems), we can proceed to our first program. I chose to borrow this one from Dr. Jeff Huang, assistant professor at Brown University; the file is hosted at the University of Illinois, where he completed his Bachelor's and Master's degrees. This way you at least have a second source if I don't explain something well. The original PDF, with which I myself started x86 assembly after some Atmel work in the second semester, can be found right here. Let's drop the code below and see what we can make of it:

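.386                                    ; accept 80386 instructions
.model flat, stdcall                    ; flat memory model, stdcall calling convention
option casemap :none                    ; identifiers are case-sensitive
include \masm32\include\windows.inc     ; Win32 constants and structures
include \masm32\include\kernel32.inc    ; prototype for ExitProcess
include \masm32\include\masm32.inc      ; prototype for StdOut
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
.data
HelloWorld db "Hello World!", 0         ; zero-terminated string
.code
start:
    invoke StdOut, addr HelloWorld      ; print the string to the console
    invoke ExitProcess, 0               ; end the process with exit code 0
end start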

Now we can start. I believe the best approach is to type the code by hand, line by line, into MASM32's editor as you read the tutorial. It gives a better understanding; at least that's how I function best.

Starting with the first line, .386 tells the assembler to accept the instruction set of the Intel 80386 processor. It is mostly a directive to the assembler, since the instructions we actually use are what determine compatibility (see the introductory assembly "tutorial" where I briefly mention key processors and the instructions they added).

Next we have the .model directive, followed by flat and stdcall. The first refers to the way we address memory: the flat model is simple and straightforward, and memory is accessed linearly. The stdcall part is the calling convention, meaning how functions receive their arguments and who cleans up the stack afterwards. With stdcall the arguments are pushed right to left and the called function removes them, which is the convention the Win32 API uses. We will see more of this when we get into the details of calling functions, which will come soon enough. You can explore the other options for this line in the MSDN entry for the .model directive, but we will not be touching it, at least not for a while. Also note that flat is the model used by Windows in general, and thank goodness for that, because we get a unified pointer format rather than the far and near pointers of ye olde times.

Next we have option casemap :none. As the name casemap suggests, this has something to do with letter case: by setting it to none, we tell MASM that our code is case-sensitive, so no labels or names are mapped to uppercase or anything else. If you work in a team and cannot agree on a naming convention, you may want another mapping scheme, so that each of you can write "MyVar", "myVar" or "myvar" and still refer to the same variable.
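To see the case sensitivity in action, here is a minimal sketch. The names Greeting and greeting are my own for illustration, not part of the original program. With option casemap :none they are two separate symbols, so this should assemble and print both lines; with case mapping turned on I would expect the two names to collide instead.

```asm
.386
.model flat, stdcall
option casemap :none            ; keep identifiers case-sensitive
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
.data
; Two hypothetical names that differ only in the case of the first letter.
Greeting db "Greeting with a capital G", 13, 10, 0
greeting db "greeting in lowercase", 13, 10, 0
.code
start:
    invoke StdOut, addr Greeting    ; refers to the first string
    invoke StdOut, addr greeting    ; refers to the second string
    invoke ExitProcess, 0
end start
```

The 13, 10 pair is a carriage return and line feed, so each string ends its own line.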

Now we have a few include directives. These are similar to C's #include directive: they paste the content of .inc files into your code, and those files mostly hold declarations such as function prototypes, constants and structures. We include windows.inc, kernel32.inc and masm32.inc. masm32.inc declares StdOut, among others. The other two refer to the host: kernel32.inc provides the prototype for ExitProcess, while windows.inc has the Win32 API constants and structure definitions. You should feel honored to have the privilege of a ready-made StdOut that prints a string for you. At the university we used Atmel assembly on ATiny95 boards and used GDB (the GNU Debugger) to look at raw memory to verify results. Needless to say, printing to a terminal is more convenient than line-by-line debugging through one.

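Next come the includelib directives, which name the library files themselves. They complement the include directives, though not one-to-one, and they tell the linker which libraries to link our output against. kernel32.lib is an import library that ties ExitProcess to kernel32.dll, while masm32.lib is a static library whose code (StdOut in our case) is linked into the executable. Note that an .inc file doesn't necessarily need a corresponding .lib file: windows.inc, for example, only holds definitions.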

The next part is the .data segment. This is where you define your initialized data, so any string you would write in C as "whatever the string is" (including the quotes) should be declared here. Other directives we'll see in the future are .data? for uninitialized variables and .const for constants. Accordingly, we define HelloWorld to hold the iconic phrase: db defines bytes, and the trailing 0 is the terminator that marks the end of the string.
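As a preview of those three directives side by side, here is a small sketch. The names Greeting, Scratch and Farewell are made up for this example, and Scratch is only declared, not used.

```asm
.386
.model flat, stdcall
option casemap :none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
.data
Greeting db "Hello World!", 13, 10, 0   ; initialized data with a starting value
.data?
Scratch db 64 dup(?)                    ; 64 bytes reserved, no initial value (unused here)
.const
Farewell db "Goodbye!", 13, 10, 0       ; constant data we never intend to modify
.code
start:
    invoke StdOut, addr Greeting        ; string from .data
    invoke StdOut, addr Farewell        ; string from .const
    invoke ExitProcess, 0
end start
```

The idea is simply that where you declare something tells the assembler whether it has an initial value and whether it is meant to change.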

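After .code, which opens the section that holds our instructions, comes start:. It is just a label, but it plays the role of main() in C: the program starts executing here, because the last line of the file names it as the entry point.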

invoke StdOut, addr HelloWorld is your first function call! This call is quite straightforward, and we'll see more details in the near future. Essentially you use MASM's invoke directive, which handles the parameter passing for you; we'll see how it's done by hand later on. addr passes the address of HelloWorld rather than its contents. Note that we don't use parentheses but still separate the parameters with commas.
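For the curious, this is roughly what the two invoke lines boil down to under stdcall. Treat it as a sketch of the idea rather than the exact output of the assembler: each argument is pushed on the stack and the function is called, and with more than one argument they would be pushed right to left.

```asm
.386
.model flat, stdcall
option casemap :none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
.data
HelloWorld db "Hello World!", 0
.code
start:
    push offset HelloWorld  ; the only argument: the address of the string
    call StdOut             ; stdcall: StdOut pops its own argument on return
    push 0                  ; the exit code
    call ExitProcess        ; ends the process, so it never returns here
end start
```

Because the called function cleans up the stack in stdcall, there is no add esp after the calls.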

invoke ExitProcess, 0 similarly calls a function, one that stops the program and ends the process, with 0 as the exit code. This is necessary because nothing stops the CPU at the end of our code: without the call, execution would simply continue into whatever bytes follow the last instruction, be it junk or other code that happens to reside there, and the program would most likely crash. It also makes sure the process really terminates instead of hanging and staying open, which could become an issue if you run the program, say, every second in an automated fashion. The OS handles much of the cleanup anyway, but being tidy helps us remain safe and professional. This roughly corresponds to return 0; or return EXIT_SUCCESS; from main() in C.

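end start does two things: end marks the end of the source file, so the assembler ignores anything after it, and the label that follows it (start) tells the assembler where the program's entry point is. That is why execution begins at start: and not simply at the first line of the .code section.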
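The name start is not special; what matters is the label named on the end line. As a small sketch, the same program with a label I made up, program_entry, should behave the same way:

```asm
.386
.model flat, stdcall
option casemap :none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
.data
HelloWorld db "Hello World!", 0
.code
program_entry:                      ; hypothetical label name, any valid name works
    invoke StdOut, addr HelloWorld
    invoke ExitProcess, 0
end program_entry                   ; names program_entry as the entry point
```

The label after end has to match a label defined in the code, otherwise the assembler will complain about an undefined symbol.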

This concludes this tutorial. There isn't much else to do on your own other than trying to add another string and print it.