Opening
A computer is just a machine that processes instructions given to it. Sometimes these instructions form very complicated programs, such as the Windows operating system or AI models such as ChatGPT. But at their core, computers just process instructions (specifically, machine instructions, which assembly language writes in human-readable form).
The task of giving a computer specific instructions to execute is called programming. As you probably know, if we want the computer to do complicated things, it is more convenient to split our code into functions.
For example, suppose our code checks whether a number is prime in many places. It is convenient to write a function that returns true or false and call it wherever we need it.
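As a rough illustration, here is one way such a function could look in C. The text does not say how \texttt{is\_prime} is implemented, so the trial-division approach below is only an assumption made for the sake of the example.

```c
#include <stdbool.h>

/* Trial division: returns true if n is prime, false otherwise.
   This implementation is an assumption; the article only uses the name. */
bool is_prime(int n)
{
    if (n < 2) {
        return false;                     /* 0, 1 and negatives are not prime */
    }
    for (int d = 2; d <= n / d; d++) {    /* d <= n / d avoids overflowing d * d */
        if (n % d == 0) {
            return false;                 /* found a divisor */
        }
    }
    return true;
}
```

Any other primality test would do just as well here; what matters for the rest of the discussion is only that the code is packaged as a function we can call.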
So let's say we have the following snippet of code:
int main()
{
    int x = 3;
    int y = 5;
    int res = is_prime(x);
    x = x + y;
    res = is_prime(x);
    return 0;
}
When returning from the first call to \texttt{is\_prime} on the third line of \texttt{main}, how will we know to continue to the fourth line? For this we need to save the address to continue from somewhere (more precisely, the address of the instruction right after the call). This address is called the return address.
Additionally, when executing code, the values we operate on are saved inside the registers (\texttt{eax, ebx, ecx,…}). While \texttt{is\_prime} executes, their values may change. How will we restore them when returning to \texttt{main}?
When calling a function we need to save values associated with the parent function. The values we need to save are the registers' values when the function was called. The return address is just the value of the instruction pointer register (\texttt{eip}) at the time of the call, which already points at the next instruction.
These values will be saved in a data structure called the stack.
The assembly stack
As the name suggests, the stack is a structure where we put values one on top of another, and the last value we put in is the first one we get out (last in, first out, or LIFO).
The stack is just an area in memory that is maintained by two special registers, \texttt{esp} and \texttt{ebp}. Respectively, they are called the stack pointer and the base pointer.
The purpose of \texttt{esp} is to point at the top of the stack, that is, at the most recently pushed value. In most architectures the stack grows downward, toward lower addresses. So when we push a four-byte value on the stack (like an \texttt{int}), we first move \texttt{esp} forward (subtract 4 from it) and then store the value at the address \texttt{esp} points to.
So the code:
push eax
Is equivalent to:
sub esp, 4
mov [esp], eax
When we get values out of the stack (called popping values) we do the opposite, so the code:
pop eax
Is equivalent to:
mov eax, [esp]
add esp, 4
OK, so we can save data by pushing to the stack, doing some other stuff, and then popping it back, nice.
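To see the push and pop mechanics in motion, here is a small C sketch of a toy machine. The \texttt{Machine} struct and the names \texttt{push32} and \texttt{pop32} are made up for this illustration, and \texttt{esp} is an offset into an array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

/* Toy 32-bit machine (hypothetical): a block of memory plus a stack pointer.
   esp is an offset into mem here, not a real address. */
typedef struct {
    uint8_t  mem[STACK_BYTES];
    uint32_t esp;
} Machine;

void machine_init(Machine *m)
{
    memset(m->mem, 0, sizeof m->mem);
    m->esp = STACK_BYTES;              /* empty stack: esp sits just past the end */
}

/* push: sub esp, 4  then  mov [esp], value */
void push32(Machine *m, uint32_t value)
{
    assert(m->esp >= 4);               /* no room left would be a stack overflow */
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &value, 4);
}

/* pop: mov value, [esp]  then  add esp, 4 */
uint32_t pop32(Machine *m)
{
    uint32_t value;
    assert(m->esp + 4 <= STACK_BYTES); /* popping an empty stack is a bug */
    memcpy(&value, &m->mem[m->esp], 4);
    m->esp += 4;
    return value;
}
```

Pushing 7 and then 9 and popping twice gives back 9 first and 7 second, and \texttt{esp} ends up where it started.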
The stack and functions
When we call a function we want to save all the values of the registers on the stack and jump to it. When we return from the function we want to pop the values back.
(Notice that we don't need an explicit jump back: popping the return address into \texttt{eip} automatically takes us out of the function.)
So when we do:
call is_prime
We want it to be equivalent to:
push eip
push eax
push ebx
push ecx
push edx
jmp is_prime
And when we do:
ret
We want it to be the same as:
pop edx
pop ecx
pop ebx
pop eax
pop eip // jumps automatically
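This would work for simple functions. (On a real x86 CPU, \texttt{call} and \texttt{ret} only push and pop the return address. Saving other registers takes explicit \texttt{push} and \texttt{pop} instructions, and which registers must be preserved is part of the calling convention.) Some complications arise when we talk about passing arguments and local variables.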
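The simplified call and return model above can be sketched in C as follows. This follows the article's model rather than real x86 behavior, and the \texttt{Cpu} struct, \texttt{model\_call} and \texttt{model\_ret} are names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 32

/* Toy CPU for the simplified model in the text (not real x86 semantics). */
typedef struct {
    uint32_t stack[STACK_WORDS];
    uint32_t esp;                      /* index into stack[], counts down */
    uint32_t eip;                      /* "address" of the next instruction */
    uint32_t eax, ebx, ecx, edx;
} Cpu;

void cpu_init(Cpu *c)
{
    *c = (Cpu){0};
    c->esp = STACK_WORDS;              /* empty stack */
}

static void push(Cpu *c, uint32_t v)
{
    assert(c->esp > 0);
    c->stack[--c->esp] = v;
}

static uint32_t pop(Cpu *c)
{
    assert(c->esp < STACK_WORDS);
    return c->stack[c->esp++];
}

/* Model of "call target": save eip and the registers, then jump. */
void model_call(Cpu *c, uint32_t target)
{
    push(c, c->eip);                   /* return address */
    push(c, c->eax);
    push(c, c->ebx);
    push(c, c->ecx);
    push(c, c->edx);
    c->eip = target;                   /* jmp target */
}

/* Model of "ret": restore in reverse order; popping eip is the jump back. */
void model_ret(Cpu *c)
{
    c->edx = pop(c);
    c->ecx = pop(c);
    c->ebx = pop(c);
    c->eax = pop(c);
    c->eip = pop(c);
}
```

If the called code overwrites \texttt{eax} or \texttt{ebx}, \texttt{model\_ret} puts the old values back and execution continues at the saved \texttt{eip}. One consequence of restoring every register is that a result left in \texttt{eax} would be overwritten too, which hints at why real conventions are more selective.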
Passing arguments
Most functions require at least one input and produce at least one output. The inputs to a function are called its arguments. For example, \texttt{is\_prime(x)} has the argument \texttt{x}.
One way to pass arguments is through the registers. This works in some cases and is very fast but has two flaws.
- A function may have more arguments than there are registers.
- The function must know which argument is in which register.
The caller and the programmer of the function need to agree on which argument goes in which register, which is prone to human error. On 32-bit x86, a convention of this kind is called fastcall, and it is used less often than the stack-based conventions described next.
Another way is to put the arguments on the stack. This means we push their values before calling the function and move \texttt{esp} back past them when returning.
So if we have a function \texttt{f(x,y,z)}, when calling it we can do:
push eax // value of z
push ebx // value of y
push ecx // value of x
call f // push eip and jump to f
*The convention is usually that arguments to functions are pushed in reverse order.
This results in the following stack state when inside \texttt{f}: the return address is at \texttt{[esp]}, \texttt{x} is at \texttt{[esp + 4]}, \texttt{y} is at \texttt{[esp + 8]} and \texttt{z} is at \texttt{[esp + 12]}.
When we return we do pop eip, but we also want to clean the arguments off the stack. There are two conventions for how to do this. Either the function \texttt{f} (the callee) cleans the stack, or the caller cleans it.
If \texttt{f} cleans the stack we need to do something like:
pop eip
add esp, 12 // 3 variables - 4 * 3 = 12 bytes
But there is a problem: once we change \texttt{eip}, execution continues at the return address, so the \texttt{add esp, 12} that follows is never executed. To solve this there is the \texttt{retn} instruction (many assemblers also accept it written as \texttt{ret} with an operand). We can just write:
retn 12
It will pop the return address and add 12 to \texttt{esp} in the same instruction.
If the caller cleans up the stack they will need to write something like:
call f
add esp, 12
The convention where the callee cleans the stack is called stdcall, the one where the caller cleans it is called cdecl. Both the programmer of the function and the caller need to agree on which convention they use. Otherwise the arguments may be “popped twice” or not at all.
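The difference between the two conventions, and what goes wrong when the two sides disagree, can be checked with a little stack arithmetic. The sketch below is a toy model in C: the \texttt{Cpu} struct and the \texttt{run\_*} functions are hypothetical names, and the return address 100 is an arbitrary made-up value.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64

/* Toy CPU (hypothetical): esp is a byte offset into a small stack area. */
typedef struct {
    uint32_t words[STACK_BYTES / 4];
    uint32_t esp;
    uint32_t eip;
} Cpu;

static void push(Cpu *c, uint32_t v)
{
    assert(c->esp >= 4);
    c->esp -= 4;
    c->words[c->esp / 4] = v;
}

static uint32_t pop(Cpu *c)
{
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->words[c->esp / 4];
    c->esp += 4;
    return v;
}

/* retn n: pop the return address into eip, then drop n bytes of arguments. */
static void retn(Cpu *c, uint32_t n)
{
    c->eip = pop(c);
    c->esp += n;
}

/* Push z, y, x and a return address, as the calling sequence for f(x,y,z) does. */
static void call_f(Cpu *c, uint32_t x, uint32_t y, uint32_t z)
{
    push(c, z);
    push(c, y);
    push(c, x);
    push(c, 100);                      /* made-up return address */
}

/* stdcall: the callee cleans up. Returns esp after the whole sequence. */
uint32_t run_stdcall(void)
{
    Cpu c = { .esp = STACK_BYTES };
    call_f(&c, 1, 2, 3);
    retn(&c, 12);                      /* callee: retn 12 */
    return c.esp;
}

/* cdecl: the caller cleans up. */
uint32_t run_cdecl(void)
{
    Cpu c = { .esp = STACK_BYTES };
    call_f(&c, 1, 2, 3);
    retn(&c, 0);                       /* callee: plain ret */
    c.esp += 12;                       /* caller: add esp, 12 */
    return c.esp;
}

/* Mismatch: callee assumes stdcall, caller assumes cdecl. */
uint32_t run_mismatch(void)
{
    Cpu c = { .esp = STACK_BYTES };
    call_f(&c, 1, 2, 3);
    retn(&c, 12);                      /* callee cleans...          */
    c.esp += 12;                       /* ...and the caller does too */
    return c.esp;
}
```

In the two matching cases \texttt{esp} returns to its starting value. In the mismatched case it ends up 12 bytes too high, which is exactly the “popped twice” situation.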
Local variables
In a lot of cases functions do some calculations that require saving values in variables. The variables that exist only in a single function are called its local variables. Because they are used only in the function where they are declared, it makes sense to put them on the stack and discard them when exiting the function.
Let's say a function \texttt{f} has 3 local variables:
void f(int a, int b)
{
    int x;
    int y;
    int z;
}
When calling this function we give the variables a place on the stack, so we do:
push ebx // argument b
push eax // argument a
push eip // return address
jmp f
f: // start of the function f
sub esp, 12 // room for 3 local variables
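When accessing the variables inside \texttt{f} we can do it through \texttt{esp}. The three variables occupy \texttt{[esp]}, \texttt{[esp + 4]} and \texttt{[esp + 8]}; \texttt{[esp + 12]} already holds the return address. For example, if \texttt{x} is placed at the highest of the three addresses, \texttt{x = 2} is the same as \texttt{mov dword [esp + 8], 2}.
When we return we just do \texttt{add esp, 12} before cleaning the stack like we did earlier.
This kind of works. The problem is, what happens if the value of \texttt{esp} changes while we are inside the function (for example, because we push a temporary value)? How will we know at what offset certain local variables are located?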
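The moving-offset problem is easy to see with plain arithmetic. In the C sketch below, the \texttt{Frame} struct and its helper functions are hypothetical, and we assume \texttt{x} sits at the highest address of the three locals.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses here are made-up byte addresses; only the arithmetic matters. */
typedef struct {
    uint32_t esp;
    uint32_t x_address;                /* fixed location of local variable x */
} Frame;

/* Enter f: reserve 12 bytes for x, y, z below the return address. */
Frame enter_f(uint32_t esp_at_entry)
{
    Frame f;
    assert(esp_at_entry >= 12);
    f.esp = esp_at_entry - 12;         /* sub esp, 12 */
    f.x_address = f.esp + 8;           /* assumed layout: x highest, z lowest */
    return f;
}

/* Pushing a temporary value moves esp but not x. */
void push_temp(Frame *f)
{
    assert(f->esp >= 4);
    f->esp -= 4;
}

/* Offset we would have to write in [esp + offset] to reach x right now. */
uint32_t x_offset_from_esp(const Frame *f)
{
    return f->x_address - f->esp;
}
```

Right after entering \texttt{f} the offset of \texttt{x} is 8, but after a single push it becomes 12. Code that uses \texttt{esp}-relative offsets has to keep track of every push and pop.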
Due to this problem, we introduce the \texttt{ebp} (base pointer) register. As the name suggests, the \texttt{ebp} register points at the base of the function's frame in the stack.
This is done by putting two instructions at the start of each function:
push ebp
mov ebp, esp
In this way we save the old value of \texttt{ebp} (we will need it when exiting the function). We also make \texttt{ebp} point at a place that has a constant offset from both the arguments and the local variables: the saved \texttt{ebp} is at \texttt{[ebp]}, the return address is at \texttt{[ebp + 4]}, the first argument is at \texttt{[ebp + 8]}, and the local variables start at \texttt{[ebp - 4]}.
For this reason, we usually access local variables and arguments through \texttt{ebp} and not \texttt{esp}.
When returning from the function, all we have to add is the restoring of \texttt{ebp}'s value, so we will write:
mov esp, ebp // go back to the base of the frame
pop ebp // restore ebp’s value
retn 8 // return and clean the two arguments a and b (2 * 4 = 8 bytes)
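To check that \texttt{ebp}-relative offsets really stay fixed, here is a C sketch of the prologue and epilogue on a toy machine. The \texttt{Cpu} struct and the names \texttt{prologue} and \texttt{epilogue} are invented for this example, and the stack is a small array indexed by byte offset.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128

/* Toy CPU (hypothetical): registers hold byte offsets into words[]. */
typedef struct {
    uint32_t words[STACK_BYTES / 4];
    uint32_t esp, ebp, eip;
} Cpu;

static uint32_t load(const Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return c->words[addr / 4];
}

static void store(Cpu *c, uint32_t addr, uint32_t v)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    c->words[addr / 4] = v;
}

static void push(Cpu *c, uint32_t v)
{
    c->esp -= 4;
    store(c, c->esp, v);
}

static uint32_t pop(Cpu *c)
{
    uint32_t v = load(c, c->esp);
    c->esp += 4;
    return v;
}

/* push ebp / mov ebp, esp / sub esp, locals_bytes */
void prologue(Cpu *c, uint32_t locals_bytes)
{
    push(c, c->ebp);                   /* save the caller's ebp */
    c->ebp = c->esp;                   /* base of the new frame */
    c->esp -= locals_bytes;            /* room for local variables */
}

/* mov esp, ebp / pop ebp / retn args_bytes */
void epilogue(Cpu *c, uint32_t args_bytes)
{
    c->esp = c->ebp;                   /* discard locals and temporaries */
    c->ebp = pop(c);                   /* restore the caller's ebp */
    c->eip = pop(c);                   /* return address */
    c->esp += args_bytes;              /* callee cleans the arguments */
}
```

After the prologue, the arguments can be read at \texttt{ebp + 8} and \texttt{ebp + 12} and a local at \texttt{ebp - 4}, and those offsets keep working even if more values are pushed in the middle of the function.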
Returning the result
Often, the purpose of a function is to calculate something, and the caller needs to get the result back. This is usually accomplished by simply putting the returned value in a predetermined register (usually \texttt{eax}).
Full example of calling a function in assembly
Let's say we have the following code:
int f(int a, int b); // declared before main so the call compiles

int main()
{
    int x = f(1, 2);
    return 0;
}

int f(int a, int b)
{
    int c = a + b;
    return c;
}
Now that we know how the stack works and how function calls are made, let's see how the computer actually executes this. In this example we will use the stdcall convention and return the value through \texttt{eax}.
f:
push ebp
mov ebp, esp // starting f’s frame
sub esp, 4 // make room for f’s local variable c
mov eax, [ebp + 8] // get the value of a
add eax, [ebp + 12] // add the value of b
mov [ebp - 4], eax // c = a + b (c lives just below the saved ebp)
mov eax, [ebp - 4] // the return value of f
mov esp, ebp // clean f’s stack frame
pop ebp
retn 8 // return and clean two arguments
main:
push ebp
mov ebp, esp // starting main’s frame
sub esp, 4 // make room for main’s local variable x
mov eax, 1
mov ebx, 2
push ebx // push argument b
push eax // push argument a
call f
mov [ebp - 4], eax // x = f(1, 2)
mov eax, 0 // return value of main
mov esp, ebp // cleaning main’s frame
pop ebp
ret // return from main
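As a sanity check, the same call sequence can be stepped through in C on a toy machine. This is only a sketch: the \texttt{Cpu} struct, \texttt{run\_f}, \texttt{run\_main} and the \texttt{RETURN\_TO\_MAIN} constant are all made up for the illustration. It keeps \texttt{f}'s local \texttt{c} at \texttt{[ebp - 4]}, just below the saved \texttt{ebp}, and stores the result into \texttt{main}'s local \texttt{x}.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 128
#define RETURN_TO_MAIN 1u              /* made-up "address" of the instruction after call f */

/* Toy CPU (hypothetical): registers hold byte offsets into words[]. */
typedef struct {
    uint32_t words[STACK_BYTES / 4];
    uint32_t esp, ebp, eip, eax, ebx;
} Cpu;

static uint32_t load(const Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return c->words[addr / 4];
}

static void store(Cpu *c, uint32_t addr, uint32_t v)
{
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    c->words[addr / 4] = v;
}

static void push(Cpu *c, uint32_t v) { c->esp -= 4; store(c, c->esp, v); }
static uint32_t pop(Cpu *c) { uint32_t v = load(c, c->esp); c->esp += 4; return v; }

/* f (stdcall): leaves a + b in eax and removes its two arguments. */
static void run_f(Cpu *c)
{
    push(c, c->ebp);                   /* push ebp */
    c->ebp = c->esp;                   /* mov ebp, esp */
    c->esp -= 4;                       /* sub esp, 4 (room for c) */
    c->eax = load(c, c->ebp + 8);      /* mov eax, [ebp + 8]  (a) */
    c->eax += load(c, c->ebp + 12);    /* add eax, [ebp + 12] (b) */
    store(c, c->ebp - 4, c->eax);      /* mov [ebp - 4], eax  (c = a + b) */
    c->eax = load(c, c->ebp - 4);      /* mov eax, [ebp - 4]  (return value) */
    c->esp = c->ebp;                   /* mov esp, ebp */
    c->ebp = pop(c);                   /* pop ebp */
    c->eip = pop(c);                   /* retn 8: pop the return address... */
    c->esp += 8;                       /* ...and drop the two arguments */
}

/* main, up to but not including its final ret. Returns the value stored in x. */
uint32_t run_main(Cpu *c)
{
    push(c, c->ebp);                   /* push ebp */
    c->ebp = c->esp;                   /* mov ebp, esp */
    c->esp -= 4;                       /* sub esp, 4 (room for x) */
    c->eax = 1;
    c->ebx = 2;
    push(c, c->ebx);                   /* push argument b */
    push(c, c->eax);                   /* push argument a */
    push(c, RETURN_TO_MAIN);           /* call f: push the return address... */
    run_f(c);                          /* ...and jump to f */
    store(c, c->ebp - 4, c->eax);      /* x = f(1, 2) */
    uint32_t x = load(c, c->ebp - 4);
    c->eax = 0;                        /* return value of main */
    c->esp = c->ebp;                   /* mov esp, ebp */
    c->ebp = pop(c);                   /* pop ebp */
    return x;
}
```

Starting from an empty stack, \texttt{run\_main} stores 3 in \texttt{x}, leaves 0 in \texttt{eax}, and brings \texttt{esp} back to where it started, so nothing is left behind on the stack.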