Playing With Assembly

We were talking at work about machine instructions available on new Intel chips, specifically the cache miss counters. Why we were talking about cache miss counters is a longer story, but for the purposes of this tangent, we started toying with assembly and accidentally learned a few things.

First, we wrote a trivial main.c to drive our program:

#include <stdio.h>

// Promise C that we will link against externally defined functions (f and g)
// with these signatures.
extern int f(int i);
extern int g(int i);

// Simple main.
int main(int argc, char** argv)
{
  int i = 1;

  printf("i=%d\n", i);
  printf("f=%d\n", f(i));
  printf("g=%d\n", g(i));

  return 0;
}

If you hadn’t guessed, we’re now going to define int f(int) and int g(int). Both of these functions take an integer, add 1 to it, and return the result. First, here is int g(int) as generated by an unoptimized gcc build, which we put in the file g.S:

.global g

g:
  push   %rbp             # Push the caller's base pointer onto the stack.
  mov    %rsp,%rbp        # Copy the stack pointer into the base pointer.
  mov    %edi,-0x4(%rbp)  # Store argument 1 in the local variable.
  mov    -0x4(%rbp),%eax  # Load the local variable into the return register.
  add    $0x1,%eax        # Increment the value in eax by 1.
  pop    %rbp             # Restore the base pointer of the previous frame.
  retq                    # Return to the caller.

Looks a little verbose. Here is our hand-optimized version called int f(int) that we put in a file named f.S:

.global f

f:
  mov    %edi,%eax        # Put input value into output register.
  add    $0x1,%eax        # Increment the value in eax by 1.
  retq

They do (roughly) the same thing: add 1 to the input and return the result.
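For reference, here is a minimal sketch of C source that computes the same thing as the two listings. The name increment is ours and does not appear in the original files. Compiling something like it with gcc -O0 -S should give assembly close to g, though the exact instructions depend on the compiler version and flags.

```c
/* Hypothetical C counterpart of f and g: take an int, add 1, return it.
 * Note that the result is undefined for i == INT_MAX, because signed
 * overflow is undefined behavior in C. */
int increment(int i)
{
    return i + 1;
}
```

Assuming an x86-64 Linux machine, the post's three files should build with something like gcc main.c f.S g.S -o main, and the resulting program prints i=1, f=2 and g=2 on separate lines.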

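So, what can we learn from this? Well, notice how we pluck the input from the %edi register and place the result in the %eax register? That is the calling convention C uses here (the System V AMD64 ABI): the first integer argument arrives in %edi and the return value goes back in %eax. Those conventions might differ if you are in another language, like, say, C++, which at the very least mangles function names so that the symbol the linker looks for is no longer plain f or g. This is why C++ code wraps C headers in extern "C" { #include ... }. Remember that the compiler generates the calling code from the declaration in the header; it never looks at the machine code you eventually link against.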
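A minimal sketch of how a header for f and g might guard its declarations so that both C and C++ callers link against the plain, unmangled symbols. The file name fg.h and the include guard FG_H are hypothetical; the original post does not show a header.

```c
/* fg.h (hypothetical name): declarations for the assembly routines. */
#ifndef FG_H
#define FG_H

#ifdef __cplusplus
extern "C" {            /* Only a C++ compiler sees this line. */
#endif

int f(int i);           /* Defined in f.S. */
int g(int i);           /* Defined in g.S. */

#ifdef __cplusplus
}                       /* Closes the extern "C" block. */
#endif

#endif /* FG_H */
```

A C compiler skips the extern "C" lines entirely, so the same header serves both languages, and C++ callers do not need to wrap the #include themselves.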

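It’s also instructive to notice that the unoptimized gcc output pushes %rbp (the base pointer) onto the stack and then points %rbp at the current stack pointer, which sets up a new frame. It then stores the argument in a 4-byte local variable at -0x4(%rbp), just past the current top of the stack. The offset is negative because execution stacks grow downward, from high addresses to low. Writing below %rsp without moving it is safe here because g calls nothing else, and the System V AMD64 ABI reserves a 128-byte "red zone" below %rsp for exactly this kind of leaf function.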

If we had used a local variable and made another function call, the stack pointer (%rsp) would have been decremented first, so that the next call (which pushes a return address and then %rbp) wouldn’t clobber our existing data. We would also see a leaveq, which tears the frame down in one step: it copies %rbp back into %rsp, discarding the locals, and then pops the saved %rbp of the previous frame. In other words, it is shorthand for mov %rbp,%rsp followed by pop %rbp.
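To see that case for yourself, a function along these lines should do. This is a sketch with hypothetical names (add_one and add_two are not from the original files). Compiled with gcc -O0 on x86-64, add_two would typically show a sub of some multiple of 16 from %rsp right after the prologue and a leaveq before the retq, though the exact offsets and instructions vary between compiler versions.

```c
/* Hypothetical leaf function: calls nothing, so it needs no frame space. */
int add_one(int i)
{
    return i + 1;
}

/* Hypothetical non-leaf function: 'local' must survive the second call,
 * so the compiler has to reserve stack space for it by moving %rsp. */
int add_two(int i)
{
    int local = add_one(i);  /* Kept in this function's stack frame. */
    return add_one(local);   /* The callee's frame is built below ours. */
}
```

Running gcc -O0 -S on a file containing these two functions and comparing the two prologues and epilogues side by side makes the difference easy to spot.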

Nothing too earth-shattering, but revisiting the metal-level helps keep perspective and helps me keep one eye on what the stack machine is doing under the higher level language I’m usually using, such as C++.
