The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine. Some of the details map precisely to the machine, but some do not. This is because the compiler suite needs no assembler pass in the usual pipeline. Instead, the compiler operates on a kind of semi-abstract instruction set, and instruction selection occurs partly after code generation. The assembler works on the semi-abstract form, so when you see an instruction like MOV, what the toolchain actually generates for that operation might not be a move instruction at all, perhaps a clear or load. Or it might correspond exactly to the machine instruction with that name. In general, machine-specific operations tend to appear as themselves, while more general concepts like memory move and subroutine call and return are more abstract. The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined. The assembler program is a way to parse a description of that semi-abstract instruction set and turn it into instructions to be input to the linker.
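(Note the //go:noinline compiler-directive here: without it, the compiler would most likely inline the call to add into main, and the CALL we are about to dissect would disappear from main's listing. Don't get bitten.)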
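For reference, here is a minimal sketch of the kind of program whose assembly is dissected in this chapter. It is reconstructed from the values the text relies on (the arguments 10 and 32, 4-byte Long-word arithmetic, a second true result), so treat it as an assumption rather than the exact original source. Its assembly can be printed with go tool compile -S. Keep in mind that Go 1.17 and later pass arguments in registers on amd64, so a recent toolchain will not reproduce the stack-based listing described here.

```go
package main

import "fmt"

// add takes two int32 arguments and returns an int32 and a bool,
// matching the shape of the function dissected in the text.
//
// The directive below keeps the compiler from inlining add into main,
// so that main still contains an actual CALL to it.
//
//go:noinline
func add(a, b int32) (int32, bool) {
	return a + b, true
}

func main() {
	sum, ok := add(10, 32)
	fmt.Println(sum, ok) // prints "42 true"
}
```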
0x0000: Offset of the current instruction, relative to the start of the function.
TEXT "".add: The TEXT directive declares the "".add symbol as part of the .text section (i.e. runnable code) and indicates that the instructions that follow are the body of the function. The empty string "" will be replaced by the name of the current package at link-time, i.e. "".add becomes main.add once linked into our final binary.
(SB): SB is the virtual register that holds the "static-base" pointer, i.e. the address of the beginning of the address-space of our program.
"".add(SB) declares that our symbol is located at some constant offset (computed by the linker) from the start of our address-space. Put differently, it has an absolute, direct address: it's a global function symbol. Good ol' objdump confirms all of that: the symbol table of the final binary lists main.add as a global function symbol in the .text section.
All user-defined symbols are written as offsets to the pseudo-registers FP (arguments and locals) and SB (globals). The SB pseudo-register can be thought of as the origin of memory, so the symbol foo(SB) is the name foo as an address in memory.
NOSPLIT: Indicates to the compiler that it should not insert the stack-split preamble, which checks whether the current stack needs to be grown. In the case of our add function, the compiler has set the flag by itself: it is smart enough to figure that, since add has no local variables and no stack-frame of its own, it simply cannot outgrow the current stack; thus it'd be a complete waste of CPU cycles to run these checks on every call.
"NOSPLIT": Don't insert the preamble to check if the stack must be split. The frame for the routine, plus anything it calls, must fit in the spare space at the top of the stack segment. Used to protect routines such as the stack splitting code itself. We'll have a quick word about goroutines and stack-splits at the end of this chapter.
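At the Go source level, the counterpart of the NOSPLIT flag is the //go:nosplit directive. The sketch below applies it to a hypothetical leaf function, clamp, which needs no stack-frame of its own; for such a function the compiler would most likely set the flag by itself anyway, so the directive only makes the intent explicit. Use it sparingly: a nosplit function can never trigger stack growth, and the linker rejects chains of nosplit functions whose combined frames might not fit in the spare stack space.

```go
package main

import "fmt"

// clamp is a hypothetical leaf function: it calls nothing and needs no
// locals on the stack, so it cannot outgrow the current stack.
//
//go:nosplit
func clamp(x, lo, hi int32) int32 {
	if x < lo {
		return lo
	}
	if x > hi {
		return hi
	}
	return x
}

func main() {
	fmt.Println(clamp(42, 0, 10)) // prints 10
}
```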
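$0-16: $0 denotes the size in bytes of the stack-frame that will be allocated, while the 16 that follows the minus sign specifies the size of the arguments passed in by the caller. In this calling convention, that figure also covers the space reserved for the return values: two 4-byte arguments plus a 4-byte and a 1-byte result, padded to 16 bytes.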
In the general case, the frame size is followed by an argument size, separated by a minus sign. (It's not a subtraction, just idiosyncratic syntax.) The frame size $24-8 states that the function has a 24-byte frame and is called with 8 bytes of argument, which live on the caller's frame. If NOSPLIT is not specified for the TEXT, the argument size must be provided. For assembly functions with Go prototypes, go vet will check that the argument size is correct.
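Since the minus sign in a frame specification is a separator rather than an operator, reading one is a matter of splitting the string. The hypothetical parseFrameSpec below is only a sketch of that rule, not the assembler's actual parser. It assumes a non-negative frame size and reports a missing argument size (allowed for NOSPLIT functions) as absent rather than zero.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrameSpec splits a TEXT frame specification such as "$24-8" into
// a frame size and an argument size. The '-' only separates the two
// numbers; nothing is subtracted. Negative frame sizes are not handled.
func parseFrameSpec(spec string) (frame, args int, hasArgs bool, err error) {
	if !strings.HasPrefix(spec, "$") {
		return 0, 0, false, fmt.Errorf("frame spec %q must start with '$'", spec)
	}
	frameStr, argsStr, found := strings.Cut(spec[1:], "-")
	if frame, err = strconv.Atoi(frameStr); err != nil {
		return 0, 0, false, err
	}
	if found {
		if args, err = strconv.Atoi(argsStr); err != nil {
			return 0, 0, false, err
		}
	}
	return frame, args, found, nil
}

func main() {
	for _, spec := range []string{"$0-16", "$24-8", "$24-0", "$0"} {
		frame, args, hasArgs, err := parseFrameSpec(spec)
		if err != nil {
			fmt.Println(spec, "error:", err)
			continue
		}
		fmt.Printf("%-6s frame=%d bytes, args=%d bytes (given: %t)\n", spec, frame, args, hasArgs)
	}
}
```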
The FUNCDATA and PCDATA directives contain information for use by the garbage collector; they are introduced by the compiler.
The SP pseudo-register is a virtual stack pointer used to refer to frame-local variables and the arguments being prepared for function calls. It points to the top of the local stack frame, so references should use negative offsets in the range [−framesize, 0): x-8(SP), y-4(SP), and so on.
"".b+12(SP) and "".a+8(SP) respectively refer to the addresses 12 bytes and 8 bytes below the top of the stack, i.e. SP+12 and SP+8 (remember: the stack grows downwards, so higher addresses are deeper in the stack!).
.a and .b are arbitrary aliases given to the referred locations; although they have absolutely no semantic meaning whatsoever, they are mandatory when using relative addressing on virtual registers. The documentation about the virtual frame-pointer has this to say about it:
The FP pseudo-register is a virtual frame pointer used to refer to function arguments. The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register. Thus 0(FP) is the first argument to the function, 8(FP) is the second (on a 64-bit machine), and so on. However, when referring to a function argument this way, it is necessary to place a name at the beginning, as in first_arg+0(FP) and second_arg+8(FP). (The meaning of the offset, offset from the frame pointer, is distinct from its use with SB, where it is an offset from the symbol.) The assembler enforces this convention, rejecting plain 0(FP) and 8(FP). The actual name is semantically irrelevant but should be used to document the argument's name.
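The naming rule can be captured in a few lines. The toy checker below is purely illustrative: parseFPOperand and its regular expression are hypothetical and much stricter about names than the real assembler. Like the assembler, it accepts first_arg+0(FP) but rejects a bare 0(FP), and it returns the offset, which is all that matters for addressing.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// fpOperand matches name+offset(FP); the name is mandatory but, as far as
// addressing goes, only documents which argument lives at that offset.
var fpOperand = regexp.MustCompile(`^([A-Za-z_][A-Za-z0-9_]*)\+([0-9]+)\(FP\)$`)

// parseFPOperand returns the name and offset of an FP-relative operand,
// or an error if the operand lacks a name.
func parseFPOperand(op string) (name string, offset int, err error) {
	m := fpOperand.FindStringSubmatch(op)
	if m == nil {
		return "", 0, fmt.Errorf("%q: expected name+offset(FP)", op)
	}
	offset, err = strconv.Atoi(m[2])
	if err != nil {
		return "", 0, err
	}
	return m[1], offset, nil
}

func main() {
	for _, op := range []string{"first_arg+0(FP)", "second_arg+8(FP)", "0(FP)", "8(FP)"} {
		name, off, err := parseFPOperand(op)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s -> argument %q at offset %d\n", op, name, off)
	}
}
```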
Two more things are worth noting here:
1. The first argument a is not located at 0(SP), but rather at 8(SP); that's because the caller stores its return-address in 0(SP) via the CALL pseudo-instruction.
2. Arguments are passed in reverse-order, i.e. the first argument is the closest to the top of the stack.
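These offsets can be derived with a little arithmetic. The sketch below models the memory the caller reserves for add as a Go struct; the type addArgs is hypothetical, and its layout only happens to match the stack-based convention described here (arguments first, results right after) for this particular signature. Adding the 8 bytes of return address that CALL pushes on amd64 reproduces the "".a+8(SP), "".b+12(SP) and "".~r2+16(SP) offsets quoted in the text, places the bool result at 20(SP), and gives the 16 bytes of the $0-16 specification.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addArgs is a hypothetical model of the argument/result area the caller
// reserves for add(a, b int32) (int32, bool) under the stack-based
// convention: arguments first, results right after them.
type addArgs struct {
	a  int32 // first argument, closest to the top of the stack
	b  int32 // second argument
	r2 int32 // first result, "".~r2
	r3 bool  // second result, followed by 3 bytes of padding
}

// retAddrSize is the size of the return address pushed by CALL on amd64.
const retAddrSize = 8

type slot struct {
	name     string
	spOffset uintptr // offset from SP as seen from inside add
}

// addSlots lists where each value lives relative to SP once add is running.
func addSlots() []slot {
	var f addArgs
	return []slot{
		{"a", retAddrSize + unsafe.Offsetof(f.a)},
		{"b", retAddrSize + unsafe.Offsetof(f.b)},
		{"~r2", retAddrSize + unsafe.Offsetof(f.r2)},
		{"~r3", retAddrSize + unsafe.Offsetof(f.r3)},
	}
}

// argSize is the argument size that follows the '-' in $0-16.
func argSize() uintptr {
	return unsafe.Sizeof(addArgs{})
}

func main() {
	for _, s := range addSlots() {
		fmt.Printf("\"\".%s+%d(SP)\n", s.name, s.spOffset)
	}
	fmt.Printf("$0-%d\n", argSize())
}
```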
ADDL does the actual addition of the two Long-words (i.e. 4-byte values) stored in AX and CX, then stores the final result in AX. That result is then moved over to "".~r2+16(SP), where the caller had previously reserved some stack space and expects to find its return values. Once again, "".~r2 has no semantic meaning here.
Besides the sum, add also returns the constant true boolean value. The mechanics at play are exactly the same as for our first return value; only the offset relative to SP changes.
The final RET pseudo-instruction tells the Go assembler to insert whatever instructions are required by the calling convention of the target platform in order to properly return from a subroutine call. Most likely this will cause the code to pop off the return-address stored at 0(SP), then jump back to it.
The last instruction in a TEXT block must be some sort of jump, usually a RET (pseudo-)instruction. (If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in TEXTs.)
[Figure: the stack once main.add has finished executing]
Now for the main function. Nothing new here: "".main (main.main once linked) is a global function symbol in the .text section, whose address is some constant offset from the beginning of our address-space.
Our caller, main, grows its stack-frame by 24 bytes (remember that the stack grows downwards, so SUBQ here actually makes the stack-frame bigger) by decrementing the virtual stack-pointer. Of those 24 bytes:
- 8 bytes (16(SP)-24(SP)) are used to store the current value of the frame-pointer BP (the real one!) to allow for stack-unwinding and facilitate debugging
- 1+3 bytes (12(SP)-16(SP)) are reserved for the second return value (bool) plus 3 bytes of necessary alignment on amd64
- 4 bytes (8(SP)-12(SP)) are reserved for the first return value (int32)
- 4 bytes (4(SP)-8(SP)) are reserved for the value of argument b (int32)
- 4 bytes (0(SP)-4(SP)) are reserved for the value of argument a (int32)
Finally, following the growth of the stack, LEAQ computes the new address of the frame-pointer and stores it in BP.
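The 24-byte breakdown can be double-checked with a small simulation. Everything in the sketch below is hypothetical: mainFrame merely restates the list above, checkContiguous verifies that its slots tile the frame without gaps or overlaps, and simulateFrame fills a byte slice the way the frame should look once add(10, 32) has returned (little-endian, as on amd64; the saved BP is left as zeros since its real value is unknown).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameSlot is one region of main's 24-byte frame, as [lo, hi) offsets
// from main's SP.
type frameSlot struct {
	lo, hi int
	what   string
}

// mainFrame restates the layout described in the text.
var mainFrame = []frameSlot{
	{0, 4, "argument a (int32)"},
	{4, 8, "argument b (int32)"},
	{8, 12, "first return value (int32)"},
	{12, 16, "second return value (bool) + 3 bytes of padding"},
	{16, 24, "saved frame-pointer BP"},
}

// checkContiguous reports whether the slots cover [0, size) in order,
// with no gap and no overlap.
func checkContiguous(slots []frameSlot, size int) bool {
	next := 0
	for _, s := range slots {
		if s.lo != next || s.hi <= s.lo {
			return false
		}
		next = s.hi
	}
	return next == size
}

// simulateFrame returns main's frame as it should look after add(10, 32)
// returns; the saved BP is left zeroed.
func simulateFrame() []byte {
	frame := make([]byte, 24)
	binary.LittleEndian.PutUint32(frame[0:4], 10)
	binary.LittleEndian.PutUint32(frame[4:8], 32)
	binary.LittleEndian.PutUint32(frame[8:12], 42)
	frame[12] = 1 // true
	return frame
}

func main() {
	fmt.Println("contiguous:", checkContiguous(mainFrame, 24))
	frame := simulateFrame()
	for _, s := range mainFrame {
		fmt.Printf("%2d(SP)-%2d(SP)  % x  %s\n", s.lo, s.hi, frame[s.lo:s.hi], s.what)
	}
}
```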
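Although it might look like random garbage at first, the constant 137438953482 actually corresponds to the 10 and 32 4-byte values concatenated into one 8-byte value: 137438953482 = 32 * 2^32 + 10, i.e. 32 in the upper 4 bytes and 10 in the lower 4 bytes.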
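The arithmetic is easy to verify. In the sketch below, the hypothetical packArgs builds the 8-byte constant from the two 4-byte arguments, and the equally hypothetical unpackLE stores it little-endian (as a single 8-byte move does on amd64) to show that a ends up in the lower 4 bytes, at 0(SP), and b in the upper 4 bytes, at 4(SP).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strconv"
)

// packArgs places b in the upper 32 bits and a in the lower 32 bits.
func packArgs(a, b int32) uint64 {
	return uint64(uint32(b))<<32 | uint64(uint32(a))
}

// unpackLE writes v to memory little-endian and reads back the 4-byte
// values found at offsets 0 and 4.
func unpackLE(v uint64) (at0, at4 uint32) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	return binary.LittleEndian.Uint32(buf[0:4]), binary.LittleEndian.Uint32(buf[4:8])
}

func main() {
	v := packArgs(10, 32)
	fmt.Println(v) // prints 137438953482
	// 32 (binary 100000) followed by 10 written as a 32-bit field.
	fmt.Println(strconv.FormatUint(v, 2))
	at0, at4 := unpackLE(v)
	fmt.Println(at0, at4) // prints "10 32"
}
```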
We CALL our add function as an offset relative to the static-base pointer, i.e. this is a straightforward jump to a direct address.
Note that CALL also pushes the return-address (an 8-byte value) at the top of the stack, so every reference to SP made from within our add function ends up offset by 8 bytes! E.g. "".a is not at 0(SP) anymore, but at 8(SP).
Marking a function NOSPLIT acts as a hint for the compiler not to insert these stack-split checks.
TLS is a virtual register maintained by the runtime that holds a pointer to the current g, i.e. the data-structure that keeps track of all the state of a goroutine.
Looking at the definition of g in the source code of the runtime, the field the prologue compares against is g.stackguard0, which is the threshold value maintained by the runtime that, when compared to the stack-pointer, indicates whether or not a goroutine is about to run out of space. The prologue thus checks if the current SP value is less than or equal to the stackguard0 threshold (i.e. whether the stack has already grown down to the guard), then jumps to the epilogue if it happens to be the case.
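The comparison itself is a one-liner; what makes it work is that the stack grows towards lower addresses. The sketch below is a simplified, hypothetical model: the stack type, the guard distance and the 64-byte frames are made up, and the compiler emits a slightly different check for functions with large frames. needsGrowth mirrors the prologue's unsigned SP <= stackguard0 test, and depthBeforeGrowth counts how many nested calls get past their prologue before the check fires.

```go
package main

import "fmt"

// stack is a hypothetical model of a goroutine stack spanning [lo, hi),
// with a guard threshold playing the role of g.stackguard0.
type stack struct {
	lo, hi      uintptr
	stackguard0 uintptr
}

// needsGrowth mirrors the prologue's test: once SP has come down to the
// guard (or below it), the function must not run on this stack.
func needsGrowth(sp uintptr, s stack) bool {
	return sp <= s.stackguard0
}

// depthBeforeGrowth counts how many nested calls, each using frameSize
// bytes, get past the prologue before the check fires. It returns -1 for
// a zero frame size, which would never trigger growth.
func depthBeforeGrowth(s stack, frameSize uintptr) int {
	if frameSize == 0 {
		return -1
	}
	sp := s.hi
	depth := 0
	for !needsGrowth(sp, s) && sp >= frameSize {
		sp -= frameSize
		depth++
	}
	return depth
}

func main() {
	s := stack{lo: 0, hi: 1024, stackguard0: 256}
	fmt.Println(depthBeforeGrowth(s, 64)) // prints 12
}
```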
The NOP instruction just before the epilogue's CALL exists so that the prologue doesn't jump directly onto a CALL instruction. On some platforms, doing so can lead to very dark places; it's a common practice to set up a no-op instruction right before the actual call and land on this NOP instead. [UPDATE: We've discussed this matter in issue #4: Clarify "nop before call" paragraph.]