Go Internals
$ go version
go version go1.10 linux/amd64

Chapter II: Interfaces

This chapter covers the inner workings of Go's interfaces.
Specifically, we'll look at:
  • How functions & methods are called at run time.
  • How interfaces are built and what they're made of.
  • How dynamic dispatch works, when it kicks in, and at what cost.
  • How the empty interface & other special cases differ from their peers.
  • How interface composition works.
  • How type assertions work, and at what cost.
As we dig deeper and deeper, we'll also poke at miscellaneous low-level concerns, such as some implementation details of modern CPUs as well as various optimization techniques used by the Go compiler.

Function and method calls

As pointed out by Russ Cox in his design document about function calls (listed at the end of this chapter), Go has..:
..4 different kinds of functions..:
  • top-level func
  • method with value receiver
  • method with pointer receiver
  • func literal
..and 5 different kinds of calls:
  • direct call of top-level func (func TopLevel(x int) {})
  • direct call of method with value receiver (func (Value) M(int) {})
  • direct call of method with pointer receiver (func (*Pointer) M(int) {})
  • indirect call of method on interface (type Interface interface { M(int) })
  • indirect call of func value (var literal = func(x int) {})
Mixed together, these make up 10 possible combinations of function and call types:
  • direct call of top-level func /
  • direct call of method with value receiver /
  • direct call of method with pointer receiver /
  • indirect call of method on interface / containing value with value method
  • indirect call of method on interface / containing pointer with value method
  • indirect call of method on interface / containing pointer with pointer method
  • indirect call of func value / set to top-level func
  • indirect call of func value / set to value method
  • indirect call of func value / set to pointer method
  • indirect call of func value / set to func literal
(A slash separates what is known at compile time from what is only found out at run time.)
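As a rough illustration of the four "indirect call of func value" combinations listed above, here is a minimal sketch. The type and function names are borrowed from the list; the method bodies and the callAll helper are made up for the example.

```go
package main

import "fmt"

func TopLevel(x int) int { return x + 1 }

type Value struct{ base int }

func (v Value) M(x int) int { return v.base + x }

type Pointer struct{ base int }

func (p *Pointer) M(x int) int { return p.base + x }

// callAll (hypothetical helper) invokes each func value with the same
// argument. Inside the loop, the compiler only knows that it is calling
// *some* func(int) int; what it is set to is only found out at run time.
func callAll(x int, fns ...func(int) int) []int {
	out := make([]int, 0, len(fns))
	for _, fn := range fns {
		out = append(out, fn(x)) // indirect call of func value
	}
	return out
}

func main() {
	v := Value{base: 10}
	p := &Pointer{base: 20}
	literal := func(x int) int { return x * 2 }

	results := callAll(1,
		TopLevel, // set to top-level func
		v.M,      // set to value method
		p.M,      // set to pointer method
		literal,  // set to func literal
	)
	fmt.Println(results)
}
```

The call site inside callAll is identical for all four func values, which is exactly what the slash in the list above is meant to convey.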
We'll first take a few minutes to review the three kinds of direct calls, then we'll shift our focus towards interfaces and indirect method calls for the rest of this chapter. We won't cover function literals in this chapter, as doing so would first require us to become familiar with the mechanics of closures.. which we'll inevitably do, in due time.

Overview of direct calls

Consider the following code (direct_calls.go):
//go:noinline
func Add(a, b int32) int32 { return a + b }

type Adder struct{ id int32 }
//go:noinline
func (adder *Adder) AddPtr(a, b int32) int32 { return a + b }
//go:noinline
func (adder Adder) AddVal(a, b int32) int32 { return a + b }

func main() {
	Add(10, 32) // direct call of top-level function

	adder := Adder{id: 6754}
	adder.AddPtr(10, 32) // direct call of method with pointer receiver
	adder.AddVal(10, 32) // direct call of method with value receiver

	(&adder).AddVal(10, 32) // implicit dereferencing
}
Let's have a quick look at the code generated for each of those 4 calls.
Direct call of a top-level function
Looking at the assembly output for Add(10, 32):
0x0000 TEXT "".main(SB), $40-0
;; ...omitted everything but the actual function call...
0x0021 MOVQ $137438953482, AX
0x002b MOVQ AX, (SP)
0x002f CALL "".Add(SB)
;; ...omitted everything but the actual function call...
We see that, as we already knew from chapter I, this translates into a direct jump to a global function symbol in the .text section, with the arguments and return values stored on the caller's stack-frame. It's as straightforward as it gets.
Russ Cox wraps it up as such in his document:
Direct call of top-level func: A direct call of a top-level func passes all arguments on the stack, expecting results to occupy the successive stack positions.
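The $137438953482 immediate in the listing above may look odd at first: it is both 32-bit arguments folded into a single 64-bit store. The sketch below reproduces that constant; the pack helper is a hypothetical name used only for illustration, and it assumes a little-endian target such as amd64.

```go
package main

import "fmt"

// pack (hypothetical helper) mimics how two adjacent int32 arguments end up
// in one quad-word: the first argument occupies the low 4 bytes (lowest
// address on a little-endian machine), the second one the high 4 bytes.
func pack(first, second int32) uint64 {
	return uint64(uint32(second))<<32 | uint64(uint32(first))
}

func main() {
	// Add(10, 32): a=10 sits at 0(SP), b=32 at 4(SP).
	fmt.Println(pack(10, 32)) // 137438953482
}
```

Writing one MOVQ instead of two MOVL instructions is simply a cheaper way of laying both arguments out on the stack.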
Direct call of a method with pointer receiver
First things first, the receiver is initialized via adder := Adder{id: 6754}:
0x0034 MOVL $6754, "".adder+28(SP)
(The extra-space on our stack-frame was pre-allocated as part of the frame-pointer preamble, which we haven't shown here for conciseness.)
Then comes the actual method call to adder.AddPtr(10, 32):
0x0057 LEAQ "".adder+28(SP), AX ;; move &adder to..
0x005c MOVQ AX, (SP) ;; ..the top of the stack (argument #1)
0x0060 MOVQ $137438953482, AX ;; move (32,10) to..
0x006a MOVQ AX, 8(SP) ;; ..the top of the stack (arguments #3 & #2)
0x006f CALL "".(*Adder).AddPtr(SB)
Looking at the assembly output, we can clearly see that a call to a method (whether it has a value or pointer receiver) is almost identical to a function call, the only difference being that the receiver is passed as first argument. In this case, we do so by loading the effective address (LEAQ) of "".adder+28(SP) at the top of the frame, so that argument #1 becomes &adder (if you're a bit confused regarding the semantics of LEA vs. MOV, you may want to have a look at the links at the end of this chapter for some pointers).
Note how the compiler encodes the type of the receiver and whether it's a value or pointer directly into the name of the symbol: "".(*Adder).AddPtr.
Direct call of method: In order to use the same generated code for both an indirect call of a func value and for a direct call, the code generated for a method (both value and pointer receivers) is chosen to have the same calling convention as a top-level function with the receiver as a leading argument.
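The "receiver as a leading argument" idea is also visible at the language level through method expressions. This short sketch reuses the Adder type from direct_calls.go; it illustrates the concept and makes no claim about the exact code the compiler generates.

```go
package main

import "fmt"

type Adder struct{ id int32 }

func (adder *Adder) AddPtr(a, b int32) int32 { return a + b }
func (adder Adder) AddVal(a, b int32) int32  { return a + b }

func main() {
	adder := Adder{id: 6754}

	// A method expression yields an ordinary function whose first
	// parameter is the receiver.
	var addPtr func(*Adder, int32, int32) int32 = (*Adder).AddPtr
	var addVal func(Adder, int32, int32) int32 = Adder.AddVal

	fmt.Println(addPtr(&adder, 10, 32), addVal(adder, 10, 32))
}
```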
Direct call of a method with value receiver
As we'd expect, using a value receiver yields very similar code as above. Consider adder.AddVal(10, 32):
0x003c MOVQ $42949679714, AX ;; move (10,6754) to..
0x0046 MOVQ AX, (SP) ;; ..the top of the stack (arguments #2 & #1)
0x004a MOVL $32, 8(SP) ;; move 32 to the top of the stack (argument #3)
0x0052 CALL "".Adder.AddVal(SB)
Looks like something a bit trickier is going on here, though: the generated assembly isn't even referencing "".adder+28(SP) anywhere, even though that is where our receiver currently resides. So what's really going on here? Well, since the receiver is a value, and since the compiler is able to statically infer that value, it doesn't bother with copying the existing value from its current location (28(SP)): instead, it simply creates a new, identical Adder value directly on the stack, and merges this operation with the creation of the second argument to save one more instruction in the process.
Once again, notice how the symbol name of the method explicitly denotes that it expects a value receiver.

Implicit dereferencing

There's one final call that we haven't looked at yet: (&adder).AddVal(10, 32). In that case, we're using a pointer variable to call a method that instead expects a value receiver. Somehow, Go automagically dereferences our pointer and manages to make the call. How so?
How the compiler handles this kind of situation depends on whether the receiver being pointed to has escaped to the heap.
Case A: The receiver is on the stack
If the receiver is still on the stack and its size is sufficiently small that it can be copied in a few instructions, as is the case here, the compiler simply copies its value over to the top of the stack then does a straightforward method call to "".Adder.AddVal (i.e. the one with a value receiver).
(&adder).AddVal(10, 32) thus looks like this in this situation:
0x0074 MOVL "".adder+28(SP), AX ;; move (i.e. copy) adder (note the MOV instead of a LEA) to..
0x0078 MOVL AX, (SP) ;; ..the top of the stack (argument #1)
0x007b MOVQ $137438953482, AX ;; move (32,10) to..
0x0085 MOVQ AX, 4(SP) ;; ..the top of the stack (arguments #3 & #2)
0x008a CALL "".Adder.AddVal(SB)
Boring (although efficient). Let's move on to case B.
Case B: The receiver is on the heap
If the receiver has escaped to the heap then the compiler has to take a cleverer route: it generates a new method (with a pointer receiver, this time) that wraps "".Adder.AddVal, and replaces the original call to "".Adder.AddVal (the wrappee) with a call to "".(*Adder).AddVal (the wrapper). The wrapper's sole mission, then, is to make sure that the receiver gets properly dereferenced before being passed to the wrappee, and that any arguments and return values involved are properly copied back and forth between the caller and the wrappee.
(NOTE: In assembly outputs, these wrapper methods are marked as <autogenerated>.)
Here's an annotated listing of the generated wrapper that should hopefully clear things up a bit:
0x0000 TEXT "".(*Adder).AddVal(SB), DUPOK|WRAPPER, $32-24
;; ...omitted preambles...

0x0026 MOVQ ""..this+40(SP), AX ;; check whether the receiver..
0x002b TESTQ AX, AX ;; ..is nil
0x002e JEQ 92 ;; if it is, jump to 0x005c (panic)

0x0030 MOVL (AX), AX ;; dereference pointer receiver..
0x0032 MOVL AX, (SP) ;; ..and move (i.e. copy) the resulting value to argument #1

;; forward (copy) arguments #2 & #3 then call the wrappee
0x0035 MOVL "".a+48(SP), AX
0x0039 MOVL AX, 4(SP)
0x003d MOVL "".b+52(SP), AX
0x0041 MOVL AX, 8(SP)
0x0045 CALL "".Adder.AddVal(SB) ;; call the wrapped method

;; copy return value from wrapped method then return
0x004a MOVL 16(SP), AX
0x004e MOVL AX, "".~r2+56(SP)
;; ...omitted frame-pointer stuff...
0x005b RET

;; throw a panic with a detailed error
0x005c CALL runtime.panicwrap(SB)

;; ...omitted epilogues...
Obviously, this kind of wrapper can induce quite a bit of overhead considering all the copying that needs to be done in order to pass the arguments back and forth; especially if the wrappee is just a few instructions. Fortunately, in practice, the compiler would have inlined the wrappee directly into the wrapper to amortize these costs (when feasible, at least).
Note the WRAPPER directive in the definition of the symbol, which indicates that this method shouldn't appear in backtraces (so as not to confuse the end-user), nor should it be able to recover from panics that might be thrown by the wrappee.
WRAPPER: This is a wrapper function and should not count as disabling recover.
The runtime.panicwrap function, which throws a panic if the wrapper's receiver is nil, is pretty self-explanatory; here's its complete listing for reference (src/runtime/error.go):
// panicwrap generates a panic for a call to a wrapped value method
// with a nil pointer receiver.
//
// It is called from the generated wrapper code.
func panicwrap() {
	pc := getcallerpc()
	name := funcname(findfunc(pc))
	// name is something like "main.(*T).F".
	// We want to extract pkg ("main"), typ ("T"), and meth ("F").
	// Do it by finding the parens.
	i := stringsIndexByte(name, '(')
	if i < 0 {
		throw("panicwrap: no ( in " + name)
	}
	pkg := name[:i-1]
	if i+2 >= len(name) || name[i-1:i+2] != ".(*" {
		throw("panicwrap: unexpected string after package name: " + name)
	}
	name = name[i+2:]
	i = stringsIndexByte(name, ')')
	if i < 0 {
		throw("panicwrap: no ) in " + name)
	}
	if i+2 >= len(name) || name[i:i+2] != ")." {
		throw("panicwrap: unexpected string after type name: " + name)
	}
	typ := name[:i]
	meth := name[i+2:]
	panic(plainError("value method " + pkg + "." + typ + "." + meth + " called using nil *" + typ + " pointer"))
}
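One way to see panicwrap in action is to reach the generated (*Adder).AddVal wrapper through an interface that holds a nil *Adder. The sketch below is an assumption-laden illustration: the ValAdder interface and the helper names are hypothetical, and the call is routed through a non-inlined function so that the compiler cannot resolve the method statically.

```go
package main

import (
	"fmt"
	"strings"
)

type Adder struct{ id int32 }

func (adder Adder) AddVal(a, b int32) int32 { return a + b }

// ValAdder is a hypothetical interface used only for this example.
type ValAdder interface{ AddVal(a, b int32) int32 }

// call hides the dynamic type from the compiler, forcing a dispatch through
// the method table, which for *Adder points at the generated wrapper.
//
//go:noinline
func call(i ValAdder) int32 { return i.AddVal(10, 32) }

// callWithNil returns the text of the panic raised by calling AddVal on an
// interface that holds a nil *Adder.
func callWithNil() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var p *Adder // nil; *Adder's method set still includes AddVal
	call(p)
	return "no panic"
}

// isWrapperPanic reports whether msg has the shape produced by panicwrap.
func isWrapperPanic(msg string) bool {
	return strings.HasPrefix(msg, "value method ") &&
		strings.HasSuffix(msg, "called using nil *Adder pointer")
}

func main() {
	msg := callWithNil()
	fmt.Println(msg)
	fmt.Println(isWrapperPanic(msg))
}
```

The message should follow the format assembled at the end of panicwrap, i.e. something along the lines of "value method main.Adder.AddVal called using nil *Adder pointer".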
That's all for function and method calls, we'll now focus on the main course: interfaces.

Anatomy of an interface

Overview of the datastructures

Before we can understand how they work, we first need to build a mental model of the datastructures that make up interfaces and how they're laid out in memory. To that end, we'll have a quick peek into the runtime package to see what an interface actually looks like from the standpoint of the Go implementation.
The iface structure
iface is the root type that represents an interface within the runtime (src/runtime/runtime2.go). Its definition goes like this:
type iface struct { // 16 bytes on a 64bit arch
	tab *itab
	data unsafe.Pointer
}
An interface is thus a very simple structure that maintains 2 pointers:
  • tab holds the address of an itab object, which embeds the datastructures that describe both the type of the interface as well as the type of the data it points to.
  • data is a raw (i.e. unsafe) pointer to the value held by the interface.
While extremely simple, this definition already gives us some valuable information: since interfaces can only hold pointers, any concrete value that we wrap into an interface will have to have its address taken. More often than not, this will result in a heap allocation as the compiler takes the conservative route and forces the receiver to escape. This holds true even for scalar types!
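To make the two-pointer layout a bit more tangible, here is a small sketch that peeks at an interface value through unsafe. The ifaceWords mirror type and both helpers are hypothetical, and the whole thing relies on the runtime layout described above rather than on anything the language guarantees.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Addifier interface{ Add(a, b int32) int32 }

type Adder struct{ id int32 }

func (adder Adder) Add(a, b int32) int32 { return a + b }

// ifaceWords mirrors the runtime's iface structure (an assumption about the
// implementation, for illustration only).
type ifaceWords struct {
	tab  unsafe.Pointer
	data unsafe.Pointer
}

// isTwoWords reports whether an interface value is exactly two pointers wide.
func isTwoWords() bool {
	var a Addifier
	return unsafe.Sizeof(a) == 2*unsafe.Sizeof(uintptr(0))
}

// dataAsInt32 follows the data pointer and reads the first 4 bytes of the
// wrapped value, which is Adder.id in this example.
func dataAsInt32(a Addifier) int32 {
	w := (*ifaceWords)(unsafe.Pointer(&a))
	return *(*int32)(w.data)
}

func main() {
	fmt.Println(isTwoWords())
	fmt.Println(dataAsInt32(Adder{id: 6754}))
}
```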
We can prove that with a few lines of code (escape.go):
type Addifier interface{ Add(a, b int32) int32 }

type Adder struct{ name string }
//go:noinline
func (adder Adder) Add(a, b int32) int32 { return a + b }

func main() {
	adder := Adder{name: "myAdder"}
	adder.Add(10, 32) // doesn't escape
	Addifier(adder).Add(10, 32) // escapes
}
$ GOOS=linux GOARCH=amd64 go tool compile -m escape.go
escape.go:13:10: Addifier(adder) escapes to heap
# ...
One could even visualize the resulting heap allocation using a simple benchmark (escape_test.go):
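// NOTE: this benchmark uses the `Adder struct{ id int32 }` definition rather
// than the `name string` variant shown in escape.go, which is consistent
// with the 4 B/op reported below.
func BenchmarkDirect(b *testing.B) {
	adder := Adder{id: 6754}
	for i := 0; i < b.N; i++ {
		adder.Add(10, 32)
	}
}

func BenchmarkInterface(b *testing.B) {
	adder := Adder{id: 6754}
	for i := 0; i < b.N; i++ {
		Addifier(adder).Add(10, 32)
	}
}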
$ GOOS=linux GOARCH=amd64 go tool compile -m escape_test.go
# ...
escape_test.go:22:11: Addifier(adder) escapes to heap
# ...
$ GOOS=linux GOARCH=amd64 go test -bench=. -benchmem ./escape_test.go
BenchmarkDirect-8 2000000000 1.60 ns/op 0 B/op 0 allocs/op
BenchmarkInterface-8 100000000 15.0 ns/op 4 B/op 1 allocs/op
We can clearly see how each time we create a new Addifier interface and initialize it with our adder variable, a heap allocation of sizeof(Adder) actually takes place. Later in this chapter, we'll see how even simple scalar types can lead to heap allocations when used with interfaces.
Let's turn our attention towards the next datastructure: itab.
The itab structure
itab is defined thusly (src/runtime/runtime2.go):
type itab struct { // 40 bytes on a 64bit arch
	inter *interfacetype
	_type *_type
	hash uint32 // copy of _type.hash. Used for type switches.
	_ [4]byte
	fun [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}
An itab is the heart & brain of an interface.
First, it holds a pointer to a _type, which is the internal representation of any Go type within the runtime. A _type describes every facet of a type: its name, its characteristics (e.g. size, alignment...), and to some extent, even how it behaves (e.g. comparison, hashing...)! In this instance, the _type field describes the type of the value held by the interface, i.e. the value that the data pointer points to.
Second, we find a pointer to an interfacetype, which is merely a wrapper around _type with some extra information that is specific to interfaces. As you'd expect, the inter field describes the type of the interface itself.
Finally, the fun array holds the function pointers that make up the virtual/dispatch table of the interface. Notice the comment that says // variable sized, meaning that the size with which this array is declared is irrelevant. We'll see later in this chapter that the compiler is responsible for allocating the memory that backs this array, and does so independently of the size indicated here. Likewise, the runtime always accesses this array using raw pointers, thus bounds-checking does not apply here.
The _type structure
As we said above, the _type structure gives a complete description of a Go type. It's defined as such (src/runtime/type.go):
type _type struct { // 48 bytes on a 64bit arch
	size uintptr
	ptrdata uintptr // size of memory prefix holding all pointers
	hash uint32
	tflag tflag
	align uint8
	fieldalign uint8
	kind uint8
	alg *typeAlg
	// gcdata stores the GC type data for the garbage collector.
	// If the KindGCProg bit is set in kind, gcdata is a GC program.
	// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	gcdata *byte
	str nameOff
	ptrToThis typeOff
}
Thankfully, most of these fields are quite self-explanatory.
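Several of these fields surface, in read-only form, through the reflect package. The sketch below uses the Adder{id int32} type from earlier; the typeInfo helper is a hypothetical name, and the mapping between reflect's accessors and the _type fields is given as an informal correspondence, not as a statement about how reflect is implemented.

```go
package main

import (
	"fmt"
	"reflect"
)

type Adder struct{ id int32 }

// typeInfo (hypothetical helper) returns a few properties of v's dynamic type
// that loosely correspond to the size, align, fieldalign and kind fields.
func typeInfo(v interface{}) (size uintptr, align, fieldAlign int, kind reflect.Kind) {
	t := reflect.TypeOf(v)
	return t.Size(), t.Align(), t.FieldAlign(), t.Kind()
}

func main() {
	size, align, fieldAlign, kind := typeInfo(Adder{})
	fmt.Println(size, align, fieldAlign, kind)
}
```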
The nameOff & typeOff types are int32 offsets into the metadata embedded into the final executable by the linker. This metadata is loaded into runtime.moduledata structures at run time (src/runtime/symtab.go), which should look fairly familiar if you've ever had to look at the content of an ELF file. The runtime provides helpers that implement the necessary logic for following these offsets through the moduledata structures, such as resolveNameOff (src/runtime/type.go) and resolveTypeOff (src/runtime/type.go):
func resolveNameOff(ptrInModule unsafe.Pointer, off nameOff) name {}
func resolveTypeOff(ptrInModule unsafe.Pointer, off typeOff) *_type {}
I.e., assuming t is a *_type, calling resolveTypeOff(unsafe.Pointer(t), t.ptrToThis) returns the _type that describes a pointer to t's type (when ptrToThis is non-zero).
The interfacetype structure
Finally, here's the interfacetype structure (src/runtime/type.go):
type interfacetype struct { // 80 bytes on a 64bit arch
	typ _type
	pkgpath name
	mhdr []imethod
}

type imethod struct {
	name nameOff
	ityp typeOff
}
As mentioned, an interfacetype is just a wrapper around a _type with some extra interface-specific metadata added on top. In the current implementation, this metadata is mostly composed of a list of offsets that point to the respective names and types of the methods exposed by the interface ([]imethod).
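The method list of an interface type can also be observed from regular Go code via reflect. The following sketch uses a hypothetical Calculator interface and a hypothetical methodNames helper; it shows the information that []imethod encodes without touching the runtime structures themselves.

```go
package main

import (
	"fmt"
	"reflect"
)

// Calculator is a made-up interface; note that its methods are deliberately
// declared out of alphabetical order.
type Calculator interface {
	Sub(a, b int64) int64
	Add(a, b int32) int32
}

// methodNames lists the methods of the interface type that ifacePtr points
// to, e.g. methodNames((*Calculator)(nil)).
func methodNames(ifacePtr interface{}) []string {
	t := reflect.TypeOf(ifacePtr).Elem()
	names := make([]string, 0, t.NumMethod())
	for i := 0; i < t.NumMethod(); i++ {
		names = append(names, t.Method(i).Name)
	}
	return names
}

func main() {
	fmt.Println(methodNames((*Calculator)(nil)))
}
```

reflect documents that methods are reported in lexicographic order, so Add comes before Sub regardless of the declaration order.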
Conclusion
Here's an overview of what an iface looks like when represented with all of its sub-types inlined; this hopefully should help connect all the dots:
type iface struct { // `iface`
	tab *struct { // `itab`
		inter *struct { // `interfacetype`
			typ struct { // `_type`
				size uintptr
				ptrdata uintptr
				hash uint32
				tflag tflag
				align uint8
				fieldalign uint8
				kind uint8
				alg *typeAlg
				gcdata *byte
				str nameOff
				ptrToThis typeOff
			}
			pkgpath name
			mhdr []struct { // `imethod`
				name nameOff
				ityp typeOff
			}
		}
		_type *struct { // `_type`
			size uintptr
			ptrdata uintptr
			hash uint32
			tflag tflag
			align uint8
			fieldalign uint8
			kind uint8
			alg *typeAlg
			gcdata *byte
			str nameOff
			ptrToThis typeOff
		}
		hash uint32
		_ [4]byte
		fun [1]uintptr
	}
	data unsafe.Pointer
}
This section glossed over the different data-types that make up an interface to help us to start building a mental model of the various cogs involved in the overall machinery, and how they all work with each other. In the next section, we'll learn how these datastructures actually get computed.

Creating an interface

Now that we've had a quick look at all the datastructures involved, we'll focus on how they actually get allocated and initialized.
Consider the following program (iface.go):
type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }
//go:noinline
func (adder Adder) Add(a, b int32) int32 { return a + b }
//go:noinline
func (adder Adder) Sub(a, b int64) int64 { return a - b }

func main() {
	m := Mather(Adder{id: 6754})

	// This call just makes sure that the interface is actually used.
	// Without this call, the linker would see that the interface defined above
	// is in fact never used, and thus would optimize it out of the final
	// executable.
	m.Add(10, 32)
}
NOTE: For the remainder of this chapter, we will denote an interface I that holds a type T as <I,T>. E.g. Mather(Adder{id: 6754}) instantiates an iface<Mather, Adder>.
Let's zoom in on the instantiation of iface<Mather, Adder>:
m := Mather(Adder{id: 6754})
This single line of Go code actually sets off quite a bit of machinery, as the assembly listing generated by the compiler can attest:
;; part 1: allocate the receiver
0x001d MOVL $6754, ""..autotmp_1+36(SP)
;; part 2: set up the itab
0x0025 LEAQ go.itab."".Adder,"".Mather(SB), AX
0x002c MOVQ AX, (SP)
;; part 3: set up the data
0x0030 LEAQ ""..autotmp_1+36(SP), AX
0x0035 MOVQ AX, 8(SP)
0x003a CALL runtime.convT2I32(SB)
0x003f MOVQ 16(SP), AX
0x0044 MOVQ 24(SP), CX
As you can see, we've split the output into three logical parts.
Part 1: Allocate the receiver
0x001d MOVL $6754, ""..autotmp_1+36(SP)
A constant decimal value of 6754, corresponding to the ID of our Adder, is stored at the beginning of the current stack-frame. It's stored there so that the compiler will later be able to reference it by its address; we'll see why in part 3.
Part 2: Set up the itab
0x0025 LEAQ go.itab."".Adder,"".Mather(SB), AX
0x002c MOVQ AX, (SP)
It looks like the compiler has already created the necessary itab for representing our iface<Mather, Adder> interface, and made it available to us via a global symbol: go.itab."".Adder,"".Mather.
We're in the process of building an iface<Mather, Adder> interface and, in order to do so, we're loading the effective address of this global go.itab."".Adder,"".Mather symbol at the top of the current stack-frame. Once again, we'll see why in part 3.
Semantically, this gives us something along the lines of the following pseudo-code:
tab := getSymAddr(`go.itab.main.Adder,main.Mather`).(*itab)
That's half of our interface right there!
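Since the itab for a given (interface, concrete type) pair is a single global object, two separate Mather values built from Adder should carry the same tab pointer. The sketch below checks this with unsafe; tabOf is a hypothetical helper that assumes the iface layout shown earlier, so treat it as an experiment rather than supported API.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }

func (adder Adder) Add(a, b int32) int32 { return a + b }
func (adder Adder) Sub(a, b int64) int64 { return a - b }

// tabOf returns the first word of the interface value, i.e. its itab
// pointer, assuming the two-word iface layout (tab, data).
func tabOf(m Mather) unsafe.Pointer {
	return *(*unsafe.Pointer)(unsafe.Pointer(&m))
}

func main() {
	m1 := Mather(Adder{id: 6754})
	m2 := Mather(Adder{id: 1})

	// Different data, same itab.
	fmt.Println(tabOf(m1) == tabOf(m2))
}
```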
Now, while we're at it, let's have a deeper look at that go.itab."".Adder,"".Mather symbol. As usual, the -S flag of the compiler can tell us a lot:
$ GOOS=linux GOARCH=amd64 go tool compile -S iface.go | grep -A 7 '^go.itab."".Adder,"".Mather'
go.itab."".Adder,"".Mather SRODATA dupok size=40
    0x0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0x0010 8a 3d 5f 61 00 00 00 00 00 00 00 00 00 00 00 00 .=_a............
    0x0020 00 00 00 00 00 00 00 00 ........
    rel 0+8 t=1 type."".Mather+0
    rel 8+8 t=1 type."".Adder+0
    rel 24+8 t=1 "".(*Adder).Add+0
    rel 32+8 t=1 "".(*Adder).Sub+0
Neat. Let's analyze this piece by piece.
The first piece declares the symbol and its attributes:
go.itab."".Adder,"".Mather SRODATA dupok size=40
As usual, since we're looking directly at the intermediate object file generated by the compiler (i.e. the linker hasn't run yet), symbol names are still missing package names. Nothing new on that front. Other than that, what we've got here is a 40-byte global object symbol that will be stored in the .rodata section of our binary.
Note the dupok directive, which tells the linker that it is legal for this symbol to appear multiple times at link-time: the linker will have to arbitrarily choose one of them over the others. What makes the Go authors think that this symbol might end up duplicated, I'm not sure. Feel free to file an issue if you know more. [UPDATE: We've discussed this matter in issue #7: How you can get duplicated go.itab interface definitions.]
The second piece is a hexdump of the 40 bytes of data associated with the symbol. I.e., it's a serialized representation of an itab structure:
0x0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0010 8a 3d 5f 61 00 00 00 00 00 00 00 00 00 00 00 00 .=_a............
0x0020 00 00 00 00 00 00 00 00 ........
As you can see, most of this data is just a bunch of zeros at this point. The linker will take care of filling them up, as we'll see in a minute.
Notice how, among all these zeros, 4 bytes actually have been set though, starting at offset 0x10 (i.e. 0x10+4 in the offset+size notation used by the rel directives). If we take a look back at the declaration of the itab structure and annotate the respective offsets of its fields:
type itab struct { // 40 bytes on a 64bit arch
	inter *interfacetype // offset 0x00 ($00)
	_type *_type // offset 0x08 ($08)
	hash uint32 // offset 0x10 ($16)
	_ [4]byte // offset 0x14 ($20)
	fun [1]uintptr // offset 0x18 ($24)
	// offset 0x20 ($32), i.e. where fun[1] lives
}
We see that offset 0x10+4 matches the hash uint32 field: i.e., the hash value that corresponds to our main.Adder type is already right there in our object file.
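These offsets can be double-checked with unsafe.Offsetof on a stand-in structure. The itabMirror type below is hypothetical: it copies the field layout of itab, with fun widened to two entries to match the two methods of Mather, and uses unsafe.Pointer in place of the runtime-internal pointer types.

```go
package main

import (
	"fmt"
	"unsafe"
)

// itabMirror reproduces the field layout of runtime.itab for an interface
// with two methods. It is a stand-in for illustration, not the runtime type.
type itabMirror struct {
	inter unsafe.Pointer
	_type unsafe.Pointer
	hash  uint32
	_     [4]byte
	fun   [2]uintptr
}

// offsets returns the offset of each named field, plus the total size.
func offsets() (inter, typ, hash, fun, size uintptr) {
	var t itabMirror
	return unsafe.Offsetof(t.inter), unsafe.Offsetof(t._type),
		unsafe.Offsetof(t.hash), unsafe.Offsetof(t.fun), unsafe.Sizeof(t)
}

func main() {
	inter, typ, hash, fun, size := offsets()
	fmt.Printf("inter=%#x _type=%#x hash=%#x fun=%#x size=%d\n",
		inter, typ, hash, fun, size)
}
```

On a 64-bit architecture this should report hash at 0x10, fun at 0x18 and a total size of 40 bytes, in line with the annotated declaration above.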
The third and final piece lists a bunch of relocation directives for the linker:
rel 0+8 t=1 type."".Mather+0
rel 8+8 t=1 type."".Adder+0
rel 24+8 t=1 "".(*Adder).Add+0
rel 32+8 t=1 "".(*Adder).Sub+0
rel 0+8 t=1 type."".Mather+0 tells the linker to fill up the first 8 bytes (0+8) of the contents with the address of the global object symbol type."".Mather. rel 8+8 t=1 type."".Adder+0 then fills the next 8 bytes with the address of type."".Adder, and so on and so forth.
Once the linker has done its job and followed all of these directives, our 40-byte serialized itab will be complete. Overall, we're now looking at something akin to the following pseudo-code:
tab := getSymAddr(`go.itab.main.Adder,main.Mather`).(*itab)

// NOTE: The linker strips the `type.` prefix from these symbols when building
// the executable, so the final symbol names in the .rodata section of the
// binary will actually be `main.Mather` and `main.Adder` rather than
// `type.main.Mather` and `type.main.Adder`.
// Don't get tripped up by this when toying around with objdump.
tab.inter = getSymAddr(`type.main.Mather`).(*interfacetype)
tab._type = getSymAddr(`type.main.Adder`).(*_type)

tab.fun[0] = getSymAddr(`main.(*Adder).Add`).(uintptr)
tab.fun[1] = getSymAddr(`main.(*Adder).Sub`).(uintptr)
We've got ourselves a ready-to-use itab; now if we just had some data to go along with it, that'd make for a nice, complete interface.
Part 3: Set up the data
0x0030 LEAQ ""..autotmp_1+36(SP), AX
0x0035 MOVQ AX, 8(SP)
0x003a CALL runtime.convT2I32(SB)
0x003f MOVQ 16(SP), AX
0x0044 MOVQ 24(SP), CX
Remember from part 2 that the top of the stack (SP) currently holds the address of go.itab."".Adder,"".Mather (argument #1). Also remember from part 1 that we had stored a $6754 decimal constant in ""..autotmp_1+36(SP): we now load the effective address of this constant just below the top of the stack-frame, at 8(SP) (argument #2).
These two pointers are the two arguments that we pass into runtime.convT2I32, which will apply the final touches of glue to create and return our complete interface. Let's have a closer look at it (src/runtime/iface.go):
func convT2I32(tab *itab, elem unsafe.Pointer) (i iface) {
	t := tab._type
	/* ...omitted debug stuff... */
	var x unsafe.Pointer
	if *(*uint32)(elem) == 0 {
		x = unsafe.Pointer(&zeroVal[0])
	} else {
		x = mallocgc(4, t, false)
		*(*uint32)(x) = *(*uint32)(elem)
	}
	i.tab = tab
	i.data = x
	return
}
So runtime.convT2I32 does 4 things:
  1. It creates a new iface structure i (to be pedantic, its caller creates it.. same difference).
  2. It assigns the itab pointer we just gave it to i.tab.
  3. It allocates a new object of type i.tab._type on the heap, then copies the value pointed to by the second argument elem into that new object.
  4. It returns the final interface.
This process is quite straightforward overall, although the 3rd step does involve some tricky implementation details in this specific case, which are caused by the fact that our Adder type is effectively a scalar type. We'll look at the interactions of scalar types and interfaces in more details in the section about the special cases of interfaces.
Conceptually, we've now accomplished the following (pseudo-code):
tab := getSymAddr(`go.itab.main.Adder,main.Mather`).(*itab)
elem := getSymAddr(`""..autotmp_1+36(SP)`).(*int32)

i := runtime.convT2I32(tab, unsafe.Pointer(elem))

assert(i.tab == tab)
assert(*(*int32)(i.data) == 6754) // same value..
assert((*int32)(i.data) != elem) // ..but different (al)locations!
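The "same value, different location" property is observable from plain Go as well: once a value has been wrapped into an interface, later changes to the original do not show through. A minimal sketch, where snapshot is a hypothetical helper name:

```go
package main

import "fmt"

type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }

func (adder Adder) Add(a, b int32) int32 { return a + b }
func (adder Adder) Sub(a, b int64) int64 { return a - b }

// snapshot wraps an Adder into a Mather, mutates the original afterwards,
// and returns the id seen through the interface next to the original's id.
func snapshot() (viaIface, original int32) {
	adder := Adder{id: 6754}
	m := Mather(adder) // the interface gets its own copy of adder
	adder.id = 1       // only the original changes
	return m.(Adder).id, adder.id
}

func main() {
	viaIface, original := snapshot()
	fmt.Println(viaIface, original) // 6754 1
}
```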
To summarize all that just went down, here's a complete, annotated version of the assembly code for all 3 parts:
0x001d MOVL $6754, ""..autotmp_1+36(SP) ;; create an addressable $6754 value at 36(SP)
0x0025 LEAQ go.itab."".Adder,"".Mather(SB), AX ;; set up go.itab."".Adder,"".Mather..
0x002c MOVQ AX, (SP) ;; ..as first argument (tab *itab)
0x0030 LEAQ ""..autotmp_1+36(SP), AX ;; set up &36(SP)..
0x0035 MOVQ AX, 8(SP) ;; ..as second argument (elem unsafe.Pointer)
0x003a CALL runtime.convT2I32(SB) ;; call convT2I32(go.itab."".Adder,"".Mather, &$6754)
0x003f MOVQ 16(SP), AX ;; AX now holds i.tab (go.itab."".Adder,"".Mather)
0x0044 MOVQ 24(SP), CX ;; CX now holds i.data (&$6754, somewhere on the heap)
Keep in mind that all of this started with just one single line: m := Mather(Adder{id: 6754}).
We finally got ourselves a complete, working interface.

Reconstructing an itab from an executable

In the previous section, we dumped the contents of go.itab."".Adder,"".Mather directly from the object files generated by the compiler and ended up looking at what was mostly a blob of zeros (except for the hash value):
$ GOOS=linux GOARCH=amd64 go tool compile -S iface.go | grep -A 3 '^go.itab."".Adder,"".Mather'
go.itab."".Adder,"".Mather SRODATA dupok size=40
    0x0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    0x0010 8a 3d 5f 61 00 00 00 00 00 00 00 00 00 00 00 00 .=_a............
    0x0020 00 00 00 00 00 00 00 00 ........
To get a better picture of how the data is laid out into the final executable produced by the linker, we'll walk through the generated ELF file and manually reconstruct the bytes that make up the itab of our iface<Mather, Adder>. Hopefully, this'll enable us to observe what our itab looks like once the linker has done its job.
First things first, let's build the iface binary: GOOS=linux GOARCH=amd64 go build -o iface.bin iface.go.
Step 1: Find .rodata
Let's print the section headers in search of .rodata, readelf can help with that:
$ readelf -St -W iface.bin
There are 22 section headers, starting at offset 0x190:

Section Headers:
  [Nr] Name
       Type Address Off Size ES Lk Inf Al
       Flags
  [ 0]
       NULL 0000000000000000 000000 000000 00 0 0 0
       [0000000000000000]:
  [ 1] .text
       PROGBITS 0000000000401000 001000 04b3cf 00 0 0 16
       [0000000000000006]: ALLOC, EXEC
  [ 2] .rodata
       PROGBITS 000000000044d000 04d000 028ac4 00 0 0 32
       [0000000000000002]: ALLOC
## ...omitted rest of output...
What we really need here is the (decimal) offset of the section, so let's apply some pipe-foo:
$ readelf -St -W iface.bin | \
  grep -A 1 .rodata | \
  tail -n +2 | \
  awk '{print "ibase=16;"toupper($3)}' | \
  bc
315392
Which means that fseek-ing 315392 bytes into our binary should place us right at the start of the .rodata section. Now what we need to do is map this file location to a virtual-memory address.
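The awk and bc stages of that pipeline do nothing more than convert a hexadecimal column into decimal. For readers more at ease with Go than with bc, the same conversion can be sketched as follows (hexToDec is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"strconv"
)

// hexToDec parses a hexadecimal column as printed by readelf or objdump
// (no 0x prefix) into an unsigned integer.
func hexToDec(col string) (uint64, error) {
	return strconv.ParseUint(col, 16, 64)
}

func main() {
	off, err := hexToDec("04d000") // the "Off" column of .rodata
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(off) // 315392
}
```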
Step 2: Find the virtual-memory address (VMA) of .rodata
The VMA is the virtual address at which the section will be mapped once the binary has been loaded in memory by the OS. I.e., this is the address that we'll use to reference a symbol at runtime.
The reason we care about the VMA in this case is that we cannot directly ask readelf or objdump for the offset of a specific symbol (AFAIK). What we can do, on the other hand, is ask for the VMA of a specific symbol. Coupled with some simple maths, we should be able to build a mapping between VMAs and offsets and finally find the offsets of the symbols that we're looking for.
Finding the VMA of .rodata is no different than finding its offset, it's just a different column is all:
$ readelf -St -W iface.bin | \
  grep -A 1 .rodata | \
  tail -n +2 | \
  awk '{print "ibase=16;"toupper($2)}' | \
  bc
4509696
So here's what we know so far: the .rodata section is located at offset $315392 (= 0x04d000) into the ELF file, which will be mapped at virtual address $4509696 (= 0x44d000) at run time.
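The awk and bc stages in the two pipelines above do nothing more than convert a hexadecimal column to decimal. As a sanity check, the same conversion can be sketched in a few lines of Go. The helper name hexToDec is hypothetical; the two inputs are the Off and Address values that readelf reported for .rodata.

```go
package main

import (
	"fmt"
	"strconv"
)

// hexToDec is a hypothetical helper: it parses a bare hexadecimal string
// (no "0x" prefix), as printed in readelf's Address and Off columns.
func hexToDec(s string) (uint64, error) {
	return strconv.ParseUint(s, 16, 64)
}

func main() {
	// .rodata's file offset and VMA, copied from the readelf output.
	for _, col := range []string{"04d000", "000000000044d000"} {
		v, err := hexToDec(col)
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		fmt.Printf("0x%s == %d\n", col, v)
	}
}
```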
Now we need the VMA as well as the size of the symbol we're looking for:
  • Its VMA will (indirectly) allow us to locate it within the executable.
  • Its size will tell us how much data to extract once we've found the correct offset.
Step 3: Find the VMA & size of go.itab.main.Adder,main.Mather
objdump has those available for us.
First, find the symbol:
$ objdump -t -j .rodata iface.bin | grep "go.itab.main.Adder,main.Mather"
0000000000475140 g O .rodata 0000000000000028 go.itab.main.Adder,main.Mather
Then, get its VMA in decimal form:
$ objdump -t -j .rodata iface.bin | \
  grep "go.itab.main.Adder,main.Mather" | \
  awk '{print "ibase=16;"toupper($1)}' | \
  bc
4673856
And finally, get its size in decimal form:
$ objdump -t -j .rodata iface.bin | \
  grep "go.itab.main.Adder,main.Mather" | \
  awk '{print "ibase=16;"toupper($5)}' | \
  bc
40
So go.itab.main.Adder,main.Mather will be mapped at virtual address $4673856 (= 0x475140) at run time, and has a size of 40 bytes (which we already knew, as it's the size of an itab structure).
Step 4: Find & extract go.itab.main.Adder,main.Mather
We now have all the elements we need in order to locate go.itab.main.Adder,main.Mather within our binary.
Here's a reminder of what we know so far:
.rodata offset: 0x04d000 == $315392
.rodata VMA: 0x44d000 == $4509696

go.itab.main.Adder,main.Mather VMA: 0x475140 == $4673856
go.itab.main.Adder,main.Mather size: 0x28 == $40
If $315392 (.rodata's offset) maps to $4509696 (.rodata's VMA) and go.itab.main.Adder,main.Mather's VMA is $4673856, then go.itab.main.Adder,main.Mather's offset within the executable is: sym.offset = sym.vma - section.vma + section.offset = $4673856 - $4509696 + $315392 = $479552.
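The mapping above is plain arithmetic, so it's easy to double-check. Here's a minimal sketch in Go; the function name fileOffset is made up for the occasion, and the constants are the values we've gathered from readelf and objdump.

```go
package main

import "fmt"

// fileOffset maps a symbol's virtual address to its offset within the file,
// given the virtual address and the file offset of the section containing it.
func fileOffset(symVMA, sectionVMA, sectionOffset uint64) uint64 {
	return symVMA - sectionVMA + sectionOffset
}

func main() {
	const (
		rodataOffset = 0x04d000 // .rodata file offset, from readelf
		rodataVMA    = 0x44d000 // .rodata VMA, from readelf
		itabVMA      = 0x475140 // go.itab.main.Adder,main.Mather VMA, from objdump
	)
	off := fileOffset(itabVMA, rodataVMA, rodataOffset)
	fmt.Printf("itab file offset: %#x == %d\n", off, off) // 0x75140 == 479552
}
```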
Now that we know both the offset and size of the data, we can take out good ol' dd and extract the raw bytes straight out of the executable:
$ dd if=iface.bin of=/dev/stdout bs=1 count=40 skip=479552 2>/dev/null | hexdump
0000000 bd20 0045 0000 0000 ed40 0045 0000 0000
0000010 3d8a 615f 0000 0000 c2d0 0044 0000 0000
0000020 c350 0044 0000 0000
0000028
This certainly does look like a clear-cut victory.. but is it, really? Maybe we've just dumped 40 totally random, unrelated bytes? Who knows? There's at least one way to be sure: let's compare the type hash found in our binary dump (the 4 bytes starting at offset 0x10 -> 0x615f3d8a) with the one loaded by the runtime (iface_type_hash.go):
// simplified definitions of runtime's iface & itab types
type iface struct {
	tab  *itab
	data unsafe.Pointer
}
type itab struct {
	inter uintptr
	_type uintptr
	hash  uint32
	_     [4]byte
	fun   [1]uintptr
}

func main() {
	m := Mather(Adder{id: 6754})

	iface := (*iface)(unsafe.Pointer(&m))
	fmt.Printf("iface.tab.hash = %#x\n", iface.tab.hash) // 0x615f3d8a
}
It's a match! fmt.Printf("iface.tab.hash = %#x\n", iface.tab.hash) gives us 0x615f3d8a, which corresponds to the value that we've extracted from the contents of the ELF file.
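One thing that makes the dump a bit awkward to read by eye: hexdump's default output prints little-endian 16-bit words, so "3d8a 615f" is really the byte sequence 8a 3d 5f 61. As a cross-check, here's a minimal sketch that decodes the same 40 bytes field by field with encoding/binary. The itabFields type and the decode function are hypothetical names introduced for this example, and the field offsets assume the 64-bit itab layout described in this chapter.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dump holds the 40 bytes extracted with dd, rewritten in byte order.
var dump = []byte{
	0x20, 0xbd, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, // inter
	0x40, 0xed, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, // _type
	0x8a, 0x3d, 0x5f, 0x61, 0x00, 0x00, 0x00, 0x00, // hash + 4 bytes of padding
	0xd0, 0xc2, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, // fun[0]
	0x50, 0xc3, 0x44, 0x00, 0x00, 0x00, 0x00, 0x00, // fun[1]
}

// itabFields is a hypothetical, flattened view of the dumped itab.
type itabFields struct {
	inter, typ uint64
	hash       uint32
	fun        [2]uint64
}

// decode reads each field at its offset, assuming little-endian amd64.
func decode(b []byte) itabFields {
	le := binary.LittleEndian
	return itabFields{
		inter: le.Uint64(b[0x00:]),
		typ:   le.Uint64(b[0x08:]),
		hash:  le.Uint32(b[0x10:]),
		fun:   [2]uint64{le.Uint64(b[0x18:]), le.Uint64(b[0x20:])},
	}
}

func main() {
	f := decode(dump)
	fmt.Printf("inter  = %#x\n", f.inter)
	fmt.Printf("_type  = %#x\n", f.typ)
	fmt.Printf("hash   = %#x\n", f.hash) // 0x615f3d8a
	fmt.Printf("fun[0] = %#x\n", f.fun[0])
	fmt.Printf("fun[1] = %#x\n", f.fun[1])
}
```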
Conclusion
We've reconstructed the complete itab for our iface<Mather, Adder> interface; it's all there in the executable, just waiting to be used, and already contains all the information that the runtime will need to make the interface behave as we expect.
Of course, since an itab is mostly composed of a bunch of pointers to other data structures, we'd have to follow the virtual addresses present in the contents that we've extracted via dd in order to reconstruct the complete picture. Speaking of pointers, we now have a clear view of the virtual table for iface<Mather, Adder>; here's an annotated version of the contents of go.itab.main.Adder,main.Mather:
$ dd if=iface.bin of=/dev/stdout bs=1 count=40 skip=479552 2>/dev/null | hexdump
0000000 bd20 0045 0000 0000 ed40 0045 0000 0000
0000010 3d8a 615f 0000 0000 c2d0 0044 0000 0000
#                           ^^^^^^^^^^^^^^^^^^^
#                           offset 0x18+8: itab.fun[0]
0000020 c350 0044 0000 0000
#       ^^^^^^^^^^^^^^^^^^^
#       offset 0x20+8: itab.fun[1]
0000028

$ objdump -t -j .text iface.bin | grep 000000000044c2d0
000000000044c2d0 g F .text 0000000000000079 main.(*Adder).Add

$ objdump -t -j .text iface.bin | grep 000000000044c350
000000000044c350 g F .text 000000000000007f main.(*Adder).Sub
Unsurprisingly, the virtual table for iface<Mather, Adder> holds two method pointers: main.(*Adder).Add and main.(*Adder).Sub. Well, actually, this is a bit surprising: we've never defined these two methods to have pointer receivers. The compiler has generated these wrapper methods on our behalf (as we've described in the "Implicit dereferencing" section) because it knows that we're going to need them: since an interface can only hold pointers, and since our Adder implementation of said interface only provides methods with value-receivers, we'll have to go through a wrapper at some point if we're going to call either of these methods via the virtual table of the interface.
This should already give you a pretty good idea of how dynamic dispatch is handled at run time; which is what we will look at in the next section.
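The pointer-receiver variants mentioned above are also visible at the language level: the spec allows the method expression (*T).M even when M is declared with a value receiver. The sketch below only demonstrates that language rule; whether the compiler reuses the very same wrapper symbol for method expressions is not something it shows.

```go
package main

import "fmt"

type Adder struct{ id int32 }

// Both methods are declared with value receivers, as in iface.go.
func (adder Adder) Add(a, b int32) int32 { return a + b }
func (adder Adder) Sub(a, b int64) int64 { return a - b }

func main() {
	// The method expressions below nonetheless take a *Adder as their
	// first argument.
	add := (*Adder).Add // func(*Adder, int32, int32) int32
	sub := (*Adder).Sub // func(*Adder, int64, int64) int64

	a := &Adder{id: 6754}
	fmt.Println(add(a, 10, 32), sub(a, 10, 32)) // 42 -22
}
```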
Bonus
I've hacked up a generic bash script that you can use to dump the contents of any symbol in any section of an ELF file (dump_sym.sh):
# ./dump_sym.sh bin_path section_name sym_name
$ ./dump_sym.sh iface.bin .rodata go.itab.main.Adder,main.Mather
.rodata file-offset: 315392
.rodata VMA: 4509696
go.itab.main.Adder,main.Mather VMA: 4673856
go.itab.main.Adder,main.Mather SIZE: 40

0000000 bd20 0045 0000 0000 ed40 0045 0000 0000
0000010 3d8a 615f 0000 0000 c2d0 0044 0000 0000
0000020 c350 0044 0000 0000
0000028
I'd imagine there must exist an easier way to do what this script does, maybe some arcane flags or an obscure gem hidden inside the binutils distribution.. who knows. If you've got some hints, don't hesitate to say so in the issues.
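For what it's worth, one alternative that stays within the Go toolchain is the standard library's debug/elf package, which exposes both the symbol table and the section contents. What follows is a minimal sketch rather than a drop-in replacement for dump_sym.sh: the dumpSymbol function and the command-line shape are assumptions made for this example, and unlike the script it derives the section from the symbol itself instead of taking it as an argument.

```go
package main

import (
	"debug/elf"
	"encoding/hex"
	"fmt"
	"os"
)

// dumpSymbol (hypothetical helper) returns the raw bytes of the named symbol,
// read from the section that contains it.
func dumpSymbol(binPath, symName string) ([]byte, error) {
	f, err := elf.Open(binPath)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	syms, err := f.Symbols()
	if err != nil {
		return nil, err
	}
	for _, sym := range syms {
		if sym.Name != symName {
			continue
		}
		// Skip special section indexes (undefined, absolute, ...).
		if sym.Section == elf.SHN_UNDEF || int(sym.Section) >= len(f.Sections) {
			return nil, fmt.Errorf("symbol %q is not backed by a section", symName)
		}
		sec := f.Sections[sym.Section]
		buf := make([]byte, sym.Size)
		// Section.ReadAt takes an offset relative to the start of the
		// section, i.e. sym.vma - section.vma.
		if _, err := sec.ReadAt(buf, int64(sym.Value-sec.Addr)); err != nil {
			return nil, err
		}
		return buf, nil
	}
	return nil, fmt.Errorf("symbol %q not found", symName)
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: dumpsym bin_path sym_name")
		return
	}
	data, err := dumpSymbol(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(hex.Dump(data))
}
```

Note that hex.Dump prints bytes in file order, so its output will not look byte-for-byte like hexdump's default 16-bit word view.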

Dynamic dispatch

In this section we'll finally cover the main feature of interfaces: dynamic dispatch. Specifically, we'll look at how dynamic dispatch works under the hood, and how much we have to pay for it.

Indirect method call on interface

Let's have a look back at our code from earlier (iface.go):
type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }
//go:noinline
func (adder Adder) Add(a, b int32) int32 { return a + b }
//go:noinline
func (adder Adder) Sub(a, b int64) int64 { return a - b }

func main() {
	m := Mather(Adder{id: 6754})
	m.Add(10, 32)
}
We've already had a deeper look into most of what happens in this piece of code: how the iface<Mather, Adder> interface gets created, how it's laid out in the final executable, and how it ends up being loaded by the runtime. There's only one thing left for us to look at, and that is the actual indirect method call that follows: m.Add(10, 32).
To refresh our memory, we'll zoom in on both the creation of the interface as well as on the method call itself:
m := Mather(Adder{id: 6754})
m.Add(10, 32)
Thankfully, we already have a fully annotated version of the assembly generated by the instantiation done on the first line (m := Mather(Adder{id: 6754})):
;; m := Mather(Adder{id: 6754})
0x001d MOVL $6754, ""..autotmp_1+36(SP)         ;; create an addressable $6754 value at 36(SP)
0x0025 LEAQ go.itab."".Adder,"".Mather(SB), AX  ;; set up go.itab."".Adder,"".Mather..
0x002c MOVQ AX, (SP)                            ;; ..as first argument (tab *itab)
0x0030 LEAQ ""..autotmp_1+36(SP), AX            ;; set up &36(SP)..
0x0035 MOVQ AX, 8(SP)                           ;; ..as second argument (elem unsafe.Pointer)
0x003a CALL runtime.convT2I32(SB)               ;; runtime.convT2I32(go.itab."".Adder,"".Mather, &$6754)
0x003f MOVQ 16(SP), AX                          ;; AX now holds i.tab (go.itab."".Adder,"".Mather)
0x0044 MOVQ 24(SP), CX                          ;; CX now holds i.data (&$6754, somewhere on the heap)
And now, here's the assembly listing for the indirect method call that follows (m.Add(10, 32)):
;; m.Add(10, 32)
0x0049 MOVQ 24(AX), AX
0x004d MOVQ $137438953482, DX
0x0057 MOVQ DX, 8(SP)
0x005c MOVQ CX, (SP)
0x0060 CALL AX
With the knowledge accumulated from the previous sections, these few instructions should be straightforward to understand.
0x0049 MOVQ 24(AX), AX
Once runtime.convT2I32 has returned, AX holds i.tab, which as we know is a pointer to an itab; and more specifically a pointer to go.itab."".Adder,"".Mather in this case. By offsetting AX 24 bytes forward and dereferencing the result, we load i.tab.fun[0], which is the first entry of the virtual table. Here's a reminder of what the offset table for itab looks like:
type itab struct { // 32 bytes on a 64bit arch
	inter *interfacetype // offset 0x00 ($00)
	_type *_type         // offset 0x08 ($08)
	hash  uint32         // offset 0x10 ($16)
	_     [4]byte        // offset 0x14 ($20)
	fun   [1]uintptr     // offset 0x18 ($24)
	                     // offset 0x20 ($32)
}
As we've seen in the previous section where we've reconstructed the final itab directly from the executable, iface.tab.fun[0] is a pointer to main.(*Adder).Add, which is the compiler-generated wrapper-method that wraps our original value-receiver main.Adder.Add method.
0x004d MOVQ $137438953482, DX
0x0057 MOVQ DX, 8(SP)
We store 10 and 32 on the stack at 8(SP), as arguments #2 & #3. Both int32 values are packed into a single 64-bit immediate: $137438953482 == 32<<32 | 10.
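If the $137438953482 immediate looks like a magic number, that's because it holds both int32 arguments at once. Here's a small sketch of the packing; packArgs is a hypothetical helper, and the layout assumes a little-endian machine such as amd64.

```go
package main

import "fmt"

// packArgs packs two int32 arguments into one 64-bit word, the way they sit
// next to each other in memory on a little-endian machine: a occupies the
// low 4 bytes, b the high 4 bytes.
func packArgs(a, b int32) uint64 {
	return uint64(uint32(b))<<32 | uint64(uint32(a))
}

func main() {
	fmt.Println(packArgs(10, 32)) // 137438953482
}
```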
0x005c MOVQ CX, (SP)
0x0060 CALL AX
Once runtime.convT2I32 has returned, CX holds i.data, which is a pointer to our Adder instance. We move this pointer to the top of stack, as argument #1, to satisfy the calling convention: the receiver for a method should always be passed as the first argument.
Finally, with our stack all set up, we can do the actual call.
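To tie the pieces together, here's a deliberately simplified model of that call sequence written in plain Go. None of these types are the runtime's: fakeItab, fakeIface, addWrapper and callAdd are all stand-ins invented for this sketch, and only the virtual table and the data pointer are modelled.

```go
package main

import "fmt"

type Adder struct{ id int32 }

// addWrapper plays the role of the compiler-generated main.(*Adder).Add:
// it takes the receiver as an explicit first argument.
func addWrapper(recv *Adder, a, b int32) int32 { return a + b }

// fakeItab models only the virtual table of an itab.
type fakeItab struct {
	fun [1]func(recv *Adder, a, b int32) int32
}

// fakeIface models the two words of an interface value.
type fakeIface struct {
	tab  *fakeItab
	data *Adder
}

// callAdd mimics the instruction sequence of the indirect call.
func callAdd(m fakeIface, a, b int32) int32 {
	fn := m.tab.fun[0] // MOVQ 24(AX), AX: load the first vtable entry
	// The receiver goes first, then the arguments; CALL AX.
	return fn(m.data, a, b)
}

func main() {
	tab := &fakeItab{}
	tab.fun[0] = addWrapper
	m := fakeIface{tab: tab, data: &Adder{id: 6754}}
	fmt.Println(callAdd(m, 10, 32)) // 42
}
```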
We'll close this section with a complete annotated assembly listing of the entire process:
;; m := Mather(Adder{id: 6754})
0x001d MOVL $6754, ""..autotmp_1+36(SP)         ;; create an addressable $6754 value at 36(SP)
0x0025 LEAQ go.itab."".Adder,"".Mather(SB), AX  ;; set up go.itab."".Adder,"".Mather..
0x002c MOVQ AX, (SP)                            ;; ..as first argument (tab *itab)
0x0030 LEAQ ""..autotmp_1+36(SP), AX            ;; set up &36(SP)..
0x0035 MOVQ AX, 8(SP)                           ;; ..as second argument (elem unsafe.Pointer)
0x003a CALL runtime.convT2I32(SB)               ;; runtime.convT2I32(go.itab."".Adder,"".Mather, &$6754)
0x003f MOVQ 16(SP), AX                          ;; AX now holds i.tab (go.itab."".Adder,"".Mather)
0x0044 MOVQ 24(SP), CX                          ;; CX now holds i.data (&$6754, somewhere on the heap)
;; m.Add(10, 32)
0x0049 MOVQ 24(AX), AX                          ;; AX now holds (*iface.tab)+0x18, i.e. iface.tab.fun[0]
0x004d MOVQ $137438953482, DX                   ;; move (32,10) to..
0x0057 MOVQ DX, 8(SP)                           ;; ..the top of the stack (arguments #3 & #2)
0x005c MOVQ CX, (SP)                            ;; CX, which holds &$6754 (i.e., our receiver), gets moved to
                                                ;; ..the top of stack (argument #1 -> receiver)
0x0060 CALL AX                                  ;; you know the drill
We now have a clear picture of the entire machinery required for interfaces and virtual method calls to work. In the next section, we'll measure the actual cost of this machinery, in theory as well as in practice.

Overhead

As we've seen, the implementation of interfaces delegates most of the work to both the compiler and the linker. From a performance standpoint, this is obviously very good news: we effectively want to relieve the runtime from as much work as possible. There do exist some specific cases where instantiating an interface may also require the runtime to get to work (e.g. the runtime.convT2* family of functions), though they are not so prevalent in practice. We'll learn more about these edge cases in the section dedicated to the special cases of interfaces. In the meantime, we'll concentrate purely on the overhead of virtual method calls and ignore the one-time costs related to instantiation.
Once an interface has been properly instantiated, calling methods on it is nothing more than going through one more layer of indirection compared to the usual statically dispatched call (i.e. dereferencing itab.fun at the desired index). As such, one would imagine this process to be virtually free.. and one would be kind of right, but not quite: the theory is a bit tricky, and the reality even trickier still.
The theory: quick refresher on modern CPUs
The extra indirection inherent to virtual calls is, in and of itself, effectively free for as long as it is somewhat predictable from the standpoint of the CPU. Modern CPUs are very aggressive beasts: they cache aggressively, they aggressively pre-fetch both instructions & data, they pre-execute code aggressively, they even reorder and parallelize it as they see fit. All of this extra work is done whether we want it or not, hence we should always strive not to get in the way of the CPU's efforts to be extra smart, so all of these precious cycles don't go needlessly wasted.
This is where virtual method calls can quickly become a problem.
In the case of a statically dispatched call, the CPU has foreknowledge of the upcoming branch in the program and pre-fetches the necessary instructions accordingly. This makes for a smooth, transparent transition from one branch of the program to the other as far as performance is concerned. With dynamic dispatch, on the other hand, the CPU cannot know in advance where the program is heading: it all depends on computations whose results are, by definition, not known until run time. To counter-balance this, the CPU applies various algorithms and heuristics in order to guess where the program is going to branch next (i.e. "branch prediction").
If the processor guesses correctly, we can expect a dynamic branch to be almost as efficient as a static one, since the instructions of the landing site have already been pre-fetched into the processor's caches anyway.
If it gets things wrong, though, things can get a bit rough: first, of course, we'll have to pay for the extra indirection plus the corresponding (slow) load from main memory (i.e. the CPU is effectively stalled) to load the right instructions into the L1i cache. Even worse, we'll have to pay the price of the CPU backtracking on its own mistakes and flushing its instruction pipeline following the branch misprediction. Another important downside of dynamic dispatch is that it makes inlining impossible by definition: one simply cannot inline what they don't know is coming.
All in all, it should, at least in theory, be very possible to end up with massive differences in performance between a direct call to an inlined function F, and a call to that same function that couldn't be inlined and had to go through some extra layers of indirection, and maybe even got hit by a branch misprediction on its way.
That's mostly it for the theory. When it comes to modern hardware, though, one should always be wary of the theory.
Let's measure this stuff.
The practice: benchmarks
First things first, some information about the CPU we're running on:
$ lscpu | sed -nr '/Model name/ s/.*:\s*(.* @ .*)/\1/p'
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
We'll define the interface used for our benchmarks as such (iface_bench_test.go):
type identifier interface {
	idInline() int32
	idNoInline() int32
}

type id32 struct{ id int32 }

// NOTE: Use pointer receivers so we don't measure the extra overhead incurred by
// autogenerated wrappers as part of our results.

func (id *id32) idInline() int32 { return id.id }
//go:noinline
func (id *id32) idNoInline() int32 { return id.id }
Benchmark suite A: single instance, many calls, inlined & non-inlined
For our first two benchmarks, we'll try calling a non-inlined method inside a busy-loop, on both an *id32 value and an iface<identifier, *id32> interface:
var escapeMePlease *id32
// escapeToHeap makes sure that `id` escapes to the heap.
//
// In simple situations such as some of the benchmarks present in this file,
// the compiler is able to statically infer the underlying type of the
// interface (or rather the type of the data it points to, to be pedantic) and
// ends up replacing what should have been a dynamic method call by a
// static call.
// This anti-optimization prevents this extra cleverness.
//
//go:noinline
func escapeToHeap(id *id32) identifier {
	escapeMePlease = id
	return escapeMePlease
}

var myID int32

func BenchmarkMethodCall_direct(b *testing.B) {
	b.Run("single/noinline", func(b *testing.B) {
		m := escapeToHeap(&id32{id: 6754}).(*id32)
		for i := 0; i < b.N; i++ {
			// CALL "".(*id32).idNoInline(SB)
			// MOVL 8(SP), AX
			// MOVQ "".&myID+40(SP), CX
			// MOVL AX, (CX)
			myID = m.idNoInline()
		}
	})
}

func BenchmarkMethodCall_interface(b *testing.B) {
	b.Run("single/noinline", func(b *testing.B) {
		m := escapeToHeap(&id32{id: 6754})
		for i := 0; i < b.N; i++ {
			// MOVQ 32(AX), CX
			// MOVQ "".m.data+40(SP), DX
			// MOVQ DX, (SP)
			// CALL CX
			// MOVL 8(SP), AX
			// MOVQ "".&myID+48(SP), CX
			// MOVL AX, (CX)
			myID = m.idNoInline()
		}
	})
}
We expect both benchmarks to run A) extremely fast and B) at almost the same speed.
Given the tightness of the loop, we can expect both benchmarks to have their data (receiver & vtable) and instructions ("".(*id32).idNoInline) already be present in the L1d/L1i caches of the CPU for each iteration of the loop. I.e., performance should be purely CPU-bound.
BenchmarkMethodCall_interface should run a bit slower (on the nanosecond scale) though, as it has to deal with the overhead of finding & copying the right pointer from the virtual table (which is already in the L1 cache, though). Since the CALL CX instruction has a strong dependency on the output of these few extra instructions required to consult the vtable, the processor has no choice but to execute all of this extra logic as a sequential stream, leaving any chance of instruction-level parallelization on the table. This is ultimately the main reason why we would expect the "interface" version to run a bit slower.
We end up with the following results for the "direct" version:
$ go test -run=NONE -o iface_bench_test.bin iface_bench_test.go && \
  perf stat --cpu=1 \
  taskset 2 \
  ./iface_bench_test.bin -test.cpu=1 -test.benchtime=1s -test.count=3 \
  -test.bench='BenchmarkMethodCall_direct/single/noinline'
BenchmarkMethodCall_direct/single/noinline 2000000000