Alberto Ventafridda

An interactive guide to x86-64 assembly - moving data

This is the second part of a series of interactive articles on the x86-64 architecture. This part focuses on the first assembly instructions and visualizes how data moves in memory when they are executed.

Visualizing memory

In the previous post we introduced some basics about data, encodings, and the places where data is stored: registers and memory. We also introduced a common way to visualize memory, hex dumps, which will be used extensively in this article.

The example below shows a hexdump of some sample data taken from the stack frame of a process. Use the slider to adjust the number of bytes shown in a single row.

[Interactive hexdump: 224 bytes (addresses 0x00 to 0xdf) of example stack-frame data, beginning with the ASCII string "example ascii text"; a slider sets how many bytes are shown per row, starting from 1.]
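The interactive dump cannot be reproduced in plain text, but the formatting it performs is easy to sketch. Below is a minimal Python version built on my own assumptions: the function name `hexdump` and the exact column layout are illustrative choices, not the output format of any particular tool. Each row shows the address, the hex bytes, and an ASCII column where non-printable bytes appear as dots.

```python
def hexdump(data: bytes, bytes_per_row: int = 16, start: int = 0) -> str:
    """Format data as rows of: address, hex bytes, ASCII.

    Bytes outside the printable ASCII range (0x20-0x7e) are shown as '.'.
    `start` is the address assigned to the first byte of `data`.
    """
    lines = []
    for offset in range(0, len(data), bytes_per_row):
        chunk = data[offset:offset + bytes_per_row]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Pad a short last row so the ASCII column stays aligned.
        hex_part = hex_part.ljust(bytes_per_row * 3 - 1)
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{start + offset:08x}  {hex_part}  {ascii_part}")
    return "\n".join(lines)


# The first 32 bytes of the example stack data shown above.
sample = bytes.fromhex("6578616d706c652061736369692074657874000000000000e951555555550000")
print(hexdump(sample, bytes_per_row=16))
print(hexdump(sample, bytes_per_row=8))
```

Changing `bytes_per_row` plays the role of the slider: the bytes stay the same, only the way they are laid out changes.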

The reason I want you to get familiar with this visualization is also the rationale behind this series of articles: most online resources explain low-level topics (such as stack frames, data alignment, or buffer overflows) using abstract diagrams. But when you approach these topics in practice, you will use tools like gdb, which display data in a completely different way from those diagrams.
For example, this is a screenshot of my setup when running gdb with GEF and the Python pwntools library:

[Screenshot: a tmux terminal split vertically into two panes. The left pane is an interactive Python shell that has received the input p.send(bytearray(key)). The right pane is a gdb session with the GEF plugin enabled, showing the registers, the stack and the current instruction for a process called ./babyrev_level6.0. The program is about to call glibc readline.]

All the visualizations in this article emulate the way data is displayed in real-world scenarios by tools like gdb or pwndbg, which are popular in CTF competitions. My hope is that this will make the steep learning curve of those tools a little gentler.

Moving data

The first instruction we are going to see is mov, which moves data around. It can copy a constant into a register, or copy data from one register to another, from a register to memory, or vice versa from memory to a register.

These first examples are self-explanatory:

mov rbx, 0x10  ;copies the integer 0x10 into rbx
mov rax, rbx   ;copies the content of rbx into rax

Moving data to memory requires some extra syntax. The following snippet writes the byte 0xff to the memory cell at address 0x10:

mov rax, 0x10
mov byte ptr [rax], 0xff

Let’s break it down:

  • First, we put 0x10, the address of the cell we want to write to, into a register.
  • Then we perform a mov instruction with square brackets around the register name, to indicate that we want to move 0xff to the memory address pointed to by the register, and not into the register itself.

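Notice that in that example we moved a single byte, so we used the syntax byte ptr. The size has to be spelled out because neither the constant 0xff nor [rax] tells the assembler how many bytes to write. To move a different number of bytes, replace byte with word (2 bytes), dword (4 bytes) or qword (8 bytes).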
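To make the size specifiers concrete, here is a small Python model of a memory store. It is only a sketch: memory is a plain `bytearray`, the helper `mov_ptr` is a hypothetical name, and it ignores real encoding limits (for instance, x86-64 only accepts a 32-bit, sign-extended immediate when moving a constant into a qword in memory). Bytes are written least significant first, which the sidenote on endianness below explains.

```python
# Size of each Intel-syntax pointer directive, in bytes.
PTR_SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}


def mov_ptr(memory: bytearray, address: int, value: int, size: str) -> None:
    """Toy model of `mov <size> ptr [address], value`.

    Writes only the lowest PTR_SIZES[size] bytes of value, least
    significant byte first (little endian, as on x86-64).
    """
    n = PTR_SIZES[size]
    low_bytes = value & ((1 << (8 * n)) - 1)  # truncate value to n bytes
    memory[address:address + n] = low_bytes.to_bytes(n, "little")


memory = bytearray(0x40)  # 64 bytes of zeroed memory
rax = 0x10
mov_ptr(memory, rax, 0xff, "byte")  # mov byte ptr [rax], 0xff
print(memory[0x10:0x18].hex(" "))   # ff 00 00 00 00 00 00 00
```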

The interactive example below lets you experiment with all the possible variations of the pointer syntax. Click “run” to see how memory is affected.

code

mov rbx, 0x4242424242424242
mov rax, 0x20
mov byte ptr [rax], bl

[Memory view: 112 bytes at addresses 0x00 to 0x6f, all initially zero; clicking "run" shows the bytes written at address 0x20.]
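The example stores bl, which is the name for the lowest byte of rbx. The other sizes have matching names for the low part of the same register: bx (2 bytes), ebx (4 bytes) and rbx itself (8 bytes). With a register operand the name already implies the size, and assemblers typically reject a mismatch such as qword ptr with bl. The Python sketch below (helper names are hypothetical) replays the three instructions for each size on zeroed memory and counts the bytes written at 0x20.

```python
# Names for the low part of rbx, keyed by size in bytes.
RBX_PARTS = {1: "bl", 2: "bx", 4: "ebx", 8: "rbx"}
SIZE_NAMES = {1: "byte", 2: "word", 4: "dword", 8: "qword"}


def low_part(register_value: int, size: int) -> int:
    """Value of the lowest `size` bytes of a 64-bit register."""
    return register_value & ((1 << (8 * size)) - 1)


def run_example(size: int) -> bytearray:
    """Replay the example for one operand size on 0x70 bytes of zeroed memory."""
    memory = bytearray(0x70)
    rbx = 0x4242424242424242
    rax = 0x20
    # mov <size> ptr [rax], <low part of rbx>
    memory[rax:rax + size] = low_part(rbx, size).to_bytes(size, "little")
    return memory


for size, reg in RBX_PARTS.items():
    changed = sum(1 for b in run_example(size) if b != 0)
    print(f"mov {SIZE_NAMES[size]} ptr [rax], {reg:<3} -> {changed} byte(s) written")
```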

A sidenote on endianness

We managed to reach this point by ignoring an important fact: x86-64 is a little endian architecture, which means that multi-byte numbers are not stored in memory in the order you might expect.
In the previous example, you saw what the number 0x4242424242424242 looks like in memory, but we chose that number carefully to hide the issue. In the next example, you can enter any number you want.
Can you spot what’s happening?

code

mov rbx, <number>   ;the number you enter
mov rax, 0x20
mov qword ptr [rax], rbx

[Memory view: 112 bytes at addresses 0x00 to 0x6f, beginning with the ASCII string "Example text"; running the code writes the 8 bytes of rbx at address 0x20.]

In case you missed it, numbers are stored with their bytes in reverse order. For example, the number 0xcafe is made of the byte ca followed by fe, but it is stored as the byte fe followed by the byte ca.
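The same store can be modelled in Python with `int.to_bytes`, which takes the byte order as an explicit argument. This sketch assumes rbx = 0xcafe and zeroed memory, and mimics `mov qword ptr [rax], rbx` with rax = 0x20:

```python
rbx = 0xcafe
rax = 0x20
memory = bytearray(0x70)  # zeroed memory, addresses 0x00 to 0x6f

# mov qword ptr [rax], rbx: 8 bytes, least significant byte at the lowest address
memory[rax:rax + 8] = rbx.to_bytes(8, "little")
print(memory[0x20:0x28].hex(" "))       # fe ca 00 00 00 00 00 00

# For comparison, the order a human would write the same 8 bytes in
print(rbx.to_bytes(8, "big").hex(" "))  # 00 00 00 00 00 00 ca fe
```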

What’s going on here is that both humans and computers use a positional number system to represent integers, but they order the digits differently.
When we (humans using Hindu-Arabic numerals) write numbers, we put the most significant digit first and continue in descending order of significance. Big endian architectures store numbers the same way.

  human-readable decimal number
  1337 
  |  |
  |  Least significant digit 
  Most significant digit 

  human-readable hex number
  0xcafebabe  
    |     |
    |     Least significant byte
    Most significant byte

Little endian architectures do the opposite: they store the least significant byte first, at the lowest address, and continue in ascending order of significance.
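Python's `int.to_bytes` and `int.from_bytes` make the two orders easy to compare. The sketch below encodes 0xcafebabe both ways, then shows that decoding little endian bytes as if they were big endian yields a different number:

```python
value = 0xcafebabe

big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first (x86-64)

print("big endian:   ", big.hex(" "))     # ca fe ba be
print("little endian:", little.hex(" "))  # be ba fe ca

# Decoding with the matching order gives the original value back...
assert int.from_bytes(little, "little") == value
# ...while decoding with the wrong order gives a different number.
print(hex(int.from_bytes(little, "big")))  # 0xbebafeca
```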

This topic is explained in depth on Wikipedia, with some useful diagrams that should clear up any doubts you might have.
Endianness only affects values that span multiple bytes, such as integers. Text encoded as ASCII or UTF-8 is a sequence of single bytes, so it appears in the order you would expect. Floating point numbers use a completely different format (IEEE 754), although on x86-64 their bytes are also stored in little endian order. You can read more about them in this great article, or in this visual guide by Ciechanowski.
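A quick way to see the difference in Python: the sketch below encodes a string, reinterprets its first 8 bytes as a little endian integer, and packs a double with the standard `struct` module (`<d` means a little endian 8-byte float). The integer view is similar to what a debugger shows when you ask it to display text memory as 8-byte words, which is why strings often look reversed in such views.

```python
import struct

# Text: each ASCII character is a single byte, so byte order never flips it.
text = "Example text".encode("ascii")
print(text.hex(" "))   # 45 78 61 6d 70 6c 65 20 74 65 78 74

# Reading the first 8 text bytes as one little endian integer reverses them.
as_qword = int.from_bytes(text[:8], "little")
print(hex(as_qword))   # 0x20656c706d617845

# A double uses the IEEE 754 encoding; on x86-64 its 8 bytes are little endian too.
print(struct.pack("<d", 1.0).hex(" "))  # 00 00 00 00 00 00 f0 3f
```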

The stack

x86-64, like most architectures, has the concept of a stack: an area of memory whose top is pointed to by the special register rsp.
You can add or remove elements at the top of the stack with the push and pop instructions. This is the most common interaction, but it’s also valid to adjust the value of rsp directly. In this interactive example the stack area is highlighted in blue, and the values of the rsp and rax registers are shown as well.

code

mov rax, 0x4242424242424242
push rax   ;or pop rax

registers

rsp  : [50 ff ff 7f 00 00 00 00]  0x7fffff50
rax  : [00 00 00 00 00 00 00 00]  0x00

[Memory view: 224 bytes from 0x7fffff00 to 0x7fffffdf, beginning with the ASCII string "This is example data"; the stack area, starting at the address held in rsp, is highlighted in blue.]

There are two key things you should notice by playing with the example above:

  • rsp points to the top of the stack. It is decreased by 8 when we push a value, and increased by 8 when we pop a value.
  • When we pop a value from the stack, that value is not deleted: the area of memory that contains it simply stops being part of the stack. Memory is left untouched; apart from the destination register, the only thing that changes is the address held in rsp.

Basically, push rax does the same as the following code:

sub rsp, 8
mov qword ptr [rsp], rax

And pop rax does the same as the following code:

mov rax, qword ptr [rsp]
add rsp, 8
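The two equivalences above can be turned into a toy model. This Python sketch uses a hypothetical `Stack` class whose memory starts zeroed (rather than holding the data from the example), and the same starting values as the interactive example: memory from 0x7fffff00 and rsp = 0x7fffff50.

```python
class Stack:
    """Toy model of x86-64 push/pop over a small region of memory.

    `base` is the address of memory[0]; rsp holds an absolute address.
    """

    def __init__(self, base: int, size: int, rsp: int):
        self.base = base
        self.memory = bytearray(size)
        self.rsp = rsp

    def push(self, value: int) -> None:
        # sub rsp, 8 ; mov qword ptr [rsp], value
        self.rsp -= 8
        i = self.rsp - self.base
        self.memory[i:i + 8] = (value & 0xFFFF_FFFF_FFFF_FFFF).to_bytes(8, "little")

    def pop(self) -> int:
        # mov reg, qword ptr [rsp] ; add rsp, 8
        i = self.rsp - self.base
        value = int.from_bytes(self.memory[i:i + 8], "little")
        self.rsp += 8
        return value


stack = Stack(base=0x7fffff00, size=0xe0, rsp=0x7fffff50)
stack.push(0x4242424242424242)
print(hex(stack.rsp))             # 0x7fffff48
rax = stack.pop()
print(hex(stack.rsp), hex(rax))   # 0x7fffff50 0x4242424242424242
# The popped bytes are still in memory; they are just no longer part of the stack.
print(stack.memory[0x48:0x50].hex(" "))
```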

There is a confusing element here: when we put something onto the stack we are growing the stack, and yet we are moving towards lower addresses of memory.

With the way we visualize memory, this actually looks right: the stack grows towards the top of the dump.
But if we only look at the numeric addresses of the elements on the stack, newer elements have smaller addresses, which looks backwards.
Even when you are aware of this, it’s common to get confused and end up thinking: “I put a new value on the stack, but it has a smaller address than the previous value, what is going on?”

Memory alignment

I don’t think memory alignment can be explained better than in this article, so check it out. Here we’ll only focus on how memory alignment affects the way we visualize the stack:
Every time you push or pop something, you move the stack pointer by 8 bytes. If you look carefully at the previous example, you’ll also notice that the address in the stack pointer is always a multiple of 8: in hex, it always ends with either 0 or 8.
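Checking alignment is a one-liner. The sketch below uses the common bit-mask trick, which only works when the alignment is a power of two (as 8 and 16 are), and confirms that the rsp values reachable from 0x7fffff50 with pushes and pops all end in 0 or 8 in hex:

```python
def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment (alignment must be a power of two)."""
    return address & (alignment - 1) == 0


# rsp values reachable from 0x7fffff50 with up to 10 pushes or 10 pops.
rsp_values = [0x7fffff50 + 8 * k for k in range(-10, 11)]
assert all(is_aligned(rsp, 8) for rsp in rsp_values)

# Written in hex, every one of them ends in 0 or 8.
print(sorted({f"{rsp:x}"[-1] for rsp in rsp_values}))  # ['0', '8']
```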

This kind of alignment is done on purpose, mainly for performance reasons, and you will encounter it everywhere. As a consequence, when we visualize memory in a hexdump it’s common to start from an address that is a multiple of 8 or 16, so that aligned values fit neatly in a row.

This is a hexdump taken from the stack memory of a function. Two different variables are highlighted: one is the 32-bit integer 0xcafebabe, the other is a stack canary, which we’ll see in another article. You can adjust the slider to change the start address in the hexdump.

[Interactive hexdump: 224 bytes of stack memory beginning with the ASCII string "example ascii text". The 32-bit integer 0xcafebabe (stored as be ba fe ca at address 0x20) and a stack canary are highlighted; a slider sets the start address, initially 0x0.]
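Whether a value ends up split across two rows depends only on the start address and the row width. This small Python sketch (the function name is hypothetical) computes which 16-byte rows of a dump contain a given value, using 0xcafebabe at address 0x20 as in the dump above:

```python
def rows_touched(dump_start: int, address: int, size: int, bytes_per_row: int = 16) -> list[int]:
    """Start addresses of the hexdump rows that contain bytes [address, address + size)."""
    first = (address - dump_start) // bytes_per_row
    last = (address + size - 1 - dump_start) // bytes_per_row
    return [dump_start + r * bytes_per_row for r in range(first, last + 1)]


# Aligned start: the 4-byte value fits in a single row.
print([hex(r) for r in rows_touched(0x0, 0x20, 4)])   # ['0x20']
# Misaligned start: the same value is split across two rows.
print([hex(r) for r in rows_touched(0x12, 0x20, 4)])  # ['0x12', '0x22']
```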

What I’m trying to show here is that everything is relative: the same bytes can look very different depending on where the dump starts. What you see is always an abstract representation of the actual data, and it’s up to you to visualize it in a way that matches your mental model.

Further Reading

This article is still under development, and it’s improving over time.
If you reached this point, you might be interested in the next articles:

Additional resources: