Written on Feb 18, 2024

moving data - a visual guide to x86-64 assembly

It’s often said that assembly language is complex. After all, there’s a reason why high-level languages and compilers exist: to spare us the intricacies, right?
But while it’s true that you would have a hard time writing a large project in assembly, the language itself is remarkably simple. That’s because assembly is the language of the processor, and at its essence, all the processor does is move data.

This guide is not about writing assembly; it’s about understanding the way data moves behind the scenes when you execute a program. We’ll use concrete examples for the x86-64 architecture, but most of these concepts apply to other architectures as well, and they are fundamental knowledge for reverse engineering, binary exploitation, or simply debugging a program.

what is data?

Data is just bits representing information. A sequence of bits can encode any kind of information, but this article focuses only on text and integers.

But before we talk about any kind of encoding, we have to introduce a new notation. The issue is that while circuits handle sequences of bits very well, humans don’t. For example, can you tell the difference between 1101010101111110 and 1101010101111110?

Ok, the two sequences are identical, but I bet you couldn’t immediately see that.

In order to visualize binary data in a more human-friendly way, we use hexadecimal numbers, which assign a single digit, either a number from 0 to 9 or a letter from a to f, to each group of 4 bits.
A long sequence of bits can be represented in this way:

0010 0101 0111 1101 1111

 2    5    7    d    f

Note that, to avoid confusion with decimal numbers, it’s common to prefix hexadecimal numbers with 0x. For example, 0x1234 is not the same thing as the decimal number 1234.
I’m not going to explain how conversions between decimal, binary, and hexadecimal numbers work. The only assumption I’m making in this article is that you already know that.
If you have a Python terminal, you can perform these conversions very easily:

al@thinkpad:~/$ python
>>>
>>> 0b0010 #print the binary number 0010 in decimal
2
>>> 0x1234 #print the hex number 1234 in decimal
4660
>>> hex(0b00100101011111011111)
'0x257df'
>>> hex(4660)
'0x1234'
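
The nibble-by-nibble grouping shown earlier can also be done by hand. The following is only a small sketch to make the idea concrete; the function name bits_to_hex is made up for this example, and in practice the built-in hex() does the same job.

```python
def bits_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to hex, one nibble (4 bits) at a time."""
    bits = bits.replace(" ", "")
    # Pad on the left so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    nibbles = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    # Each nibble is a number between 0 and 15, which is exactly one hex digit.
    return "".join("0123456789abcdef"[int(nibble, 2)] for nibble in nibbles)


print(bits_to_hex("0010 0101 0111 1101 1111"))  # 257df
```

The result matches what hex(0b00100101011111011111) returns in the session above, minus the 0x prefix.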

One last convention you should know is that a group of bits has a name, based on how long it is:

number of bits	example hex value	name
4	f	nibble
8	ff	byte
16	ffff	word
32	ffffffff	dword (double word)
64	ffffffffffffffff	qword (quad word)
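
If you want to double check the example values, every hex digit covers 4 bits, so the largest value of each size is a run of f digits. A quick sketch (the names SIZES and max_value are only for this example):

```python
# Size names and their width in bits.
SIZES = {"nibble": 4, "byte": 8, "word": 16, "dword": 32, "qword": 64}


def max_value(bits: int) -> int:
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


for name, bits in SIZES.items():
    # A value that is `bits` wide needs bits // 4 hex digits.
    print(f"{name:6} {bits:2} bits  {max_value(bits):x}")
```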

text

You probably know that there are different ways to encode text. The simplest encoding is ASCII, where text is stored as a sequence of bytes and every byte represents a character.
For example, the letter ‘c’ is stored as the byte 0x63 and the letter ‘o’ as 0x6f, so the text ciao is stored as the sequence of bytes 63 69 61 6f. You can find a table of all ASCII characters in the Linux man pages (man ascii).

Text is a very good example of how data is usually encoded in the same order as you normally write it. For example, ‘ciao’ is stored as the byte for c, followed by the byte for i, and so on. You will see that this is not the case with numbers.
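
You can check the bytes of a piece of text from Python. This is just a quick sketch; it assumes Python 3.8 or later, because it passes a separator to bytes.hex().

```python
text = "ciao"
encoded = text.encode("ascii")    # one byte per character, in writing order

print(encoded.hex(" "))           # 63 69 61 6f
print([hex(b) for b in encoded])  # ['0x63', '0x69', '0x61', '0x6f']

# Going the other way: from bytes back to text.
print(bytes.fromhex("63 69 61 6f").decode("ascii"))  # ciao
```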

numbers

Integers in the x86-64 architecture are stored in little endian format.

Let’s take a decimal number, for example 3405691582. Converted to hexadecimal, that number is 0xcafebabe.
Unlike what we saw with text, those bytes will not be stored in order. Instead, little endian architectures take a number, split it into bytes, and store those bytes in reverse order, starting from the least significant one.

0xcafebabe

will be stored as:

be ba fe ca
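
Python can show the same byte order with int.to_bytes. A small sketch, assuming the number is stored as a dword (4 bytes); the big endian line is there only for comparison.

```python
number = 3405691582  # 0xcafebabe

little = number.to_bytes(4, "little")
big = number.to_bytes(4, "big")

print(hex(number))      # 0xcafebabe
print(little.hex(" "))  # be ba fe ca  (the order x86-64 uses in memory)
print(big.hex(" "))     # ca fe ba be  (the order we write it in)
```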

Now a small exercise: let’s say that you encounter the byte sequence 02 ff 00 00 00 00 00 00. What decimal number is it?

We need to take all the bytes in reverse order: that’s 0x000000000000ff02.
Exactly like in decimal, leading zeros are meaningless, so we can remove them and get 0xff02.
Using Python, we can convert it to decimal and get 65282.
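
The same decoding can be done in one step with int.from_bytes. A minimal sketch of the exercise above:

```python
raw = bytes.fromhex("02 ff 00 00 00 00 00 00")

value = int.from_bytes(raw, "little")
print(hex(value))  # 0xff02
print(value)       # 65282

# Reversing the bytes by hand and reading them as one hex number gives the same result.
assert int(raw[::-1].hex(), 16) == value
```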

where is data?

Now that we know how to represent text and numbers, we need some place to store them. Like all kinds of data, they can be stored in only two places:

in memory, which means in your RAM
in registers, which are special containers inside your CPU

memory

Memory is just a very long list of contiguous cells, each containing 8 bits of information, and reachable by a numeric address.

Since printing a long list of bytes would take a lot of space, when visualizing memory we usually group bytes in rows of 8 or 16. It’s also common to include a column to the side that shows the ASCII character associated with each byte.

The memory dump below was taken from a program that was running on my computer.

showing 16 bytes per row

00000000  50 61 67 65 20 6e 6f 74 20 66 6f 75 6e 64 7f 00  Page not found..
00000010  00 00 00 00 00 00 00 00 e9 51 55 55 55 55 00 00  .........QUUUU..
00000020  40 dc ff ff 01 00 00 00 58 dc ff ff ff 7f 00 00  @.......X.......
00000030  00 00 00 00 00 00 00 00 e8 04 be 12 78 e9 6f e0  ............x.o.
00000040  58 dc ff ff ff 7f 00 00 e9 51 55 55 55 55 00 00  X........QUUUU..
00000050  98 7d 55 55 55 55 00 00 40 d0 ff f7 ff 7f 00 00  .}UUUU..@.......
00000060  e8 04 1c a4 87 16 90 1f e8 04 34 28 fd 06 90 1f  ..........4(....
00000070  00 00 00 00 ff 7f 00 00 00 00 00 00 00 00 00 00  ................
00000080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000090  00 00 00 00 00 00 00 00 00 42 9e 87 5d ca 2f 7e  .........B..]./~
000000a0  00 00 00 00 00 00 00 00 40 9e c2 f7 ff 7f 00 00  ........@.......
000000b0  68 dc ff ff ff 7f 00 00 98 7d 55 55 55 55 00 00  h........}UUUU..
000000c0  e0 e2 ff f7 ff 7f 00 00 00 00 00 00 00 00 00 00  ................
000000d0  00 00 00 00 00 00 00 00 00 51 55 55 55 55 00 00  .........QUUUU..
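
A view like this is easy to produce yourself. Here is a minimal sketch of a dump function, assuming Python 3; the name hexdump and its parameters are my own choice for this example, and the sample data is just the first 32 bytes of the dump above.

```python
def hexdump(data: bytes, width: int = 16, base: int = 0) -> str:
    """Format data as rows of `width` bytes: address, hex bytes, ASCII column."""
    hex_width = width * 3 - 1  # two digits per byte plus the spaces in between
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII is 0x20..0x7e; every other byte is shown as a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        rows.append(f"{base + offset:08x}  {hex_part:<{hex_width}}  {ascii_part}")
    return "\n".join(rows)


sample = b"Page not found\x7f\x00" + bytes(8) + bytes.fromhex("e9 51 55 55 55 55 00 00")
print(hexdump(sample, width=8))
```

Changing width plays the role of choosing how many bytes to show in a row.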

registers

Registers are containers for data, located inside your CPU. The x86-64 architecture has a lot of registers, each with an associated name. Some of them have a specific purpose, others are generic containers we can use in our program. We mostly interact with these:

bytes 07-00 (64 bits)	bytes 03-00 (32 bits)	byte 01	byte 00
rax	eax	ah	al
rbx	ebx	bh	bl
rcx	ecx	ch	cl
rdx	edx	dh	dl
rsi	esi	--	sil
rdi	edi	--	dil
rsp	esp	--	spl
rbp	ebp	--	bpl

As you can see from the table above, registers that start with the prefix r can store 8 bytes of data. Other names give you access to the lower bytes of a larger register. For example, eax gives you access to the lower 4 bytes of rax, and al to its lowest byte.
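
One way to picture these smaller names is as bit masks over the full 8-byte value. The sketch below only models reading; the value placed in rax is arbitrary and chosen so that every byte is different.

```python
rax = 0x1122334455667788  # arbitrary example value

eax = rax & 0xffffffff    # lower 4 bytes (bytes 03-00)
al = rax & 0xff           # lowest byte (byte 00)
ah = (rax >> 8) & 0xff    # second-lowest byte (byte 01)

print(hex(eax))  # 0x55667788
print(hex(al))   # 0x88
print(hex(ah))   # 0x77
```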

moving data

The instruction mov moves data around. It can move data from one register to another, from a register to memory, or vice versa, from memory to a register. Despite the name, mov actually copies: the source is left unchanged.

These first examples are self-explanatory:

mov rbx, 0x10  #copies an integer into rbx
mov rax, rbx   #copies the content of rbx into rax

Moving data to memory requires some extra syntax.
Let’s say that our goal is to write the byte 0xff in the memory cell at address 0x10.

First, we put 0x10, the address of the cell we want to write, in a register.
Then we perform a mov instruction with square brackets around the register name, to indicate that we want to move 0xff to the memory address pointed to by the register, and not into the register itself.

mov rax, 0x10
mov byte ptr [rax], 0xff

What if we wanted to write to memory more than one byte, for example the whole 8 bytes contained in rbx? This requires a variation of the ptr syntax, to specify that we want to write a sequence of 8 bytes to memory.

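The example below shows all the possible variations of the ptr syntax. Each mov writes a different number of bytes, starting at address 0x20:

	mov rbx, 0x4242424242424242
	mov rax, 0x20
	mov byte ptr [rax], bl    #writes 1 byte: bl is the first byte of register rbx
	mov word ptr [rax], bx    #writes 2 bytes: bx is the first 2 bytes of rbx
	mov dword ptr [rax], ebx  #writes 4 bytes: ebx is the first 4 bytes of rbx
	mov qword ptr [rax], rbx  #writes all 8 bytes of rbx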
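
To see how the size keyword changes what ends up in memory, here is a rough Python model. It is only a sketch: memory is a plain bytearray, the store function is a made-up stand-in for mov with a ptr size, and rbx holds an arbitrary value with distinct bytes so the little endian order is visible.

```python
memory = bytearray(0x70)  # toy memory, addresses 0x00..0x6f, all zero

SIZES = {"byte": 1, "word": 2, "dword": 4, "qword": 8}


def store(address: int, value: int, size: str) -> None:
    """Model `mov <size> ptr [address], value`: write the low bytes, little endian."""
    n = SIZES[size]
    low_bytes = value & ((1 << (8 * n)) - 1)  # keep only the lowest n bytes
    memory[address:address + n] = low_bytes.to_bytes(n, "little")


rbx = 0x1122334455667788  # arbitrary value, every byte is different
rax = 0x20

store(rax, rbx, "byte")
print(memory[0x20:0x28].hex(" "))  # 88 00 00 00 00 00 00 00
store(rax, rbx, "dword")
print(memory[0x20:0x28].hex(" "))  # 88 77 66 55 00 00 00 00
store(rax, rbx, "qword")
print(memory[0x20:0x28].hex(" "))  # 88 77 66 55 44 33 22 11
```

Only the bytes covered by the chosen size are touched; everything else in memory stays as it was.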

the stack

x64, like most architectures, has the concept of a stack: an area in memory pointed to by the special register rsp.
You can add or remove elements from the top of the stack by using the push and pop instructions. This is the most common interaction, but it’s also valid to directly adjust the value of rsp.

The example below pushes the content of rax onto the stack. The contents of the rsp and rax registers are shown as they are before any instruction runs.

	mov rax, 0x4242424242424242
	push rax

rsp 50 ff ff 7f 00 00 00 00 (0x7fffff50)

rax 00 00 00 00 00 00 00 00 (0x00)

There are two key things you should notice about push and pop:

rsp points to the top of the stack. It is decreased by 8 when we push a value, and increased by 8 when we pop a value.
Every time we pop a value from the stack, that value is not deleted: the area of memory that contains it simply stops being part of the stack. The only thing that changes is the memory address pointed to by rsp.

Basically, push rax does the same as the following code:

sub rsp, 8
mov qword ptr [rsp], rax

And pop rax does the same as the following code:

mov rax, qword ptr [rsp]
add rsp, 8
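
Both equivalences can be tried out in a small Python model. This is a sketch under a few assumptions: memory is a bytearray covering addresses 0x7fffff00 to 0x7fffffdf, rsp starts at 0x7fffff50, and the names BASE, regs, push and pop are invented for the example.

```python
BASE = 0x7fffff00         # address of the first byte of the toy memory
memory = bytearray(0xe0)  # covers 0x7fffff00..0x7fffffdf
regs = {"rsp": 0x7fffff50, "rax": 0}


def push(reg: str) -> None:
    regs["rsp"] -= 8                                               # sub rsp, 8
    start = regs["rsp"] - BASE
    memory[start:start + 8] = regs[reg].to_bytes(8, "little")      # mov qword ptr [rsp], reg


def pop(reg: str) -> None:
    start = regs["rsp"] - BASE
    regs[reg] = int.from_bytes(memory[start:start + 8], "little")  # mov reg, qword ptr [rsp]
    regs["rsp"] += 8                                               # add rsp, 8


regs["rax"] = 0x4242424242424242
push("rax")
print(hex(regs["rsp"]))  # 0x7fffff48

regs["rax"] = 0
pop("rax")
print(hex(regs["rsp"]))  # 0x7fffff50
print(hex(regs["rax"]))  # 0x4242424242424242

# The popped bytes are still in memory, at addresses lower than rsp.
print(memory[0x48:0x50].hex(" "))  # 42 42 42 42 42 42 42 42
```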

There is a confusing element here: when we put something onto the stack we are growing the stack, and yet we are moving towards lower memory addresses.

With the way we visualize memory this actually looks correct: the stack is growing towards the top.
But if we only look at the numeric addresses of elements on the stack, newer elements have smaller addresses, which looks backwards.
Even when you are aware of this, it’s common to get confused and end up thinking: “I put a new value on the stack, and it has a smaller address than the previous value, how is that possible?”
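
If it helps, here is the same idea reduced to numbers. The sketch assumes rsp starts at 0x7fffff50 and simply records where three consecutive pushes would land.

```python
rsp = 0x7fffff50  # assumed starting value of the stack pointer
addresses = []
for _ in range(3):
    rsp -= 8      # each push moves rsp down by 8 bytes
    addresses.append(rsp)

print([hex(a) for a in addresses])  # ['0x7fffff48', '0x7fffff40', '0x7fffff38']

# The newest element always sits at the lowest address.
assert addresses == sorted(addresses, reverse=True)
```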