class: title, smokescreen, shelf, bottom, no-footer
background-image: url(images/vnarch.svg)

# 181U Spring 2020
### Processor: Instruction Execution

---
layout: true

.footer[ - Geoffrey Brown, 2020 - 181U ]

<style>
h1 {
  border-bottom: 8px solid rgb(32,67,143);
  border-radius: 2px;
  width: 90%;
}
.smokescreen h1 {
  border-bottom: none;
}
.small.remark-slide-content.compact {font-size:1.2rem}
.smaller.remark-slide-content.compact {font-size:1.1rem}
.small-code.remark-slide-content.compact code {font-size:1.0rem}
.very-small-code.remark-slide-content.compact code {font-size:0.9rem}
.line-numbers{
  /* Set "line-numbers-counter" to 0 */
  counter-reset: line-numbers-counter;
}
.line-numbers .remark-code-line::before {
  /* Increment "line-numbers-counter" by 1 */
  counter-increment: line-numbers-counter;
  content: counter(line-numbers-counter);
  text-align: right;
  width: 20px;
  border-right: 1px solid #aaa;
  display: inline-block;
  margin-right: 10px;
  padding: 0 5px;
}
</style>

---
class: compact
# Agenda

* Von Neumann Architecture
* Cortex Instruction Interpreter
* C Execution Model
* Threads

---
class: compact
# Von Neumann Architecture

![](images/vnarch.svg# w-40pct fr)

```C
extern inst_t M[...];  // Memory
extern size_t pc;      // Program Counter

while(1) {
    inst_t inst = M[pc++];  // fetch, then advance the program counter
    interpret(inst);        // decode and execute
}
```

---
class: compact
# Cortex-M Architecture

![](images/2019-12-26-09-07-52.png# w-5-12th) ![](images/2019-12-26-09-14-12.png# w-50pct)

---
class: compact
# An Example Instruction

![](images/space.png# w-10pct) ![](images/2019-12-26-09-20-18.png# w-80pct)

---
class: compact
# Instruction Types

* Data Processing
    * `add`, `sub`, `and`, `orr`, ...
* Control Flow: `if cond goto label`
    * `beq label` : "branch if equal"
    * `bgt label` : "branch if greater than"
    * `bxx label` : other conditions
* Control Flow: Procedure call
    * `bl label` : "branch and link" `pc, lr = label, address of next instruction`
    * `bx lr` : "return" `pc = lr`
* Memory Reference
    * `ldr rx, [ry]`
    * `str rx, [ry]`
    * `push {r0, r1}`

---
class: compact
# Memory Reference Instructions

For a C programmer, it's easiest to think of memory reference instructions as pointer operations.

```asm
ldr r0, [r1]   @ r0 = *((uint32_t *) r1)
str r0, [r1]   @ *((uint32_t *) r1) = r0
```

Typically, there are specialized memory reference instructions for

* bytes (signed, unsigned)
* halfwords (signed, unsigned)

ARM processors also have stack `push` and `pop` operations to move groups of registers onto and off *the* stack.

---
class: compact
# Control Flow Operations

* Control flow operations typically compute the *condition* and then perform the conditional goto.
* The *condition* is a set of bits describing the result of a subtraction: is the result (N)egative or (Z)ero, and did the operation produce a (C)arry or an o(V)erflow?

```C
if (r0) goto label1;
...
label1:
...
```

```asm
    cmp r0, 0     @ r0-0
    bne label1    @ goto label1 if the result is not zero
    ...
label1:
    ...
```

---
class: compact
# Condition Flags

![](images/2019-12-26-09-34-53.png# w-70pct)

* N Negative -- result of previous computation was negative
* Z Zero -- result of previous computation was zero
* C Carry -- result of previous add/sub generated a carry
* V Overflow -- result of previous add/sub did not fit in a signed 32-bit value

This execution model, using a set of condition flags, is a performance bottleneck for high-performance out-of-order execution processors, but it is a *very* common instruction set model.

---
class: compact
# Instruction Encoding Example

Cortex-M Data Processing Instructions have a few formats; this is the most important one.
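
To make the idea of decoding concrete, here is a minimal sketch in C. It assumes the 16-bit Thumb register-to-register data-processing format (top six bits `010000`, then a 4-bit operation, a 3-bit `Rm`, and a 3-bit `Rdn`); the figure on this slide may show a different format. The names `dp_inst_t`, `decode_dp`, `dp_name`, and `decode_demo` are hypothetical and chosen only for illustration.

```C
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration: fields of one 16-bit instruction. */
typedef struct {
    unsigned op;   /* bits 9..6: operation selector            */
    unsigned rm;   /* bits 5..3: second operand register       */
    unsigned rdn;  /* bits 2..0: first operand and destination */
} dp_inst_t;

/* Mnemonics indexed by the 4-bit operation field (assumed encoding). */
static const char *const dp_names[16] = {
    "ands", "eors", "lsls", "lsrs", "asrs", "adcs", "sbcs", "rors",
    "tst",  "rsbs", "cmp",  "cmn",  "orrs", "muls", "bics", "mvns"
};

/* Returns 1 and fills *out if hw has the assumed data-processing
   prefix 010000 in its top six bits; otherwise returns 0. */
int decode_dp(uint16_t hw, dp_inst_t *out)
{
    if ((hw >> 10) != 0x10u)
        return 0;
    out->op  = (hw >> 6) & 0xFu;
    out->rm  = (hw >> 3) & 0x7u;
    out->rdn = hw & 0x7u;
    return 1;
}

const char *dp_name(const dp_inst_t *inst)
{
    return dp_names[inst->op];
}

/* Usage: 0x4288 = 010000 1010 001 000, i.e. cmp r0, r1. */
void decode_demo(void)
{
    dp_inst_t d;
    assert(decode_dp(0x4288, &d));
    assert(d.op == 10 && d.rm == 1 && d.rdn == 0);
}
```

Decoding is only shifting and masking: the fixed prefix selects the format, and the remaining fields index a table of operations and name the registers.
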
![](images/space.png# w-10pct) ![](images/2019-12-26-09-41-51.png# w-80pct)

---
class: compact
# Cortex-M Instruction Sets

![](images/space.png# w-20pct) ![](images/cortexM.png# w-60pct)

---
class: compact
# C Memory Model Redux

![](images/c-memory.png# w-60pct fr)

* The stack is used for temporary storage
    * Local variables
    * "Spilling" registers
    * Parameter passing
* The heap is used for dynamic allocation of long-lived variables
    * The C interface is `malloc`/`free`
    * We won't use the heap -- many embedded applications don't

---
class: compact,hljs-tomorrow-night-eighties,line-numbers
# C Stack Use (Example)

Consider a simple recursive procedure that counts the 1 bits in a word.

```C
int ones(unsigned int i) {
    if (i) {
        return (i & 1) + ones(i >> 1);
    } else {
        return 0;
    }
}
```

* The parameter `i` arrives in register `r0`, which the recursive call overwrites, so `i` must be preserved across the call. We keep it in `r4`, and to free `r4` for this we first "spill" its old contents to the stack.
* The link register `lr` must be preserved by saving it on the stack, because the recursive `bl` overwrites it.

---
class: compact
# Preserving r4 and lr on the stack

The `ones` implementation must preserve `r4` and `lr` on the stack because both are modified.

![](images/space.png# w-30pct) ![](images/2019-12-26-14-39-09.png# w-40pct)

---
class: compact,hljs-tomorrow-night-eighties,line-numbers
# Push

On Cortex processors, the `push` instruction is used to save registers to the stack

`push {r4, lr} @ write r4 and lr to the stack`

![](images/space.png# w-3-12th) ![](images/2019-12-26-14-41-37.png# w-50pct)

The "dual" of `push` is `pop`: `pop {r4, pc}` restores `r4` and returns by loading the saved `lr` value into `pc`.

---
class: compact,hljs-tomorrow-night-eighties,line-numbers
# Skeleton of ones procedure

```asm
ones:
    push {r4, lr}    @ save r4, lr on stack
    mov  r4, r0      @ move i to r4
    ...
    bl   ones
    ...
    pop  {r4,pc}     @ restore r4 and return from call
```

---
class: compact
# Roles of the Registers

* The first four registers `r0-r3` are used to pass arguments into a subroutine and return a result from a function.
* Register `r12` is a scratch register that may be used by the linker
* Registers `r4-r8`, `r9`, and `r10-r11` may be used for local values; a subroutine (*callee*) must preserve the contents of `r4-r11`

Thus, `r0-r3` and `r12` are *caller saved* registers; all the other registers below `r13` are *callee saved*.

`r13-r15` are the `sp`, `lr`, and `pc`, respectively.

---
class: compact
# Stack Layout

![](images/space.png# w-2-12th) ![](images/2019-12-26-14-45-24.png# w-70pct)

---
class: compact,small-code,hljs-tomorrow-night-eighties,line-numbers
# Stack Frame Example

```C
extern int foo(int *);
int alloc(int i) {
    int local[15];
    return i + foo(local);
}
```

```asm
alloc:
    @ prolog
    push  {r4, lr}    @ preserve on stack
    sub   sp, 64      @ allocate 64 bytes: local (60) plus 4 of padding
    @ body
    movs  r4, r0      @ move i to r4
    add   r0, sp, 4   @ compute pointer to local
    bl    foo         @ call foo
    adds  r0, r4      @ compute result
    @ epilog
    add   sp, 64      @ deallocate space for local
    pop   {r4, pc}    @ restore r4 and return
```

---
class: compact
# Parameter Area of Stack

Registers `r0-r3` carry the first four parameters; any remaining parameters are passed on the stack.

![](images/space.png# w-3-12th) ![](images/2019-12-26-14-54-54.png# w-50pct)

---
class: compact,hljs-tomorrow-night-eighties,line-numbers
# Parameter Passing Example

```C
extern int foo(int, int, int, int);
int call(int a, int b, int c, int d, int e) {
    return a + e + foo(a,b,c,d);
}
```

```asm
call:
    push  {r4, lr}       @ save r4, lr
    ldr   r4, [sp, #8]   @ r4 = e (just above the 8 bytes pushed)
    adds  r4, r0         @ r4 += a
    bl    foo
    adds  r0, r4         @ add foo return value
    pop   {r4, pc}       @ restore
```

---
class: compact
# Threads

![](images/space.png# w-2-12th) ![](images/2019-12-26-10-31-41.png# w-70pct)

https://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/4_Threads.html

---
class: compact,small-code,hljs-tomorrow-night-eighties,line-numbers
# Threads in ChibiOS (teaser)

```C
/*
 * Green LED blinker thread, times are in milliseconds.
 */
static THD_WORKING_AREA(waBlinker, 256);
static THD_FUNCTION(Blinker, arg) {
  (void)arg;
  chRegSetThreadName("Blinker");
  while (true) {
    palSetLine(LINE_GPIOA_LED_GREEN);     // led on
    chThdSleepMilliseconds(500);          // sleep 0.5 seconds
    palClearLine(LINE_GPIOA_LED_GREEN);   // led off
    chThdSleepMilliseconds(500);
  }
}
...
int main(void){
  ...
  chThdCreateStatic(waBlinker, sizeof(waBlinker), NORMALPRIO, Blinker, NULL);
  ...
}
```

---
class: compact
# Summary

* von Neumann machine
* Cortex architecture
* Instruction processing
* C execution model
* Cover Image: Kapooht [<a href="https://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a>], <a href="https://commons.wikimedia.org/wiki/File:Von_Neumann_Architecture.svg">via Wikimedia Commons</a>
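
---
class: compact,small-code
# Worked Example: Condition Flags in C

The condition flags described earlier can be modelled in a few lines of C. This is a minimal sketch of how `cmp a, b` could set N, Z, C, and V for 32-bit operands, and how `beq`, `bne`, and `bgt` might test them. The names `flags_t`, `cmp_flags`, `cond_eq`, `cond_ne`, `cond_gt`, and `flags_demo` are hypothetical, not part of any ARM or course API.

```C
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration: the four condition flags. */
typedef struct { unsigned n, z, c, v; } flags_t;

/* Flags as set by `cmp a, b`, which computes a - b and discards the result. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = r >> 31;                     /* result is negative when read as signed */
    f.z = (r == 0);                    /* result is zero                          */
    f.c = (a >= b);                    /* on ARM, carry set means "no borrow"     */
    f.v = ((a ^ b) & (a ^ r)) >> 31;   /* signed overflow in the subtraction      */
    return f;
}

int cond_eq(flags_t f) { return f.z; }                 /* beq */
int cond_ne(flags_t f) { return !f.z; }                /* bne */
int cond_gt(flags_t f) { return !f.z && f.n == f.v; }  /* bgt (signed) */

/* Usage: mirrors `cmp r0, 0` followed by `bne label1`. */
void flags_demo(void)
{
    assert(cond_ne(cmp_flags(5, 0)));   /* r0 != 0: branch taken */
    assert(cond_eq(cmp_flags(0, 0)));   /* r0 == 0: fall through */
}
```

Note that `bgt` compares N with V rather than testing N alone: when the subtraction overflows, the sign of the result is inverted, and V corrects for that.
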