class: title, smokescreen, shelf, bottom, no-footer
background-image: url(images/protobuf.png)

# 181U Spring 2020
## Message Serialization

<style>
h1 {
  border-bottom: 8px solid rgb(32,67,143);
  border-radius: 2px;
  width: 90%;
}
.smokescreen h1 { border-bottom: none; }
.small {font-size: 80%}
.smaller {font-size: 70%}
.small-code.remark-slide-content.compact code {font-size:1.0rem}
.very-small-code.remark-slide-content.compact code {font-size:0.9rem}
.line-numbers{
  /* Set "line-numbers-counter" to 0 */
  counter-reset: line-numbers-counter;
}
.line-numbers .remark-code-line::before {
  /* Increment "line-numbers-counter" by 1 */
  counter-increment: line-numbers-counter;
  content: counter(line-numbers-counter);
  text-align: right;
  width: 20px;
  border-right: 1px solid #aaa;
  display: inline-block;
  margin-right: 10px;
  padding: 0 5px;
}
</style>

---
layout: true

.footer[
- 181U
- See acknowledgements
]

---
class: compact

# Agenda

<audio controls>
<source src="audio/serialization_2.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* The problem
* XDR
* JSON
* Protocol Buffers

---
class: compact,small

# The Problem

<audio controls>
<source src="audio/serialization_3.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* MQTT provides a protocol for exchanging **topic value** pairs, but gives no guidance about the syntax of the "value"
* Every receiver of an MQTT message must
  - decode the topic and route the message appropriately
  - decode the value; this is easy for simple values such as 3.14159 or 42, but what if you want to send a structure?

---
class: compact,small

# The Problem

<audio controls>
<source src="audio/serialization_4.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Every sender of an MQTT message must
  - encode the topic, encode the value
* This problem arises in other contexts
  - Configuration of routers
  - Remote procedure calls
  - Storing/retrieving binary data

---
class: compact

# Data Serialization

<audio controls>
<source src="audio/serialization_5.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

![](images/space.png# w-20pct)
![](images/serialize-deserialize.png# w-50pct)

https://www.geeksforgeeks.org/serialization-in-java/

---
class: compact

# Language Specific Solutions

<audio controls>
<source src="audio/serialization_6.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Java -- `java.io.Serializable` interface
  - `writeObject(Object obj)`
  - the serialization runtime associates a version number, called a `serialVersionUID`, with each Serializable class
  - reader and writer have to use the same code/version of the object library
* Python -- pickle
  - `pickle` module implements binary protocols for serializing and de-serializing Python object structures
  - not secure -- unpickling untrusted data can execute arbitrary code

---
class: compact

# Language Independent Solutions

<audio controls>
<source src="audio/serialization_7.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* XDR (External Data Representation Standard) [rfc1832](https://tools.ietf.org/html/rfc1832)
  - Internet standard developed for transferring data
* JSON (JavaScript Object Notation)
  - Human readable format
* Google Protocol Buffers (protobuf) -- binary encoding, machine independent

---
class: compact

# XDR: External Data Representation

<audio controls>
<source src="audio/serialization_8.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Uses a language to describe data formats
* Used by ONC RPC (remote procedure calls) and NFS (network file system)
* Assumes bytes are portable
* Standard allows encoding/decoding on different architectures
* Format defined by an IDL file (a data description language)
* IDL file compiled to C code with rpcgen

---
class: compact

# XDR Encoding

<audio controls>
<source src="audio/serialization_9.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* All items encoded as a multiple of four bytes.
* XDR Data types
  - Integer (32-bits, big endian)
  - Unsigned integer
  - Enumerations
  - Long integer (64 bits)
  - Floating point
  - Strings
  - Structures
  - Unions

---
class: compact,very-small-code,compact,hljs-tomorrow-night-eighties,line-numbers

# XDR Encoding (example)

<audio controls>
<source src="audio/serialization_10.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Integer

```plaintext
  (MSB)                   (LSB)
+-------+-------+-------+-------+
|byte 0 |byte 1 |byte 2 |byte 3 |
+-------+-------+-------+-------+
<------------32 bits------------>
```

---
class: compact,very-small-code,compact,hljs-tomorrow-night-eighties,line-numbers

# XDR Encoding (example)

<audio controls>
<source src="audio/serialization_11.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```c
struct {
   component-declaration-A;
   component-declaration-B;
   ...
} identifier;
```

The components of the structure are encoded in the order of their declaration in the structure. Each component's size is a multiple of four bytes, though the components may be different sizes.

```plaintext
+-------------+-------------+...
| component A | component B |...
+-------------+-------------+...
```

---
class: compact,very-small-code,compact,hljs-tomorrow-night-eighties,line-numbers

# XDR Encoding (string)

<audio controls>
<source src="audio/serialization_12.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Length/data format
* Always rounded to multiple of four bytes

```plaintext
   0     1     2     3     4     5   ...
+-----+-----+-----+-----+-----+-----+...+-----+-----+...+-----+
|        length n       |byte0|byte1|...| n-1 |  0  |...|  0  |
+-----+-----+-----+-----+-----+-----+...+-----+-----+...+-----+
|<-------4 bytes------->|<------n bytes------>|<---r bytes--->|
                        |<----n+r (where (n+r) mod 4 = 0)---->|
                                                         STRING
```

---
class: compact

# XDR Observations

<audio controls>
<source src="audio/serialization_13.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* C-centric -- all of the types are C types
* (Very) inefficient -- every type is allocated space for the "worst case" (arrays and strings are exceptions)
* Very sensitive to change -- not possible to extend the message type (perhaps for other receivers) without recompiling the code everywhere
* No real sanity checking for messages
* Tools (rpcgen) are aimed at remote procedure calls
* rpcgen creates
  - header file with data structure for message
  - encoder function
  - decoder function

---
class: compact

# JSON (JavaScript Object Notation)

<audio controls>
<source src="audio/serialization_14.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

![](images/json-object.png# w-40pct fr)

* Human readable/writable data format
* Subset of JavaScript
* Built on two structures
  * A collection of name/value pairs (e.g. a dictionary)
  * An ordered list of values (e.g. an array or list)

---
class: compact

# JSON array and value

<audio controls>
<source src="audio/serialization_15.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

![](images/json-value.png# w-40pct)
![](images/space.png# w-10pct)
![](images/json-array.png# w-40pct)

---
class: very-small-code,compact,hljs-tomorrow-night-eighties,line-numbers,col-2

# JSON Example

<audio controls>
<source src="audio/serialization_16.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```javascript
{"widget": {
    "debug": "on",
    "window": {
        "title": "Sample Konfabulator Widget",
        "name": "main_window",
        "width": 500,
        "height": 500
    },
    "image": {
        "src": "Images/Sun.png",
        "name": "sun1",
        "hOffset": 250,
        "vOffset": 250,
        "alignment": "center"
    },
```

<br>

```javascript
    "text": {
        "data": "Click Here",
        "size": 36,
        "style": "bold",
        "name": "text1",
        "hOffset": 250,
        "vOffset": 100,
        "alignment": "center",
        "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;"
    }
}}
```

---
class: compact

# JSON Encoding/Decoding

<audio controls>
<source src="audio/serialization_17.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Relatively easy to "parse", but meaning is left as an exercise to the programmer
  - no error checking (is the JSON structure the correct one?)
  - no specification of the "correct" JSON structure for validation
* There are libraries to help, but the programmer must still convert strings to typed values (string to xxx)
  - Example: JSMN

---
class: compact

# JSMN

<audio controls>
<source src="audio/serialization_18.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Portable to embedded processors
* C89 compatible
* No library dependencies
* Small footprint
* Just parses into tokens; other work is left to the user
  * Use C libraries to parse numbers
  * Need to map objects to data structures

---
class: compact,small-code,hljs-tomorrow-night-eighties,line-numbers,col-2

# JSMN Parsing

<audio controls>
<source src="audio/serialization_19.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```javascript
'{ "name" : "Jack", "age" : 27 }'
```

JSMN creates tokens with boundaries in the string

* Object [0..31]
* String [3..7], String [12..16], String [20..23]
* Number [27..29]

`jsmntok_t` type is:

```C
typedef struct {
    jsmntype_t type; /* Token type */
    int start;       /* Token start position */
    int end;         /* Token end position */
    int size;        /* Number of child (nested) tokens */
} jsmntok_t;
```

---
class: compact,small-code,hljs-tomorrow-night-eighties,line-numbers,col-2

# Protocol Buffers

<audio controls>
<source src="audio/serialization_20.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Google's mechanism for serializing structured data
  - language-neutral
  - platform-neutral
  - extensible

```c
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}
```

<br>

```C++
Person john;
fstream input(argv[1], ios::in | ios::binary);
john.ParseFromIstream(&input);
id = john.id();
name = john.name();
email = john.email();
```

---
class: compact,small-code,hljs-tomorrow-night-eighties,line-numbers

# Protocol Buffer Basics: (e.g. Python)

<audio controls>
<source src="audio/serialization_21.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Message formats defined in a .proto file
* Converted to target language with protocol buffer compiler
* Example: Python protocol buffer API to write, read messages
* Why use protocol buffers
  - very compact binary format
  - generate language specific API for the message format chosen
  - message formats can be extended without affecting existing applications

---
class: compact,very-small-code,hljs-tomorrow-night-eighties,line-numbers,col-2

# Protocol Buffer Example

<audio controls>
<source src="audio/serialization_22.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```protobuf
syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
```

```protobuf
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}
```

---
class: compact,very-small-code,hljs-tomorrow-night-eighties,line-numbers

# Compiling .proto file

<audio controls>
<source src="audio/serialization_23.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto
```

* This generates `addressbook_pb2.py`
* Here is an example of creating a person

```python
import addressbook_pb2
person = addressbook_pb2.Person()
person.id = 1234
person.name = "John Doe"
person.email = "jdoe@example.com"
phone = person.phones.add()
phone.number = "555-4321"
phone.type = addressbook_pb2.Person.HOME
```

---
class: compact,very-small-code,hljs-tomorrow-night-eighties,line-numbers

# Serialization Methods Generated

<audio controls>
<source src="audio/serialization_24.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* `SerializeToString()`: serializes the message and returns it as a byte string (`bytes` in Python 3)
* `ParseFromString(data)`: parses a message from the given byte string

---
class: compact

# C++ Compilation

<audio controls>
<source src="audio/serialization_25.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```
protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto
```

* `addressbook.pb.h` : C++ header
* `addressbook.pb.cc` : C++ implementation of classes

---
class: compact,very-small-code,hljs-tomorrow-night-eighties,line-numbers,col-2

# C++ Generated API

<audio controls>
<source src="audio/serialization_26.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```c++
// name
inline bool has_name() const;
inline void clear_name();
inline const ::std::string& name() const;
inline void set_name(const ::std::string& value);
inline void set_name(const char* value);
inline ::std::string* mutable_name();

// id
inline bool has_id() const;
inline void clear_id();
inline int32_t id() const;
inline void set_id(int32_t value);
```

<br>

```c++
// email
inline bool has_email() const;
inline void clear_email();
inline const ::std::string& email() const;
inline void set_email(const ::std::string& value);
inline void set_email(const char* value);
inline ::std::string* mutable_email();

// phones
inline int phones_size() const;
inline void clear_phones();
inline const ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >& phones() const;
inline ::google::protobuf::RepeatedPtrField< ::tutorial::Person_PhoneNumber >* mutable_phones();
inline const ::tutorial::Person_PhoneNumber& phones(int index) const;
inline ::tutorial::Person_PhoneNumber* mutable_phones(int index);
inline ::tutorial::Person_PhoneNumber* add_phones();
```

---
class: compact

# C++ Parsing and Serialization

<audio controls>
<source src="audio/serialization_27.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* `bool SerializeToString(string* output) const;`: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.
* `bool ParseFromString(const string& data);`: parses a message from the given string.
* `bool SerializeToOstream(ostream* output) const;`: writes the message to the given C++ ostream.
* `bool ParseFromIstream(istream* input);`: parses a message from the given C++ istream.

---
class: compact,very-small-code,hljs-tomorrow-night-eighties,line-numbers

# Encoding -- A simple message

<audio controls>
<source src="audio/serialization_28.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```c
message Test1 {
  optional int32 a = 1;
}
```

Suppose you create a message and set `a` to 150. The serialized stream is three bytes (smaller than a 4-byte int):

```
08 96 01
```

`08` is the key (tag 1, wire type 0) and `96 01` is the variable-length encoding of 150.

---
class: compact,very-small-code,hljs-tomorrow-night-eighties,line-numbers

# Encoding (example)

<audio controls>
<source src="audio/serialization_29.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Encoding of integers is variable length -- only the bytes needed are generated
  - Each byte carries 7 bits of data; the msb is set on every byte except the last
  - The 7-bit groups are sent least significant group first
  - The number 1 is encoded as `0000 0001` (a single byte)
  - The number 300 is encoded as two bytes, which decode as follows:

```plaintext
1010 1100 0000 0010    --> drop msb
 010 1100  000 0010    --> reverse order of "bytes"
 000 0010  010 1100    --> simplify
        1 0010 1100    --> convert to decimal
 256 + 32 + 8 + 4      --> 300
```

---
class: compact

# Message Structure

<audio controls>
<source src="audio/serialization_30.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* A protocol buffer message is a series of key-value pairs.
* Keys in the binary message are the "tags" (the field numbers from the .proto file)
* Each encoded key is a varint holding the tag plus a "wire type": `(tag << 3) | wire_type`

Wire types

| Type | Meaning | Used for |
| -----|---------|----------|
| 0 | Varint | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
| 1 | 64-bit | fixed64, sfixed64, double |
| 2 | length-delimited | string, bytes, ... |
| 5 | 32-bit | fixed32, sfixed32, float |

---
class: compact,small-code,hljs-tomorrow-night-eighties,line-numbers

# Message -- String

<audio controls>
<source src="audio/serialization_31.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

```c
message Test2 {
  optional string b = 2;
}
```

Suppose we have a message with b = "testing"

```plaintext
12 07 74 65 73 74 69 6e 67
```

`12` is the key (tag 2, wire type 2) and `07` is the length in bytes. The last 7 bytes are the UTF-8 encoding of "testing".

---
class: compact

# Protocol Buffers Summary (so far)

<audio controls>
<source src="audio/serialization_32.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Simple language for describing message types
* Very compact "wire" encoding
* Compiler generates message specific APIs for a variety of languages
  - C++
  - Java
  - Python
  - Go
  - C# ...

---
class: compact

# Protocol Buffer Message Definitions Can be Extended

<audio controls>
<source src="audio/serialization_33.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* you *must not* change the tag numbers of existing fields
* you *must not* delete any required fields
* you *may* delete optional or repeated fields
* you *may* add new optional or repeated fields provided fresh tag numbers are used

---
class: compact

# Protocol Buffer support for embedded code

<audio controls>
<source src="audio/serialization_34.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Nanopb -- an extension that uses protoc to generate compact C code
* Typical project includes these files
  - Nanopb runtime
  - Protocol description
    - protodef.proto
    - protodef.pb.c (generated)
    - protodef.pb.h (generated)
* Small code size (~10KB compiled)
* Small RAM usage (around 300 bytes plus message structs)
* I've used this for my research in sub-gram data loggers -- all communication with the tags uses protobuf

---
class: compact

# Summary

<audio controls>
<source src="audio/serialization_35.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

* Serialization
  - The Problem
  - Three approaches: XDR, JSON, Protocol Buffers
* Acknowledgements
  - Cover: Alternative and Flexible Control Approaches for Robotic Manipulators: on the Challenge of Developing a Flexible Control Architecture that Allows for Controlling Different Manipulators - Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/To-use-Protocol-Buffers-it-is-necessary-to-generate-code-for-each-message-that-needs_fig17_285578991 [accessed 6 Feb, 2020]
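
---
class: compact,very-small-code,hljs-tomorrow-night-eighties,line-numbers

# Appendix: Wire Encoding Sketch (Python)

The byte streams on the encoding slides (`08 96 01` for `a = 150` and `12 07 74 ...` for `b = "testing"`) can be reproduced by hand. The following is a minimal sketch in plain Python, not the protobuf library's implementation: the helper names (`encode_varint`, `decode_varint`, `encode_key`, `encode_int_field`, `encode_string_field`) are made up for illustration, and only non-negative integers are handled.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint (hypothetical helper)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # least significant 7-bit group first
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more groups follow: set the msb
        else:
            out.append(low7)         # last byte: msb clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # drop the msb, place the 7-bit group
        if not byte & 0x80:               # msb clear marks the last byte
            return result, pos
        shift += 7


def encode_key(tag, wire_type):
    """A key is the varint of (tag << 3) | wire_type."""
    return encode_varint((tag << 3) | wire_type)


def encode_int_field(tag, value):
    """Wire type 0 (varint): key followed by the value."""
    return encode_key(tag, 0) + encode_varint(value)


def encode_string_field(tag, text):
    """Wire type 2 (length-delimited): key, byte length, UTF-8 bytes."""
    payload = text.encode("utf-8")
    return encode_key(tag, 2) + encode_varint(len(payload)) + payload


print(encode_varint(300).hex(" "))                  # ac 02
print(encode_int_field(1, 150).hex(" "))            # 08 96 01
print(encode_string_field(2, "testing").hex(" "))   # 12 07 74 65 73 74 69 6e 67
print(decode_varint(bytes([0xAC, 0x02])))           # (300, 2)
```

In practice the generated `SerializeToString()` and `ParseFromString()` methods do this work; the sketch only makes the key and varint layout concrete.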