Binary Schema
A binary schema is a parsed textual schema stored in a condensed binary format. It enables dynamic encoding and decoding of binary records at runtime.
Bebop already offers backwards and forwards compatibility for data. However, sometimes an application needs to encode and decode data in a context where it doesn’t have access to the source schema. One example is observability and developer tooling that needs to inspect ingested requests and responses: the tooling itself does not have access to the source textual schema and thus cannot generate any code for the records it receives.
Without compiler-emitted code, an application cannot make any guarantees about type safety until it hits an integration, which will throw an error. That is a failure mode Bebop should avoid altogether.
Implementation
The compiler emits a static byte array into the generated code that represents the textual schemas that created it. This data can be appended to requests and responses. Each runtime implements a method to parse the binary schema, and the parsed schema can then be used to encode and decode records.
The encoded schema looks like:
When read, you end up with a new intermediate representation of the schema that can be used to encode and decode records:
Wire Format
- Schema Version: 1-byte integer (byte).
- Definition Count: 4-byte integer (uint32). For each defined type:
  - Definition Name: null-terminated UTF-8 encoded string.
  - Kind: 1-byte integer (byte), where 1=Struct, 2=Message, 3=Union, 4=Enum.
  - Decorators: see Decorators.
  - Definition: depends on the kind.
- Service Count: 1-byte integer (byte). For each service:
  - Service Name: null-terminated UTF-8 encoded string.
  - Decorators: see Decorators.
  - Methods Count: 1-byte integer (byte). For each method:
    - Method Name: null-terminated UTF-8 encoded string.
    - Decorators: see Decorators.
    - Method Type: 1-byte integer (byte), where 0=Unary, 1=Server Streaming, 2=Client Streaming, 3=Duplex Streaming.
    - Request Type ID: 4-byte integer (int32).
    - Response Type ID: 4-byte integer (int32).
    - Method ID: 4-byte integer (uint32).
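The layout above can be walked with a small cursor-style reader. The following is a minimal sketch rather than any Bebop runtime's API: the `Reader` class and the `read_*` functions are hypothetical names, integers are assumed to be little-endian, and decorator lists are only accepted when their count is 0.

```python
import struct

KINDS = {1: "Struct", 2: "Message", 3: "Union", 4: "Enum"}
METHOD_TYPES = {0: "Unary", 1: "Server Streaming",
                2: "Client Streaming", 3: "Duplex Streaming"}


class Reader:
    """Cursor over a byte buffer (hypothetical helper, not a Bebop API)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def unpack(self, fmt: str):
        # ASSUMPTION: callers pass little-endian formats such as "<I".
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def cstring(self) -> str:
        end = self.data.index(b"\x00", self.pos)
        text = self.data[self.pos:end].decode("utf-8")
        self.pos = end + 1  # step over the null terminator
        return text

    def no_decorators(self) -> None:
        # Sketch limitation: only an empty decorator list is handled.
        if self.unpack("<B") != 0:
            raise NotImplementedError("decorator decoding is out of scope here")


def read_header(r: Reader):
    version = r.unpack("<B")           # Schema Version (byte)
    definition_count = r.unpack("<I")  # Definition Count (uint32)
    return version, definition_count


def read_definition_preamble(r: Reader):
    name = r.cstring()
    kind = KINDS[r.unpack("<B")]
    r.no_decorators()
    return name, kind  # the kind-specific definition body follows


def read_services(r: Reader):
    services = []
    for _ in range(r.unpack("<B")):      # Service Count (byte)
        service_name = r.cstring()
        r.no_decorators()
        methods = []
        for _ in range(r.unpack("<B")):  # Methods Count (byte)
            method_name = r.cstring()
            r.no_decorators()
            method_type = METHOD_TYPES[r.unpack("<B")]
            request_type_id = r.unpack("<i")
            response_type_id = r.unpack("<i")
            method_id = r.unpack("<I")
            methods.append({"name": method_name, "type": method_type,
                            "request_type_id": request_type_id,
                            "response_type_id": response_type_id,
                            "id": method_id})
        services.append({"name": service_name, "methods": methods})
    return services
```

`read_services` expects the reader to be positioned at the Service Count byte, which means every definition body has to be consumed first.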
Decorators
- Decorators: begins with the count as a 1-byte integer (byte). Each decorator is a sequence of:
  - Name: a null-terminated UTF-8 encoded string.
  - Argument Count: 1-byte integer (byte).
  - Arguments: for each argument:
    - Name: a null-terminated UTF-8 encoded string.
    - Type: 4-byte signed integer (int32).
    - Value: depends on the type.
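As a rough illustration of the decorator layout, the sketch below decodes a decorator list into name and argument pairs. The text only says an argument's value "depends on the type", so the value layouts here are assumptions: little-endian integers, one byte per boolean, and null-terminated strings, with type IDs taken from the table under Type ID Encoding. The function names are hypothetical and only four scalar types are handled.

```python
import struct

# Scalar type IDs from the Type ID Encoding table (subset used by this sketch).
BOOL, BYTE, INT32, STRING = -1, -2, -6, -11


def read_cstring(buf: bytes, pos: int):
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def read_value(buf: bytes, pos: int, type_id: int):
    # ASSUMPTION: these value layouts are not specified in the text.
    if type_id == BOOL:
        return buf[pos] != 0, pos + 1
    if type_id == BYTE:
        return buf[pos], pos + 1
    if type_id == INT32:
        return struct.unpack_from("<i", buf, pos)[0], pos + 4
    if type_id == STRING:
        return read_cstring(buf, pos)
    raise NotImplementedError(f"type id {type_id} is not handled in this sketch")


def read_decorators(buf: bytes, pos: int = 0):
    count = buf[pos]  # decorator count (byte)
    pos += 1
    decorators = []
    for _ in range(count):
        name, pos = read_cstring(buf, pos)
        arg_count = buf[pos]  # Argument Count (byte)
        pos += 1
        args = {}
        for _ in range(arg_count):
            arg_name, pos = read_cstring(buf, pos)
            (type_id,) = struct.unpack_from("<i", buf, pos)  # Type (int32)
            pos += 4
            value, pos = read_value(buf, pos, type_id)
            args[arg_name] = value
        decorators.append((name, args))
    return decorators, pos
```

Returning the new position alongside the result lets the caller continue with whatever follows the decorator list in the enclosing definition.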
Struct
A struct definition is encoded as follows:
- Is Mutable: 1-byte boolean.
- Minimal Encode Size: 4-byte integer (int32).
- Is Fixed Size: 1-byte boolean.
- Fields Count: 1-byte integer (byte). For each field:
  - Name: a null-terminated UTF-8 encoded string.
  - Type ID: 4-byte integer (int32).
  - Decorators: see Decorators.
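Seen from the writing side, a struct body could be produced as in the sketch below. It assumes little-endian integers and writes every field with an empty decorator list (a single 0 count byte). `encode_struct_body` is a hypothetical helper, not something the compiler exposes.

```python
import struct


def cstring(text: str) -> bytes:
    """Null-terminated UTF-8 string."""
    return text.encode("utf-8") + b"\x00"


def encode_struct_body(is_mutable: bool, minimal_size: int,
                       is_fixed_size: bool, fields) -> bytes:
    """fields is a list of (name, type_id) pairs, written without decorators."""
    if len(fields) > 255:
        raise ValueError("Fields Count is a single byte")
    out = bytearray()
    # Is Mutable (bool), Minimal Encode Size (int32),
    # Is Fixed Size (bool), Fields Count (byte).
    out += struct.pack("<?i?B", is_mutable, minimal_size,
                       is_fixed_size, len(fields))
    for name, type_id in fields:
        out += cstring(name)
        out += struct.pack("<i", type_id)  # Type ID (int32)
        out += b"\x00"                     # decorator count = 0
    return bytes(out)


# Two Float32 fields (type ID -9 in the Type ID Encoding table).
body = encode_struct_body(False, 8, True, [("x", -9), ("y", -9)])
```

The fixed-width part is 7 bytes, and each field contributes its name, a terminator, 4 bytes of type ID, and 1 byte of decorator count.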
Message
- Minimal Encode Size: 4-byte integer (int32).
- Fields Count: 1-byte integer (byte). For each field:
  - Name: a null-terminated UTF-8 encoded string.
  - Type ID: 4-byte integer (int32).
  - Decorators: see Decorators.
  - Constant Value: 1-byte integer (byte). This is the index of the field.
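A message body differs from a struct body mainly in the per-field index byte. The sketch below reads one, assuming little-endian integers and fields that carry no decorators. `read_message_body` is a hypothetical name.

```python
import struct


def read_message_body(buf: bytes, pos: int = 0):
    (minimal_size,) = struct.unpack_from("<i", buf, pos)  # Minimal Encode Size
    pos += 4
    field_count = buf[pos]  # Fields Count (byte)
    pos += 1
    fields = []
    for _ in range(field_count):
        end = buf.index(b"\x00", pos)
        name = buf[pos:end].decode("utf-8")
        pos = end + 1
        (type_id,) = struct.unpack_from("<i", buf, pos)  # Type ID (int32)
        pos += 4
        # Sketch limitation: only an empty decorator list is handled.
        if buf[pos] != 0:
            raise NotImplementedError("decorator decoding is out of scope here")
        pos += 1
        index = buf[pos]  # Constant Value: the index of the field
        pos += 1
        fields.append((index, name, type_id))
    return minimal_size, fields, pos
```

Keeping the index next to each field name makes it straightforward to build a lookup from field index to field when decoding records later.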
Union
- Minimal Encode Size: 4-byte integer (int32).
- Branch Count: 1-byte integer (byte). For each branch:
  - Discriminator: 1-byte integer (byte).
  - Type ID: 4-byte integer (int32).
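Because a union body contains only fixed-width values, it makes a compact round-trip example. The sketch below assumes little-endian integers, and both function names are hypothetical.

```python
import struct


def encode_union_body(minimal_size: int, branches) -> bytes:
    """branches is a list of (discriminator, type_id) pairs."""
    out = struct.pack("<iB", minimal_size, len(branches))
    for discriminator, type_id in branches:
        out += struct.pack("<Bi", discriminator, type_id)
    return out


def decode_union_body(buf: bytes, pos: int = 0):
    minimal_size, count = struct.unpack_from("<iB", buf, pos)
    pos += 5  # int32 + byte
    branches = []
    for _ in range(count):
        branches.append(struct.unpack_from("<Bi", buf, pos))
        pos += 5  # byte + int32
    return minimal_size, branches, pos


# Two branches pointing at defined types 0 and 3.
encoded = encode_union_body(5, [(1, 0), (2, 3)])
decoded = decode_union_body(encoded)
```

Each branch occupies exactly 5 bytes, so the body length is 5 + 5 × (branch count).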
Enum
- Base Type: 4-byte integer (int32).
- Is Bit Flags: 1-byte boolean.
- Minimal Encode Size: 4-byte integer (int32).
- Member Count: 1-byte integer (byte). For each member:
  - Name: a null-terminated UTF-8 encoded string.
  - Decorators: see Decorators.
  - Value: depends on the base type.
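Since the width of each member value follows from the base type, a reader has to pick the integer format before it can walk the members. The sketch below assumes that member values are stored as little-endian integers of the base type's width, which the text does not spell out. It also assumes members carry no decorators. `VALUE_FORMATS` and `read_enum_body` are hypothetical names.

```python
import struct

# Integer base types from the Type ID Encoding table mapped to struct formats.
# ASSUMPTION: member values use the base type's width, little-endian.
VALUE_FORMATS = {-2: "<B", -3: "<H", -4: "<h", -5: "<I",
                 -6: "<i", -7: "<Q", -8: "<q"}


def read_enum_body(buf: bytes, pos: int = 0):
    # Base Type (int32), Is Bit Flags (bool),
    # Minimal Encode Size (int32), Member Count (byte).
    base_type, is_flags, minimal_size, member_count = struct.unpack_from(
        "<i?iB", buf, pos)
    pos += struct.calcsize("<i?iB")
    fmt = VALUE_FORMATS[base_type]
    members = {}
    for _ in range(member_count):
        end = buf.index(b"\x00", pos)
        name = buf[pos:end].decode("utf-8")
        pos = end + 1
        # Sketch limitation: only an empty decorator list is handled.
        if buf[pos] != 0:
            raise NotImplementedError("decorator decoding is out of scope here")
        pos += 1
        (value,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        members[name] = value
    info = {"base_type": base_type, "is_flags": is_flags,
            "minimal_size": minimal_size, "members": members}
    return info, pos
```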
Type ID Encoding
Type IDs are encoded as follows:
- Defined Type: The index of the type definition in the defined types list.
- Scalar Type: The bitwise negation of the index of the BaseType in the list of base types. This effectively results in:
Bool = -1
Byte = -2
UInt16 = -3
Int16 = -4
UInt32 = -5
Int32 = -6
UInt64 = -7
Int64 = -8
Float32 = -9
Float64 = -10
String = -11
Guid = -12
Date = -13
- Array Type: -14
- Map Type: -15
Type IDs identify the type of a field, the base type of an enum, the type of a branch in a union, or the request and response types of a service method.
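The negative IDs follow directly from applying bitwise negation to a zero-based index, which the sketch below reproduces. The `BASE_TYPES` list mirrors the order of the table above, and the helper names are hypothetical.

```python
BASE_TYPES = ["Bool", "Byte", "UInt16", "Int16", "UInt32", "Int32", "UInt64",
              "Int64", "Float32", "Float64", "String", "Guid", "Date"]
ARRAY_TYPE_ID = -14
MAP_TYPE_ID = -15


def scalar_type_id(name: str) -> int:
    # Bitwise negation of the index: ~0 == -1, ~1 == -2, and so on.
    return ~BASE_TYPES.index(name)


def describe_type_id(type_id: int, definitions) -> str:
    """Resolve a type ID against a list of defined type names."""
    if type_id >= 0:
        return definitions[type_id]  # index into the defined types list
    if type_id == ARRAY_TYPE_ID:
        return "Array"
    if type_id == MAP_TYPE_ID:
        return "Map"
    index = ~type_id  # negation is its own inverse
    if index >= len(BASE_TYPES):
        raise ValueError(f"unknown type id {type_id}")
    return BASE_TYPES[index]
```

Because non-negative values are reserved for defined types and negative values for built-in ones, a single int32 is enough to tell the two apart.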
Usage
Tradeoffs
Type inference is lost. The objects you work with will be dynamic. Mistakes can still be caught on the client side via proxies or other means, but it is up to the user to know the type. This is acceptable because the method is intended only for extreme edge cases.