How Data is Serialized for Transport

The data is serialized for transport using the JSON encoding defined in the Avro 1.4.1 specification. This is the same JSON encoding that is used to express default values in a schema.

One notable difference from the Avro specification is that an optional field with no value is represented by omitting it from the serialized data. In other words, optional fields are never explicitly set to null in the serialized body, so null is never a valid field value in the serialized data. The only exception is when the schema for the data is a union that has a null member.
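The omission rule can be illustrated with a minimal sketch. The class below is hypothetical and is not part of Rest.li; it hand-builds the JSON for an assumed record with a required "message" field and an optional "count" field, and it assumes the message needs no JSON escaping.

```java
final class OptionalFieldSketch {
  // Hypothetical record: "message" is required, "count" is optional.
  // Assumes message contains no characters that need JSON escaping.
  static String toJson(String message, Integer count) {
    StringBuilder json = new StringBuilder();
    json.append("{\"message\":\"").append(message).append("\"");
    // An optional field with no value is omitted; it is never written as null.
    if (count != null) {
      json.append(",\"count\":").append(count);
    }
    return json.append("}").toString();
  }

  public static void main(String[] args) {
    System.out.println(toJson("Hi!", 3));
    System.out.println(toJson("Hi!", null));
  }
}
```

When count is absent, the "count" key does not appear in the output at all, rather than appearing with a null value.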

The following table summarizes the JSON encoding.

| Schema Type | JSON Type | JSON Encoding Examples |
|---|---|---|
| int | number | 123 |
| long | number | 123456789000 |
| float | number | 3.5 |
| double | number | 3.5555555 |
| boolean | true or false | true |
| string | string | "hello" |
| bytes | string (bytes encoded as least significant 8 bits of 16-bit character) | "\u00ba\u00db\u00ad" |
| enum | string | "APPLE" |
| fixed | string (bytes encoded as least significant 8 bits of 16-bit character) | "\u0001\u0002\u0003\u0004" (fixed of size 4) |
| array | array | [ 1, 2, 3 ] |
| map | object | { "a" : 95, "b" : 90, "c" : 85 } |
| record (error) | object (each field is encoded using a name/value pair in the object) | { "intField" : 1, "stringField" : "abc", "fruitsField" : "APPLE" } |
| union | null if the value is null; otherwise an object with exactly one name/value pair, where the name is the member discriminator and the value is the JSON-encoded member value | see the union examples that follow |

NOTE: The member discriminator is the member's alias if one is specified; otherwise it is the member's fully qualified type name.

Examples of union encodings:

null

{ "int" : 1 }

{ "float" : 3.5 }

{ "string" : "abc" }

{ "array" : [ "s1", "s2", "s3" ] }

{ "map" : { "key1" : 10, "key2" : 20, "key3" : 30 } }

{ "com.linkedin.generator.examples.Fruits" : "APPLE" }
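The bytes and fixed encodings store each byte in the least significant 8 bits of one 16-bit character. The following is a minimal sketch of that mapping in plain Java; the class and method names are hypothetical and do not come from Rest.li.

```java
final class BytesAsStringSketch {
  // Encode: each byte becomes one char whose low 8 bits hold the byte value.
  static String bytesToString(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length);
    for (byte b : bytes) {
      sb.append((char) (b & 0xFF)); // mask so negative bytes map to 0x80..0xFF
    }
    return sb.toString();
  }

  // Decode: keep the low 8 bits of each char; reject chars that do not fit.
  static byte[] stringToBytes(String s) {
    byte[] bytes = new byte[s.length()];
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c > 0xFF) {
        throw new IllegalArgumentException("Not a byte-valued character at index " + i);
      }
      bytes[i] = (byte) c;
    }
    return bytes;
  }

  public static void main(String[] args) {
    byte[] original = {(byte) 0xBA, (byte) 0xDB, (byte) 0xAD};
    String encoded = bytesToString(original);
    System.out.println(java.util.Arrays.equals(original, stringToBytes(encoded)));
  }
}
```

This is the same byte-to-character mapping that Java applies when decoding bytes as ISO-8859-1, so the round trip is lossless for every byte value.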

If a union schema has a typeref member, then the key for that member is the dereferenced type. For example, for the union

  {
    "name" : "unionField",
    "type" : [
      "int",
      { "type" : "typeref", "name" : "a.b.c.d.Foo", "ref"  : "string" }
    ]
  }

the JSON encoding for the typeref member should look like

{ "string" : "Correct key" }

NOT

{ "a.b.c.d.Foo" : "Wrong key" }

Similarly, for a union with aliased members, the key for each member is its alias. For example, for the union

{
  "name" : "unionField",
  "type" : [
    { "type" : "int", "alias" : "count" },
    { "type" : { "type" : "typeref", "name" : "a.b.c.d.Foo", "ref"  : "string" }, "alias" : "foo" }
  ]
}

the JSON encoding for the typeref member should look like

{ "foo" : "Correct key" }
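The key selection rule for union members can be summarized in a short sketch. The helper below is hypothetical and is not Rest.li code; it assumes the caller has already dereferenced any typeref to its underlying type name, and it only encodes string-valued members that need no escaping.

```java
final class UnionKeySketch {
  // The alias wins when present; otherwise use the dereferenced type name.
  static String memberKey(String alias, String dereferencedTypeName) {
    return alias != null ? alias : dereferencedTypeName;
  }

  // Wraps a string member value in the single name/value pair object.
  // Assumes value needs no JSON escaping.
  static String encodeStringMember(String alias, String dereferencedTypeName, String value) {
    return "{ \"" + memberKey(alias, dereferencedTypeName) + "\" : \"" + value + "\" }";
  }

  public static void main(String[] args) {
    // typeref a.b.c.d.Foo -> string, no alias
    System.out.println(encodeStringMember(null, "string", "Correct key"));
    // same member with alias "foo"
    System.out.println(encodeStringMember("foo", "string", "Correct key"));
  }
}
```

In both calls the typeref name a.b.c.d.Foo never appears as the key: without an alias the key is the dereferenced type, and with an alias the key is the alias.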

How to Serialize Data to JSON

DataMapUtils provides convenience methods to serialize and deserialize between data and JSON using JacksonDataCodec.

To serialize from a DataMap to JSON:

DataMap dataMap = new DataMap();
dataMap.put("message", "Hi!");
byte[] jsonBytes = DataMapUtils.mapToBytes(dataMap);
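// StandardCharsets (java.nio.charset) avoids the checked UnsupportedEncodingException
String json = new String(jsonBytes, StandardCharsets.UTF_8);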

To serialize from a RecordTemplate instance to JSON:

Greeting greeting = new Greeting().setMessage("Hi!"); // Where Greeting is class extending RecordTemplate
byte[] jsonBytes = DataMapUtils.dataTemplateToBytes(greeting, true);
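// StandardCharsets (java.nio.charset) avoids the checked UnsupportedEncodingException
String json = new String(jsonBytes, StandardCharsets.UTF_8);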

How to Deserialize JSON to Data

To deserialize from JSON to a DataMap:

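// JSON requires double-quoted names and strings; the charset is given explicitly
InputStream in = IOUtils.toInputStream("{\"message\":\"Hi!\"}", StandardCharsets.UTF_8);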
DataMap dataMap = DataMapUtils.readMap(in);

To deserialize from JSON to a RecordTemplate:

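// JSON requires double-quoted names and strings; the charset is given explicitly
InputStream in = IOUtils.toInputStream("{\"message\":\"Hi!\"}", StandardCharsets.UTF_8);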
Greeting deserialized = DataMapUtils.read(in, Greeting.class); // Where Greeting is class extending RecordTemplate

How to Serialize Data to PSON

PSON is a binary format that can represent any JSON data but is more compact, requires less computation to serialize and deserialize, and can transmit byte strings directly.

PSON serialization and deserialization work the same way as the JSON examples above, but use these two methods:

DataMapUtils.readMapPson()
DataMapUtils.mapToPsonBytes()