The data is serialized for transport using JSON encoding, following the Avro 1.4.1 specification. This JSON encoding is also the same as the JSON expression used to describe default values.
One notable difference from the Avro spec is that optional fields with no value are represented by their omission from the serialized data. In other words, optional fields are never explicitly set to null in the serialized body, so null is never a valid value in the serialized data. The only exception to this rule is when the schema for the data is a union that has a null member.
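The omission rule can be illustrated with a minimal sketch. The class `OptionalFieldSketch`, its `encode` helper, and the `tone` field are hypothetical names used only for illustration; this is plain Java over a `Map`, not the Rest.li encoder, and it assumes string-valued fields that need no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class OptionalFieldSketch {
    // Hypothetical helper: writes string-valued fields as a JSON object,
    // skipping any field whose value is null (an unset optional field).
    // Assumption: names and values contain no characters that need escaping.
    static String encode(Map<String, String> fields) {
        StringJoiner json = new StringJoiner(", ", "{ ", " }");
        json.setEmptyValue("{}");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (field.getValue() == null) {
                continue; // omitted entirely, never written as "name" : null
            }
            json.add("\"" + field.getKey() + "\" : \"" + field.getValue() + "\"");
        }
        return json.toString();
    }

    public static void main(String[] args) {
        Map<String, String> greeting = new LinkedHashMap<>();
        greeting.put("message", "Hi!");
        greeting.put("tone", null); // hypothetical optional field with no value
        System.out.println(encode(greeting)); // prints { "message" : "Hi!" }
    }
}
```

The unset field leaves no trace in the output, which is why a reader of the serialized data should treat a missing key, not a null value, as "not present".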
The following table summarizes the JSON encoding.
Schema Type | JSON Type | JSON Encoding Examples |
---|---|---|
int | number | 123 |
long | number | 123456789000 |
float | number | 3.5 |
double | number | 3.5555555 |
boolean | true or false | true |
string | string | "hello" |
bytes | string (each byte is encoded as the least significant 8 bits of a 16-bit character) | "\u00ba\u00db\u00ad" |
enum | string | "APPLE" |
fixed | string (each byte is encoded as the least significant 8 bits of a 16-bit character) | "\u0001\u0002\u0003\u0004" (fixed of size 4) |
array | array | [ 1, 2, 3 ] |
map | object | { "a" : 95, "b" : 90, "c" : 85 } |
record (error) | object (each field is encoded using a name/value pair in the object) | { "intField" : 1, "stringField" : "abc", "fruitsField" : "APPLE" } |
union | null if the value is null. Otherwise, an object with exactly one name/value pair: the name is the member discriminator (the member's alias if one is specified, else the member's fully qualified type name) and the value is the JSON-encoded member value. | null<br>{ "int" : 1 }<br>{ "float" : 3.5 }<br>{ "string" : "abc" }<br>{ "array" : [ "s1", "s2", "s3" ] }<br>{ "map" : { "key1" : 10, "key2" : 20, "key3" : 30 } }<br>{ "com.linkedin.generator.examples.Fruits" : "APPLE" } |
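The bytes and fixed rows describe a byte-to-character mapping that is easy to get wrong, so here is a minimal sketch of it in plain Java. The class and method names are hypothetical and this is not the Rest.li implementation; it only shows one way to place each byte in the low 8 bits of a 16-bit `char` and to recover the bytes again.

```java
import java.util.Arrays;

final class BytesAsStringSketch {
    // Each byte becomes one char: the low 8 bits hold the byte, the high 8 bits are zero.
    static String bytesToString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // mask to avoid sign extension of negative bytes
        }
        return sb.toString();
    }

    // Inverse mapping: keeps the low 8 bits of each char and rejects anything above 0xFF.
    static byte[] stringToBytes(String s) {
        byte[] bytes = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                throw new IllegalArgumentException("not a byte-valued character at index " + i);
            }
            bytes[i] = (byte) c;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xBA, (byte) 0xDB, (byte) 0xAD };
        String encoded = bytesToString(raw);
        System.out.println(encoded.equals("\u00ba\u00db\u00ad"));           // prints true
        System.out.println(Arrays.equals(raw, stringToBytes(encoded)));     // prints true
    }
}
```

The mask `b & 0xFF` matters because Java bytes are signed; without it, 0xBA would be widened to the character U+FFBA instead of U+00BA.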
If a union schema has a typeref member, then the key for that member is the dereferenced type. For example, for the union
{
  "name" : "unionField",
  "type" : [
    "int",
    { "type" : "typeref", "name" : "a.b.c.d.Foo", "ref" : "string" }
  ]
}
the JSON encoding for the typeref member should look like
{ "string" : "Correct key" }
and NOT
{ "a.b.c.d.Foo" : "Wrong key" }
Similarly, for a union with aliased members, the key for each member is its corresponding alias. For example, for the union
{
  "name" : "unionField",
  "type" : [
    { "type" : "int", "alias" : "count" },
    { "type" : { "type" : "typeref", "name" : "a.b.c.d.Foo", "ref" : "string" }, "alias" : "foo" }
  ]
}
the JSON encoding for the typeref member should look like
{ "foo" : "Correct key" }
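The key-selection rules for union members can be summarized in a small sketch. The class `UnionKeySketch` and its helpers are hypothetical and written in plain Java for illustration only; they are not part of Rest.li, and they assume the caller has already dereferenced any typeref and JSON-encoded the member value.

```java
final class UnionKeySketch {
    // Hypothetical helper: the alias wins when one is specified; otherwise the key is
    // the dereferenced (for typerefs) fully qualified type name of the member.
    static String memberKey(String alias, String dereferencedTypeName) {
        return alias != null ? alias : dereferencedTypeName;
    }

    // jsonValue is the already JSON-encoded member value; null stands for the union's null member.
    static String encodeUnion(String alias, String dereferencedTypeName, String jsonValue) {
        if (jsonValue == null) {
            return "null";
        }
        return "{ \"" + memberKey(alias, dereferencedTypeName) + "\" : " + jsonValue + " }";
    }

    public static void main(String[] args) {
        // Typeref a.b.c.d.Foo refers to string, so the key is "string", not "a.b.c.d.Foo".
        System.out.println(encodeUnion(null, "string", "\"Correct key\""));  // prints { "string" : "Correct key" }
        // With an alias, the alias is the key.
        System.out.println(encodeUnion("foo", "string", "\"Correct key\"")); // prints { "foo" : "Correct key" }
        // The null member is encoded as a bare null.
        System.out.println(encodeUnion(null, "int", null));                  // prints null
    }
}
```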
DataMapUtils provides convenience methods to serialize and deserialize between data and JSON using JacksonDataCodec.
To serialize from a DataMap to JSON:
DataMap dataMap = new DataMap();
dataMap.put("message", "Hi!");
byte[] jsonBytes = DataMapUtils.mapToBytes(dataMap);
String json = new String(jsonBytes, "UTF-8");
To serialize from a RecordTemplate instance to JSON:
Greeting greeting = new Greeting().setMessage("Hi!"); // Greeting is a class extending RecordTemplate
byte[] jsonBytes = DataMapUtils.dataTemplateToBytes(greeting, true);
String json = new String(jsonBytes, "UTF-8");
To deserialize from JSON to a DataMap:
InputStream in = IOUtils.toInputStream("{\"message\":\"Hi!\"}", "UTF-8"); // JSON strings require double quotes
DataMap dataMap = DataMapUtils.readMap(in);
To deserialize from JSON to a RecordTemplate:
InputStream in = IOUtils.toInputStream("{\"message\":\"Hi!\"}", "UTF-8"); // JSON strings require double quotes
Greeting deserialized = DataMapUtils.read(in, Greeting.class); // Greeting is a class extending RecordTemplate
PSON is a binary format that can represent any JSON data but is more compact, requires less computation to serialize and deserialize, and can transmit byte strings directly.
PSON serialization and deserialization work similarly to JSON (as described above) but use these two methods:
DataMapUtils.readMapPson()
DataMapUtils.mapToPsonBytes()