Pegasus to Avro Conversions


Sometimes it is necessary to convert between Avro and Pegasus formats, either by converting schemas (Pegasus DataSchemas to Avro Schemas and vice versa) or by converting data (Pegasus DataMaps to Avro GenericRecords and vice versa). Rest.li provides ways to do this through the data-avro module.

Converting Schemas

The key class for converting schemas is the SchemaTranslator class.

Converting Avro to Pegasus

To convert from Avro to Pegasus, use the avroToDataSchema methods in SchemaTranslator. The default method takes only the Avro schema you wish to convert as input, for example SchemaTranslator.avroToDataSchema(avroSchema).

The schema can be either a stringified (JSON) version of the Avro schema or an org.apache.avro.Schema.

A second variant of this method also accepts an AvroToDataSchemaTranslationMode. You generally don't need it. However, if you have embedded your Pegasus schema within your Avro schema, you can pass the appropriate AvroToDataSchemaTranslationMode to speed up the translation. The Pegasus schema is normally embedded when translating from Pegasus to Avro; see the section on converting from Pegasus to Avro to learn more.

Converting Pegasus to Avro

To convert from Pegasus to Avro, use the dataToAvroSchema methods in SchemaTranslator. Like avroToDataSchema, they take either a stringified Pegasus schema or a DataSchema and, optionally, a DataToAvroSchemaTranslationOptions, for example SchemaTranslator.dataToAvroSchema(dataSchema) or SchemaTranslator.dataToAvroSchema(dataSchema, translationOptions).

DataToAvroSchemaTranslationOptions has four parts:

  • The optional-field default mode, OptionalDefaultMode
  • The JSON output style, JsonBuilder.Pretty
  • The schema embedding mode, EmbedSchemaMode
  • The namespace override flag, overrideNamespace

OptionalDefaultMode determines how the default values of optional fields are translated into Avro. Avro requires a union's default value to be of the same type as the union's first member type, so if a type is not consistently initialized with defaults of a single type, translations may run into problems. The default setting is TRANSLATE_DEFAULT. If your translations run into issues around default values, you may want to set it to TRANSLATE_TO_NULL, which sets the default of every optional field that has a default value to null in the Avro translation.
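The union rule above can be illustrated with a small, stdlib-only Java model. This is a hypothetical sketch, not SchemaTranslator's implementation: the class names, the string type names, and the branch ordering under TRANSLATE_DEFAULT (the default's type first, then "null") are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OptionalDefaultMode; NOT the data-avro enum.
enum DefaultMode { TRANSLATE_DEFAULT, TRANSLATE_TO_NULL }

// One translated optional field: its Avro union branches plus the Avro default.
final class UnionField {
    final List<String> branches;
    final Object defaultValue; // Java null stands for a JSON null default

    UnionField(List<String> branches, Object defaultValue) {
        this.branches = branches;
        this.defaultValue = defaultValue;
    }
}

final class OptionalFieldUnionSketch {
    // Maps a Java default value to the Avro primitive type it would be encoded as.
    static String avroTypeOf(Object value) {
        if (value == null) return "null";
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof String) return "string";
        throw new IllegalArgumentException("unsupported default: " + value);
    }

    // Avro's rule: a union default must match the union's FIRST branch.
    static boolean isValidDefault(UnionField field) {
        return field.branches.get(0).equals(avroTypeOf(field.defaultValue));
    }

    // Assumed translation of an optional field of type avroType with a Pegasus default.
    static UnionField translateOptional(String avroType, Object pegasusDefault, DefaultMode mode) {
        List<String> branches = new ArrayList<>();
        if (mode == DefaultMode.TRANSLATE_TO_NULL || pegasusDefault == null) {
            // "null" goes first, so a null default is always legal.
            branches.add("null");
            branches.add(avroType);
            return new UnionField(branches, null);
        }
        // Keep the default: the branch matching its type must come first.
        branches.add(avroType);
        branches.add("null");
        return new UnionField(branches, pegasusDefault);
    }
}
```

In this model, TRANSLATE_TO_NULL always yields a valid default because "null" is the first branch, at the cost of dropping the original default value from the Avro schema.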

JsonBuilder.Pretty sets the formatting of the output JSON. The default is COMPACT.

EmbedSchemaMode determines whether to embed the original Pegasus schema in the resulting Avro schema. With the right settings passed to the avroToDataSchema method, an embedded schema can make translating back to Pegasus faster (or more accurate). The default is NONE.
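The idea behind embedding can be sketched as a stdlib-only model in which an Avro schema is a JSON-like Map. This is an assumption-laden illustration, not the data-avro code: the property key EMBEDDED_KEY, the ReadModeSketch enum (loosely mirroring the choice offered by AvroToDataSchemaTranslationMode), and the pluggable fullTranslator function are all hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical read modes, loosely mirroring the idea of AvroToDataSchemaTranslationMode.
enum ReadModeSketch { TRANSLATE, RETURN_EMBEDDED }

final class EmbeddedSchemaShortcut {
    // Hypothetical property name; the real embedding format may differ.
    static final String EMBEDDED_KEY = "x-embedded-pegasus-schema";

    // Pegasus -> Avro side: store the original Pegasus schema JSON inside the Avro schema.
    static void embed(Map<String, Object> avroSchema, String pegasusSchemaJson) {
        avroSchema.put(EMBEDDED_KEY, pegasusSchemaJson);
    }

    // Avro -> Pegasus side: return the embedded copy when allowed and present,
    // otherwise fall back to a full translation.
    static String toPegasus(Map<String, Object> avroSchema,
                            ReadModeSketch mode,
                            Function<Map<String, Object>, String> fullTranslator) {
        Object embedded = avroSchema.get(EMBEDDED_KEY);
        if (mode == ReadModeSketch.RETURN_EMBEDDED && embedded instanceof String) {
            return (String) embedded; // skips translation and returns the exact original
        }
        return fullTranslator.apply(avroSchema);
    }
}
```

Embedding trades a larger Avro schema for a cheap, lossless trip back to Pegasus; when no embedded copy is present, the sketch simply performs a full translation.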

overrideNamespace is a boolean flag indicating whether the namespaces of the translated Avro schemas should be overridden. If it is set to true, the namespace of each translated Avro schema is prepended with the special prefix "avro." (e.g. com.x.y becomes avro.com.x.y). This is helpful when Pegasus schemas and their Avro counterparts are included in the same project, where they could otherwise cause namespace/package conflicts.
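The namespace override can be sketched as a simple string transformation. The class NamespaceOverrideSketch and its methods are hypothetical, and leaving a missing or empty namespace untouched is an assumption of this sketch; only the "avro." prefix comes from the description above.

```java
// Hypothetical model of overrideNamespace; not the data-avro implementation.
final class NamespaceOverrideSketch {
    static final String AVRO_PREFIX = "avro.";

    // Prepends "avro." when the flag is set. Assumption: null/empty namespaces are left as-is.
    static String translateNamespace(String namespace, boolean overrideNamespace) {
        if (!overrideNamespace || namespace == null || namespace.isEmpty()) {
            return namespace;
        }
        return AVRO_PREFIX + namespace;
    }

    // Builds a fully qualified name such as com.x.y.Foo.
    static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }
}
```

With the flag set, a Pegasus record com.x.y.Foo and its Avro counterpart avro.com.x.y.Foo get different fully qualified names, so both can live in the same project without colliding.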

Converting Data

The key class for converting data is the DataTranslator class.

Converting Avro to Pegasus

To convert from Avro to Pegasus, use the genericRecordToDataMap method in DataTranslator. You need the Avro GenericRecord you are converting, the Avro Schema the GenericRecord conforms to, and the RecordDataSchema of the type you are converting to, for example DataTranslator.genericRecordToDataMap(genericRecord, recordDataSchema, avroSchema).

There are no versions of this method that accept any special options.

Converting Pegasus to Avro

To convert from Pegasus to Avro, use the dataMapToGenericRecord methods in DataTranslator. You need the DataMap you are converting, the RecordDataSchema your DataMap conforms to, and, optionally, the Avro Schema you are converting your data to. If you do not pass in an Avro Schema, the schema translator converts the RecordDataSchema you passed in to an Avro Schema using default settings. For example: DataTranslator.dataMapToGenericRecord(dataMap, dataSchema) or DataTranslator.dataMapToGenericRecord(dataMap, dataSchema, avroSchema).

Automatically Generating Avro Schemas as Part of a Build

If the build is configured to enable it, Avro schemas are generated automatically for all your Pegasus schemas (.pdl files).

See Gradle generateAvroSchema Task for details on how to enable this.


How do I get the RecordDataSchema of a particular Record type?

The RecordDataSchema field of generated Record classes is private, so you cannot access it directly. However, there is a helper method called getSchema that can retrieve the schema for you: pass in the class of the Record, and it returns a DataSchema. If you know the schema is a RecordDataSchema, you can safely cast the result to RecordDataSchema.