This project is read-only.

What is the Microsoft Avro Library?

Apache Avro is a language-neutral data serialization system widely used in Hadoop environments. The Microsoft Avro Library implements the Avro data serialization system for the Microsoft .NET environment.

Installation

The library is distributed as a NuGet package. To install it, run the following command in the NuGet Package Manager Console:
PM> Install-Package Microsoft.Hadoop.Avro

Overview and Getting Started

Read the Microsoft Avro Library Overview for detailed installation instructions and code examples.
Download complete code samples.
You may also want to read the Library Release Notes further down this page. If you plan to serialize using Reflection, make sure you are familiar with the Release Notes.

API Documentation and Source Code

Microsoft Avro Library source code is available in the Source Code section of this site.
Microsoft Avro Library API reference documentation is available here.

Microsoft Avro Library Release Notes

  • The current version is 1.4.0.0.
  • A utility that generates C# types from a JSON schema has been added. See the Microsoft Avro Library Overview for information on building and using the utility. The Using Avro with HDInsight service sample includes an example of the code generated by the utility.
  • The RPC (Remote Procedure Call) part of the Avro Specification is not supported in the current version.
  • The previous version of the Library threw an exception when attempting to serialize a nested property using Reflection. This has been fixed, but if you plan to work with nested properties, please pay attention to the section below.
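When you already have a JSON schema but would rather not generate C# types from it, the library also offers a schema-driven "generic record" route. The sketch below is a minimal illustration, assuming the `AvroSerializer.CreateGeneric` factory, the serializer's `WriterSchema` property and the `dynamic`-friendly `AvroRecord` class as they appear in Microsoft's published Avro samples; the schema, the `AvroSketches.GenericRecord` namespace and the `RoundTrip` helper are hypothetical names chosen for this example.

```csharp
namespace AvroSketches.GenericRecord
{
    using System;
    using System.IO;
    using Microsoft.Hadoop.Avro;

    static class GenericRecordSample
    {
        // Hypothetical schema used only for illustration.
        const string Schema = @"{
            ""type"": ""record"",
            ""name"": ""AvroSketches.Reading"",
            ""fields"": [
                { ""name"": ""SensorId"", ""type"": ""int"" },
                { ""name"": ""Label"", ""type"": ""string"" }
            ]
        }";

        // Serializes one generic record to memory and reads it back.
        public static dynamic RoundTrip(int sensorId, string label)
        {
            // Build a serializer directly from the JSON schema; no C# type is involved.
            var serializer = AvroSerializer.CreateGeneric(Schema);

            // Populate a record whose fields follow the writer schema.
            dynamic record = new AvroRecord(serializer.WriterSchema);
            record.SensorId = sensorId;
            record.Label = label;

            using (var buffer = new MemoryStream())
            {
                serializer.Serialize(buffer, record);
                buffer.Seek(0, SeekOrigin.Begin); // rewind before reading
                return serializer.Deserialize(buffer);
            }
        }

        public static void Main()
        {
            dynamic copy = RoundTrip(5, "lobby");
            Console.WriteLine("SensorId = {0}, Label = {1}", copy.SensorId, copy.Label);
        }
    }
}
```

Field access on the deserialized record is resolved at run time, so a misspelled field name surfaces as an exception rather than a compile error.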

Serializing nested properties using Reflection

Microsoft Avro Library supports serializing and deserializing nested properties. However, keep the following in mind.
Serializing and deserializing properties inherited from an abstract class is fully supported. Class B below will be serialized and deserialized without any issues.
[DataContract]
abstract class A
{
    [DataMember]
    public abstract int PropertyToSerialize { get; set; }
}

[DataContract]
class B : A
{
    [DataMember]
    public override int PropertyToSerialize { get; set; } //this will be serialized
}
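A minimal round trip of Class B, using the same `AvroSerializer.Create<T>()`, `Serialize` and `Deserialize` calls as the longer sample further down. The `AvroSketches.AbstractBase` namespace and the `AbstractBaseSample.RoundTrip` helper are hypothetical; the expectation that the value survives rests on the statement above that this case is fully supported.

```csharp
namespace AvroSketches.AbstractBase
{
    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro;

    [DataContract]
    abstract class A
    {
        [DataMember]
        public abstract int PropertyToSerialize { get; set; }
    }

    [DataContract]
    class B : A
    {
        [DataMember]
        public override int PropertyToSerialize { get; set; }
    }

    static class AbstractBaseSample
    {
        // Serializes a B instance to memory and deserializes it again.
        public static B RoundTrip(B source)
        {
            var serializer = AvroSerializer.Create<B>();
            using (var buffer = new MemoryStream())
            {
                serializer.Serialize(buffer, source);
                buffer.Seek(0, SeekOrigin.Begin); // rewind before reading
                return serializer.Deserialize(buffer);
            }
        }

        public static void Main()
        {
            var copy = RoundTrip(new B { PropertyToSerialize = 42 });
            Console.WriteLine("PropertyToSerialize after round trip: {0}", copy.PropertyToSerialize);
        }
    }
}
```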
When properties are inherited from a non-abstract class and redeclared with the override or new modifier, only the value of the last member in the inheritance chain is actually serialized. In the example below, only the PropertyToSerialize value from Class C is retained after serialization/deserialization; the property values of the base classes are lost. This behavior is the same for both overridden and hidden properties.
namespace Microsoft.Hadoop.Avro.Sample
{
    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro;

    //Inheritance chain of classes
    //with overridden and hidden properties to serialize
    [DataContract]
    class A
    {
        [DataMember]
        public virtual int PropertyToSerialize { get; set; } //this will NOT be serialized
    }

    [DataContract]
    class B : A
    {
        [DataMember]
        public override int PropertyToSerialize { get; set; } //this will NOT be serialized

        public virtual void showBase()
        {
            Console.WriteLine("Property value in Class A is {0}", base.PropertyToSerialize);
        }
        
        public void setBase(int a)
        {
            base.PropertyToSerialize = a;
        }
    }

    [DataContract]
    class C : B
    {
        [DataMember]
        public new int PropertyToSerialize { get; set; } //ONLY this will be serialized

        public void showA()
        {
            base.showBase();
        }

        public void showB()
        {
            Console.WriteLine("Property value in Class B is {0}", base.PropertyToSerialize);
        }
        public void showThis()
        {
            Console.WriteLine("Property value in Class C is {0}", this.PropertyToSerialize);
        }

        public void setA(int a)
        {
            base.setBase(a);
        }

        public void setB(int a)
        {
            base.PropertyToSerialize = a;
        }
    }

    class AvroNestedProperties
    {
        public string sectionDivider = "---------------------------------------- ";

        //Serialize and deserialize test object using Reflection
        public void SerializeNestedProperties()
        {
            using (var buffer = new MemoryStream())
            {
                //Create Avro Serializer
                var avroSerializer = AvroSerializer.Create<C>();

                //Initialize an instance of Class C and
                //set values for PropertyToSerialize for Class C
                //and base Classes
                var sourceObj = new C { PropertyToSerialize = 98 };
                sourceObj.setA(30);
                sourceObj.setB(6);

                //Display property values in entire chain before serialization
                Console.WriteLine("PropertyToSerialize values before serialization:\n");
                sourceObj.showA();
                sourceObj.showB();
                sourceObj.showThis();

                //Serialize and deserialize the object
                Console.WriteLine(sectionDivider);
                Console.WriteLine("Serializing and deserializing test object using Reflection...");
                avroSerializer.Serialize(buffer, sourceObj);
                buffer.Seek(0, SeekOrigin.Begin);
                var targetObj = avroSerializer.Deserialize(buffer);

                //Display property values in entire chain after serialization/deserialization
                Console.WriteLine(sectionDivider);
                Console.WriteLine("PropertyToSerialize values after serialization:\n");
                targetObj.showA();
                targetObj.showB();
                targetObj.showThis();
            }
        }

        public static void Main()
        {
            var testNestedProperties = new AvroNestedProperties();

            Console.WriteLine("NESTED PROPERTIES IN SERIALIZED/DESERIALIZED OBJECTS.");
            Console.WriteLine(testNestedProperties.sectionDivider);

            testNestedProperties.SerializeNestedProperties();

            Console.WriteLine(testNestedProperties.sectionDivider);
            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

//This code will produce the following output:

//NESTED PROPERTIES IN SERIALIZED/DESERIALIZED OBJECTS.
//----------------------------------------
//PropertyToSerialize values before serialization:
//
//Property value in Class A is 30
//Property value in Class B is 6
//Property value in Class C is 98
//----------------------------------------
//Serializing and deserializing test object using Reflection...
//----------------------------------------
//PropertyToSerialize values after serialization:
//
//Property value in Class A is 0
//Property value in Class B is 0
//Property value in Class C is 98
//----------------------------------------
//Press any key to exit.
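If the base-class values must survive a round trip, one option is to avoid override and new altogether and give each level of the chain its own member name. This is a sketch under an assumption this page does not state explicitly: that the library serializes every distinctly named [DataMember] declared on [DataContract] base classes. The namespace, class names and property names below are hypothetical.

```csharp
namespace AvroSketches.DistinctNames
{
    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using Microsoft.Hadoop.Avro;

    // Each level declares its own member instead of redeclaring a shared one.
    [DataContract]
    class A
    {
        [DataMember]
        public int ValueA { get; set; }
    }

    [DataContract]
    class B : A
    {
        [DataMember]
        public int ValueB { get; set; }
    }

    [DataContract]
    class C : B
    {
        [DataMember]
        public int ValueC { get; set; }
    }

    static class DistinctNamesSample
    {
        // Serializes a C instance (including inherited members) and reads it back.
        public static C RoundTrip(C source)
        {
            var serializer = AvroSerializer.Create<C>();
            using (var buffer = new MemoryStream())
            {
                serializer.Serialize(buffer, source);
                buffer.Seek(0, SeekOrigin.Begin); // rewind before reading
                return serializer.Deserialize(buffer);
            }
        }

        public static void Main()
        {
            var copy = RoundTrip(new C { ValueA = 30, ValueB = 6, ValueC = 98 });
            Console.WriteLine("A = {0}, B = {1}, C = {2}", copy.ValueA, copy.ValueB, copy.ValueC);
        }
    }
}
```

The trade-off is that callers see three properties instead of one polymorphic property, so this only fits when each level's value is genuinely distinct data.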

Last edited Nov 23, 2014 at 2:16 PM by alexeyo, version 11

Comments

alexeyo Apr 14, 2015 at 9:33 PM 
Sorry, but today ConcurrentDictionary is not supported.

AllenZhang_Spark Apr 14, 2015 at 4:36 AM 
How do I serialize a ConcurrentDictionary? AVRO treats it as a normal Dictionary and complains about not being able to find the 'Add' method. Is this not supported yet?
Is there a way to write my own serializer for ConcurrentDictionary?

alexeyo Jun 3, 2014 at 12:09 PM 
Regarding the comment on optional/required field (posted by Sid8264 below) - see discussion at http://hadoopsdk.codeplex.com/workitem/53

alexeyo Apr 10, 2014 at 5:35 PM 
Currently we do not have a tool for JSON schema compilation. We are considering this for future updates.

jwang98052 Mar 14, 2014 at 1:26 AM 
How can I auto generate C# Data Contract classes from Avro schema? In Java there is a tool to auto generate classes from Avro schema.

tle5 Dec 28, 2013 at 8:07 AM 
Hi,

Any solution for nullable of primitive type property?
class A{
public int? Id {get;set;} //it doesn't work
}
thanks,

Sid8264 Dec 27, 2013 at 10:30 PM 
How does one prevent marking fields as optional? i.e. equivalent of {"name":"ObjectId","type":"string"} ?
At present "[DataMember(IsRequired = true)]" doesn't seem to take effect, resulting in a schema equivalent of {"name":"ObjectId","type":["null","string"]} and a binary payload that's incompatible with the standard Avro libraries released by Apache foundation.

chauchauvn Nov 8, 2013 at 4:42 AM 
Hi maxluk

Can I use AvroSerializer to serialize/deserialize an object with my own manual schema?
How can I do it?
Tks.