Array serialization is weird and not according to Avro specification?

Jan 6, 2015 at 11:12 AM
I'm implementing a protocol specified as an Avro schema, using Avro binary serialization, and I have run into problems with array serialization.
[DataContract]
class Foo
{
    [DataMember]
    public IList<int> array { get; set; }
}
With the class above, if I assign a List<int> to the "array" property, it serializes as 00,04,06,36,00. If I assign an int[] instead, it serializes as 02,04,06,36,00.

However if I change the property "array" to the type List<int> instead, it serializes as expected according to the Avro spec to the bytes: 04 06 36 00.

This behaviour is quite problematic because the protocol is cross-system and multi-vendor, so we cannot deal with different binary serializations of the same data.

Also the code generator in the SDK uses IList<> for arrays, so I cannot work around this by simply using List<> instead.


Is this a known problem or should I file a bug on this?


Thanks,
Sjur Brendeland
Coordinator
Jan 7, 2015 at 9:07 PM
Hi Sjur,

Could you provide more information about what these numbers are? Are they the array elements?

Thanks,
Maxim
Jan 8, 2015 at 1:47 PM
Hi Maxim,

Sorry for being unclear here.
I am serializing the integer array {3, 27} (decimal), as in the example in the Avro specification (http://avro.apache.org/docs/current/spec.html#Data+Serialization).
With the property declared as IList<int>, the array {3, 27} assigned as a List<int> serializes into 00,04,06,36,00 (hexadecimal bytes).
With the property declared as IList<int>, the array {3, 27} assigned as an int[] serializes into 02,04,06,36,00 (hexadecimal bytes).
But if the property "array" is changed to type List<int>, it serializes as expected to 04 06 36 00 (hexadecimal bytes).
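
For reference, here is a minimal sketch of how the expected bytes 04 06 36 00 follow from the Avro spec: ints and longs are zigzag-mapped and written as variable-length bytes, and an array is written as a block (item count, then the items) followed by a zero count. The class and method names (AvroSketch, EncodeLong, EncodeIntArray) are made up for illustration and are not part of the Avro SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, written only to illustrate the byte layout in the spec.
static class AvroSketch
{
    // Avro int/long: zigzag-map the signed value, then emit 7-bit groups,
    // least significant group first, with the high bit set on all but the last byte.
    public static IEnumerable<byte> EncodeLong(long value)
    {
        ulong zigzag = unchecked((ulong)((value << 1) ^ (value >> 63)));
        while ((zigzag & ~0x7FUL) != 0)
        {
            yield return (byte)((zigzag & 0x7F) | 0x80);
            zigzag >>= 7;
        }
        yield return (byte)zigzag;
    }

    // Avro array: a single block (item count, then each item), then a zero count.
    public static byte[] EncodeIntArray(IList<int> items)
    {
        var bytes = new List<byte>();
        if (items.Count > 0)
        {
            bytes.AddRange(EncodeLong(items.Count));
            foreach (int item in items)
            {
                bytes.AddRange(EncodeLong(item));
            }
        }
        bytes.AddRange(EncodeLong(0)); // end-of-array marker
        return bytes.ToArray();
    }

    static void Main()
    {
        byte[] encoded = EncodeIntArray(new[] { 3, 27 });
        Console.WriteLine(BitConverter.ToString(encoded)); // 04-06-36-00
    }
}
```

The count 2 becomes 04, the items 3 and 27 become 06 and 36, and the trailing 00 closes the array, so any extra leading byte has to come from something wrapped around the array.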

Cheers,
Sjur
Coordinator
Jan 9, 2015 at 8:15 PM
Our devs provided some insight: if you look into the generated schema, you will see that IList<int> is not a plain array, because this field can be null (arrays cannot be null in Avro). So it is UNION {NULL, ARRAY}. That’s where the first 02 (Encode(1), the union branch index) is coming from.
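
To make the prefix concrete, here is a small sketch of how the Avro spec encodes a union value: the zero-based index of the chosen branch is written as a zigzag varint, followed by the value encoded under that branch's schema. It assumes the union is ["null", array of int], with the array as branch 1, as described above; the names UnionSketch and EncodeUnion are hypothetical and not part of the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, written only to illustrate the union prefix byte.
static class UnionSketch
{
    // Avro long: zigzag mapping followed by variable-length 7-bit groups.
    public static IEnumerable<byte> EncodeLong(long value)
    {
        ulong zigzag = unchecked((ulong)((value << 1) ^ (value >> 63)));
        while ((zigzag & ~0x7FUL) != 0)
        {
            yield return (byte)((zigzag & 0x7F) | 0x80);
            zigzag >>= 7;
        }
        yield return (byte)zigzag;
    }

    // Avro union: branch index as a long, then the already-encoded branch value.
    public static byte[] EncodeUnion(int branchIndex, byte[] branchPayload)
    {
        var bytes = new List<byte>(EncodeLong(branchIndex));
        bytes.AddRange(branchPayload);
        return bytes.ToArray();
    }

    static void Main()
    {
        // The array {3, 27} encoded on its own, as quoted in this thread.
        byte[] arrayPayload = { 0x04, 0x06, 0x36, 0x00 };

        // Branch 1 (the array) is prefixed with Encode(1) = 02.
        Console.WriteLine(BitConverter.ToString(EncodeUnion(1, arrayPayload))); // 02-04-06-36-00

        // Branch 0 (null) has an empty payload, so only the index byte is written.
        Console.WriteLine(BitConverter.ToString(EncodeUnion(0, new byte[0]))); // 00
    }
}
```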

If your scenario guarantees that there won’t be any nulls, you can use allowNulls = false in the settings of the serializer.

Best regards,
Maxim