R language S4Object Serialization to .NET Object - CodeProject


Download Links

The entire source code of the Shoal Shell language can be downloaded from the SourceForge SVN server:

Quote:

svn checkout svn://svn.code.sf.net/p/shoal/Source/ shoal-Source

 

Test example source code for this article:

Download RDotNET.Hybrids.API.Extensions.zip

 

Related links about Shoal Shell and hybrid programming

Quote:

Powerful ShellScript for bioinformatics researchers

http://www.codeproject.com/Articles/820854/Powerful-ShellScript-for-bioinformatics-researcher

Guide line of integrated ShellScript with R Hybrid programming

http://www.codeproject.com/Articles/832975/Guide-line-of-integrated-ShellScript-with-R-Hybrid

 

Introduction

Hybrid scripting between VB and the R language is painful when you need to read the results of an R expression back into .NET, so I developed a simple wrapper that does this data conversion automatically.

In my recent laboratory research, I needed to analyze gene expression regulation signals in real-time gene chip data from a virtual cell. The R wavelets library does this job well, and the code in this article makes the hybrid programming around it simple.

The VB/R hybrid programming example in this article uses wavelet analysis to show how the gene expression regulation signal changes across a bacterial genome.

 

Picture 1. Overview of the steps in VB/C#/R hybrid programming

Using the code

Quote:

Overview of the hybrid programming steps:

1.    Create a mapping between the .NET class properties and the S4Object attributes

2.    Evaluate the R expression

3.    Serialize the R symbolic expression into a .NET object instance

That's it: just 3 simple steps for hybrid programming between VB/C# and the R language. Let's go through them one by one.

 

1.    Create a mapping between the .NET class properties and the S4Object attributes

This step creates the schema mapping between the R object and your .NET object. It works the same way as XML serialization: before you can create an XML document with the XML serializer, you define a class that describes the document format, and only then can you produce the document.

This step follows the same pattern. The only difference between XML serialization and this R object serialization is the custom attribute that is used.

Before creating the mapping, let's look at the types in the R language.

In my opinion, R objects can be divided into 3 types:

1.    S4Object. An s4object is like a class object in a .NET language. A property of a .NET object corresponds to an attribute (or slot) of the s4object in R. The main function of the code in this article is to implement the mapping between a .NET class object and an R s4object.

2.    Function. A function object in R is like a lambda expression or delegate in a .NET language, and its declaration looks much like a lambda expression declaration in .NET.

3.    Generic vector. The generic vector is the most used object in R, because almost every object in R is a vector. Like an array or list in .NET, a vector can be a property (attribute) of an s4object, and it can also consist of a collection of s4objects.

Because a .NET class object corresponds to an R s4object, the mapping created in this step is defined on the class properties. The mapping between an s4object attribute and a .NET class property uses DataFrameColumnAttribute, in the namespace Microsoft.VisualBasic.ComponentModel.DataSourceModel. As the class definition of this custom attribute shows, it can only be applied to a property or a field:

Namespace ComponentModel.DataSourceModel

    ''' <summary>
    ''' Represents a column of a data frame. This attribute can also represent the mapping between two schemas.
    ''' (It can also be used to map properties between two data sources. When a mapping has no
    ''' column name, the property name is used as the column name, so take care to keep the
    ''' mapping intact when renaming such a property.)
    ''' </summary>
    <AttributeUsage(AttributeTargets.[Property] Or AttributeTargets.Field, Inherited:=True, AllowMultiple:=False)> _
    Public Class DataFrameColumnAttribute : Inherits Attribute     

 

Here is an example that creates the mapping with this attribute:

Imports Microsoft.VisualBasic.ComponentModel.DataSourceModel

    Public Class Filter
        <DataFrameColumn> Public Property L As Integer
        <DataFrameColumn("level")> Public Property level As Integer
        <DataFrameColumn("h")> Public Property h As Double()
        <DataFrameColumn("g")> Public Property g As Double()
        <DataFrameColumn("wt.class")> Public Property wtclass As String
        <DataFrameColumn("wt.name")> Public Property wtname As String
        <DataFrameColumn("transform")> Public Property transform As String
        <DataFrameColumn("class")> Public Property [class] As String
    End Class

As you can see, the first property

<DataFrameColumn> Public Property L As Integer

has no column name in its mapping, so the serializer automatically uses the property name as the mapping name.

The mapping needs a name because some attribute names in an R s4object are not legal identifiers in a .NET language. For example, wt.class is not allowed as a .NET property name, so you use the DataFrameColumn mapping attribute to supply the R name.
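The name fallback described above can be sketched with plain reflection. This is only an illustration under stated assumptions: SlotNameAttribute, FilterSketch and LoadMapping are hypothetical stand-ins written for this example, not the DataFrameColumnAttribute class or its LoadMapping function from the library.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

' Hypothetical stand-in for DataFrameColumnAttribute (not the library class).
<AttributeUsage(AttributeTargets.[Property] Or AttributeTargets.Field, Inherited:=True, AllowMultiple:=False)>
Public Class SlotNameAttribute : Inherits Attribute
    Public ReadOnly Name As String

    Public Sub New()
        Name = ""
    End Sub

    Public Sub New(slotName As String)
        Name = slotName
    End Sub
End Class

' Hypothetical mapping class with one default name and one explicit name.
Public Class FilterSketch
    <SlotName> Public Property L As Integer
    <SlotName("wt.class")> Public Property wtclass As String
    Public Property NotMapped As String
End Class

Public Module MappingSketch

    ' Returns slot name -> property for every property that carries SlotNameAttribute.
    Public Function LoadMapping(type As Type) As Dictionary(Of String, PropertyInfo)
        Dim mappings As New Dictionary(Of String, PropertyInfo)

        For Each prop As PropertyInfo In type.GetProperties()
            Dim attr As SlotNameAttribute =
                TryCast(Attribute.GetCustomAttribute(prop, GetType(SlotNameAttribute)), SlotNameAttribute)
            If attr Is Nothing Then Continue For

            ' Fall back to the property name when no explicit name was given.
            Dim slotName As String = If(String.IsNullOrEmpty(attr.Name), prop.Name, attr.Name)
            mappings(slotName) = prop
        Next

        Return mappings
    End Function

    Public Sub Main()
        For Each item As KeyValuePair(Of String, PropertyInfo) In LoadMapping(GetType(FilterSketch))
            Console.WriteLine("{0} -> {1}", item.Key, item.Value.Name)
        Next
    End Sub
End Module
```

In this sketch the property L is registered under the name L, wtclass is registered under wt.class, and the property without the attribute is ignored.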

 

2.    R expression evaluation

We get the result from R using RDotNET. In my experience this library is the best way to implement hybrid programming between a VB/C# .NET language and the R language.

You can download the RDotNET library from its CodePlex home page:

https://rdotnet.codeplex.com/

 

Hybrid programming between a .NET language and the R language takes just two simple steps:

First, start the R engine services, for example:

If Not String.IsNullOrEmpty(R_HOME) Then
    Wavelets.R = RDotNET.REngine.StartEngineServices(R_HOME)
Else
    Wavelets.R = RDotNET.REngine.StartEngineServices
End If

Call Wavelets.R.Library(PackageName:="wavelets")

Starting the R engine services needs an R_HOME value, which is the directory where R is installed, such as the default location used by the R installer:

C:\Program Files\R\R-3.1.3\bin

If R is properly installed on your computer, RDotNET can find R_HOME automatically from the R registry value, so you can call the parameterless version of RDotNET.REngine.StartEngineServices to create the instance. Otherwise, call RDotNET.REngine.StartEngineServices(R_HOME) to set the R install location manually.

Once you have created an R engine services instance with RDotNET, you can use it from your .NET code. The main thing to remember in hybrid programming is that many R analysis programs are not included in the base package, so before running your program you should install the required R package from the R terminal. After the package has been installed successfully, use the Library function of the REngine to load it:

Call Wavelets.R.Library(PackageName:="wavelets")

Alternatively, you can put this step in the script itself:

Dim STDOUT = Wavelets.R <= "library(""wavelets"")"

Then you can invoke the R calculation with the R.Evaluate function. This function returns an RDotNET symbolic expression object, which exposes the R memory to your .NET program. Unlike Evaluate, the <= operator returns the STDOUT string collection that was displayed on the terminal console.

 

3.      Serialize the R symbolic expression into a .NET object instance.

In this step we serialize an RDotNET symbolic expression into a .NET object with just one statement, which keeps hybrid programming with R simple and happy :-).

Assuming you have created the mapping class in your program and obtained a result value from the R evaluation, you can do the serialization as shown below:

Dim Result = RDotNET.Extensions.ShellScriptAPI.Serialization.LoadFromStream(Of Wavelets.Waveletmodwt)(TestResultRS4Object)

 

How does this code work?

The serialization operation is in the namespace RDotNET.Extensions.ShellScriptAPI.Serialization, and there are two functions that invoke it:

      

Imports RDotNET.SymbolicExpressionExtension

''' <summary>
''' Convert the R object into a .NET object from the specific type schema information.
''' (Converts the in-memory data of an R object into the specified .NET object entity.)
''' </summary>
''' <remarks></remarks>
Public Module Serialization

    ''' <summary>
    ''' Deserialize the R object into a specific .NET object. <see cref="RDotNET.SymbolicExpression"></see>  =====> <see cref="T"></see>
    ''' </summary>
    ''' <typeparam name="T"></typeparam>
    ''' <param name="RData"></param>
    ''' <returns></returns>
    ''' <remarks>
    ''' Deserialization rules:
    ''' 1. The slots of an S4 object are the properties of the object type.
    ''' 2. Every object property is represented as an array.
    ''' </remarks>
    Public Function LoadFromStream(Of T As Class)(RData As RDotNET.SymbolicExpression) As T
        Dim value As Object = InternalLoadFromStream(RData, GetType(T))
        Return DirectCast(value, T)
    End Function

    ''' <summary>
    ''' Needs your manual type casting in your program.
    ''' </summary>
    ''' <param name="RData"></param>
    ''' <param name="Type"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Function LoadRStream(RData As RDotNET.SymbolicExpression, Type As Type) As Object
        Dim value As Object = InternalLoadFromStream(RData, Type)
        Return value
    End Function  

  

Because an s4object in R may have vectors among its attributes, and the elements of such a vector may themselves be s4objects, serializing an s4object is a recursive operation. The recursion starts from this function:

  

''' <summary>
''' Load the R symbolic expression data recursively, starting from here.
''' </summary>
''' <param name="RData"></param>
''' <param name="TypeInfo"></param>
''' <returns></returns>
''' <remarks></remarks>
Private Function InternalLoadFromStream(RData As RDotNET.SymbolicExpression, TypeInfo As System.Type) As Object
    Select Case RData.Type

        Case Internals.SymbolicExpressionType.S4

            'Load the R symbolic expression data recursively, starting from here.
            Return InternalLoadS4Object(RData, TypeInfo)

        Case Internals.SymbolicExpressionType.LogicalVector
            Return RData.AsLogical.ToArray
        Case Internals.SymbolicExpressionType.CharacterVector
            Return RData.AsCharacter.ToArray
        Case Internals.SymbolicExpressionType.IntegerVector
            Return RData.AsInteger.ToArray
        Case Internals.SymbolicExpressionType.NumericVector
            Return RData.AsNumeric.ToArray
        Case Internals.SymbolicExpressionType.List
            Return InternalCreateMatrix(RData, TypeInfo)

        Case Else
            Throw New NotImplementedException

    End Select

End Function

As this function shows, if the R object is an s4object the program continues recursively; if the object is an elementary type, the function exits the recursion and returns the value. This serializer only reads the simple data types of the .NET language: Boolean, String, Integer, Double and Object(). Other data types, such as R functions (lambda expressions in .NET), are not handled by this function (the Case Else branch throws NotImplementedException), because we don't know how to save such data to the file system.

The recursive step is taken when the object being mapped is an s4object in R:

Case Internals.SymbolicExpressionType.S4

    'Load the R symbolic expression data recursively, starting from here.
    Return InternalLoadS4Object(RData, TypeInfo)

                          

''' <summary>
''' The recursive operation on the S4Object in R starts from here. The recursion stops when the property value is not an S4Object.
''' (This may be a recursive process that continues until the R type of each property is no longer an S4 object.)
''' </summary>
''' <param name="RData"></param>
''' <returns></returns>
''' <remarks></remarks>
Private Function InternalLoadS4Object(RData As RDotNET.SymbolicExpression, TypeInfo As System.Type) As Object
    Dim Mappings = Microsoft.VisualBasic.ComponentModel.DataSourceModel.DataFrameColumnAttribute.LoadMapping(TypeInfo)
    Dim obj As Object = Activator.CreateInstance(TypeInfo)

    Call Console.WriteLine("[DEBUG] {0}  ---> R.S4Object (""{1}"")", TypeInfo.FullName, String.Join("; ", RData.GetAttributeNames))

    For Each Slot In Mappings
        Dim RSlot As RDotNET.SymbolicExpression = RData.GetAttribute(Slot.Key.Name)
        Dim value As Object = InternalLoadFromStream(RSlot, Slot.Value.PropertyType)

        Call InternalValueMapping(value, Slot.Value, obj:=obj)
    Next

    Return obj
End Function

First, this step loads the mappings:

Dim Mappings = Microsoft.VisualBasic.ComponentModel.DataSourceModel.DataFrameColumnAttribute.LoadMapping(TypeInfo)

Then we create an instance of the target mapping type to hold the data:

Dim obj As Object = Activator.CreateInstance(TypeInfo)

Because an attribute of an S4Object corresponds to a .NET class property, once the mappings have been loaded from the metadata of the target type's schema definition, we can load the data from the R expression for each property of the class. The For loop performs these steps:

 

1) Get the specific attribute of the S4Object as the data source for the mapping:

Dim RSlot As RDotNET.SymbolicExpression = RData.GetAttribute(Slot.Key.Name)

2) Continue deserializing the R expression recursively:

Dim value As Object = InternalLoadFromStream(RSlot, Slot.Value.PropertyType)

3) Finally we have the value in .NET format, so we can assign it to the property using reflection:

Call InternalValueMapping(value, Slot.Value, obj:=obj)
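The recursion can be tried without an R installation. In the rough sketch below, a Dictionary(Of String, Object) stands in for an S4 object (slot name to slot value), and any other value plays the role of an elementary vector. The names InnerSketch, OuterSketch and LoadObject are hypothetical and only mirror the shape of InternalLoadS4Object; they are not part of RDotNET or of the extension library.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

' Hypothetical target types: OuterSketch holds a nested InnerSketch.
Public Class InnerSketch
    Public Property name As String
End Class

Public Class OuterSketch
    Public Property level As Integer
    Public Property filter As InnerSketch
End Class

Public Module RecursiveLoadSketch

    ' A Dictionary stands in for an S4 object; anything else is treated
    ' as an elementary value and ends the recursion.
    Public Function LoadObject(data As Object, target As Type) As Object
        Dim slots As Dictionary(Of String, Object) = TryCast(data, Dictionary(Of String, Object))
        If slots Is Nothing Then Return data

        Dim obj As Object = Activator.CreateInstance(target)

        For Each prop As PropertyInfo In target.GetProperties()
            If Not slots.ContainsKey(prop.Name) Then Continue For

            ' Recurse with the property type as the schema for the slot value.
            Dim value As Object = LoadObject(slots(prop.Name), prop.PropertyType)
            prop.SetValue(obj, value, Nothing)
        Next

        Return obj
    End Function

    Public Sub Main()
        Dim inner As New Dictionary(Of String, Object) From {{"name", "haar"}}
        Dim outer As New Dictionary(Of String, Object) From {{"level", 5}, {"filter", inner}}

        Dim result As OuterSketch = DirectCast(LoadObject(outer, GetType(OuterSketch)), OuterSketch)
        Console.WriteLine("{0} {1}", result.level, result.filter.name)
    End Sub
End Module
```

The sketch assigns each value directly with PropertyInfo.SetValue, which is enough here because no slot holds a matrix.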

 

The matrix value cannot be assigned directly using reflection

As the previous steps show, the value obtained from the serialization mapping is not assigned directly to the property; a function does this job instead. The reason is that a matrix object in R is mapped as an array of object arrays, so the matrix we get from R is in fact an Object() array. (Any array can be held as Object, because every data type in .NET inherits from the Object type.) The .NET program therefore sees the R matrix as an object array, not as an array of arrays of a specific type, and assigning the matrix value directly makes the program crash!

  

Picture 2. How the R matrix is converted to an object array

Finally we get an Object() whose elements are of type Double(), not the Double()() matrix we want, and this causes the exception. So we use the following function:

''' <summary>
'''
''' </summary>
''' <param name="value"></param>
''' <param name="pInfo"></param>
''' <param name="obj">The object instance</param>
''' <returns></returns>
''' <remarks></remarks>
Private Function InternalValueMapping(value As Object, pInfo As System.Reflection.PropertyInfo, ByRef obj As Object) As Boolean
    Dim pTypeInfo As System.Type = pInfo.PropertyType

    If pTypeInfo.HasElementType Then
       Call InternalMappingCollectionType(value, pInfo, obj, pTypeInfo)
    Else
       Call InternalRVectorToNETProperty(pTypeInfo:=value.GetType, value:=value, obj:=obj, pInfo:=pInfo)
    End If

    Return True
End Function

This function helps us convert the vector or matrix value into the proper .NET array type.

Because almost every R data type is a vector, there are two cases. When the property in the .NET class is a single element (String/Integer/Double) rather than a vector (String()/Integer()/Double()), we convert the R data to an array and take its first element, and things work fine. When the property in the .NET class is an array, we assign the converted R value to it, and that also works fine!

The object array is converted into a matrix of a specific type by this function:
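The two cases can be sketched as follows. This is a simplified illustration, not the source of InternalRVectorToNETProperty (which the article does not show); ScalarSketch and AssignVector are hypothetical names, and the R vector is represented by an ordinary .NET array.

```vbnet
Imports System
Imports System.Reflection

' Hypothetical target type with one scalar and one array property.
Public Class ScalarSketch
    Public Property level As Integer
    Public Property h As Double()
End Class

Public Module VectorAssignSketch

    ' Assigns a vector (always an array) to a property that may be scalar or array.
    Public Sub AssignVector(target As Object, prop As PropertyInfo, vector As Array)
        If prop.PropertyType.IsArray Then
            ' Array property: assign the whole vector.
            prop.SetValue(target, vector, Nothing)
        ElseIf vector.Length > 0 Then
            ' Scalar property: take the first element of the vector.
            prop.SetValue(target, vector.GetValue(0), Nothing)
        End If
    End Sub

    Public Sub Main()
        Dim obj As New ScalarSketch()

        AssignVector(obj, GetType(ScalarSketch).GetProperty("level"), New Integer() {5})
        AssignVector(obj, GetType(ScalarSketch).GetProperty("h"), New Double() {0.5, -0.5})

        Console.WriteLine("{0} {1}", obj.level, obj.h.Length)   ' prints: 5 2
    End Sub
End Module
```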

''' <summary>
''' Object() to T()()
''' </summary>
''' <param name="value"></param>
''' <param name="pInfo"></param>
''' <param name="obj"></param>
''' <param name="pTypeInfo"></param>
''' <remarks></remarks>
Private Sub InternalMappingCollectionType(value As Object, pInfo As System.Reflection.PropertyInfo, ByRef obj As Object, pTypeInfo As System.Type)
    Dim EleTypeInfo As Type = pTypeInfo.GetElementType
    Dim SourceList = (From val As Object In DirectCast(value, System.Collections.IEnumerable) Select val).ToArray
    Dim List = Array.CreateInstance(EleTypeInfo, SourceList.Count)

    For i As Integer = 0 To SourceList.Count - 1
        Call List.SetValue(SourceList(i), i)
    Next

    Call pInfo.SetValue(obj, List)
End Sub

We can use the reflection function Array.CreateInstance to create an array of a specific type. Before creating the array we must know its element type, which we get by reflecting on the property type:

Dim EleTypeInfo As Type = pTypeInfo.GetElementType

Because we already know that the converted R data is a matrix, we convert it directly into an array:

Dim SourceList = (From val As Object In DirectCast(value, System.Collections.IEnumerable) Select val).ToArray

Now we know the two things needed to create an array: its element type and its element count (the array size):

Dim List = Array.CreateInstance(EleTypeInfo, SourceList.Count)

After using List.SetValue to assign the element at each position in the array, we have an array-of-arrays matrix in the .NET program. Finally we assign this converted matrix to the property:

Call pInfo.SetValue(obj, List)
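The failure and the fix can be reproduced in isolation. The sketch below uses only System.Array; MatrixSketch and ToTypedArray are hypothetical names chosen for this example, and a hand-built Object() of Double() rows stands in for the converted R matrix.

```vbnet
Imports System

Public Module MatrixSketch

    ' Copies the elements of a loosely typed Object() into a new array
    ' whose element type is only known at run time.
    Public Function ToTypedArray(source As Object(), elementType As Type) As Array
        Dim typed As Array = Array.CreateInstance(elementType, source.Length)

        For i As Integer = 0 To source.Length - 1
            typed.SetValue(source(i), i)
        Next

        Return typed
    End Function

    Public Sub Main()
        ' Stand-in for the converted R matrix: an Object() whose elements are Double().
        Dim rows As Object() = {New Double() {1.0, 2.0}, New Double() {3.0, 4.0}}

        ' A direct cast fails because the run-time type is Object(), not Double()().
        Try
            Dim bad As Double()() = DirectCast(CObj(rows), Double()())
        Catch ex As InvalidCastException
            Console.WriteLine("direct cast failed")
        End Try

        ' Rebuilding the array with the right element type makes the cast valid.
        Dim matrix As Double()() = DirectCast(ToTypedArray(rows, GetType(Double())), Double()())
        Console.WriteLine(matrix(1)(0))   ' prints: 3
    End Sub
End Module
```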

      

 

A simple code testing example

The test project shows how to do this hybrid programming easily. There are two modules in the test project:

 

Quote:

Module Wavelets defines the required R functions and the R object mapping types used to read the wavelets calculation result from the R invocation

Module Program contains the example test code

 

Important note:

Before you run this code, R must be properly installed on your computer, and the required wavelets R library must also be installed in your R system.

 

1. The simplest VB/C# hybrid programming example

' VB/C# with R language hybrid programming example

Dim ChipData = (From row As Microsoft.VisualBasic.DataVisualization.DocumentFormat.Csv.File.RowObject
                In Microsoft.VisualBasic.DataVisualization.DocumentFormat.Csv.File.FastLoad("../DM_1184.GeneChipDataSamples.csv")
                Select ID = row.First, ExpressionData0 = (From s As String In row.Skip(1) Select Val(s)).ToArray).ToArray

Call Wavelets.Initialize()

Dim TestResultRS4Object = Wavelets.DWT_RInvoke(ChipData.First.ExpressionData0, filter:="haar")
Dim Result = RDotNET.Extensions.ShellScriptAPI.Serialization.LoadFromStream(Of Wavelets.Waveletmodwt)(TestResultRS4Object)

Call Result.GetXml.SaveTo("./Test.Result.xml")

             

The program follows the typical steps of R hybrid programming:

a. Initialize the R engine services and load the required library:

Call Wavelets.Initialize()

b. Invoke the R function to get an RDotNET symbolic expression:

Dim TestResultRS4Object = Wavelets.DWT_RInvoke(ChipData.First.ExpressionData0, filter:="haar")

c. Finally, get the result as a .NET class instance through serialization:

Dim Result = RDotNET.Extensions.ShellScriptAPI.Serialization.LoadFromStream(Of Wavelets.Waveletmodwt)(TestResultRS4Object)

Invoking the wavelets signal analysis takes just 3 simple and happy coding steps, right? :-)

             

2. Hybrid scripting with the ShoalShell language

The Shoal Shell language (http://sourceforge.net/projects/shoal) is a new kind of embedded scripting language for .NET, originally developed for my virtual cell system. It supports hybrid scripting with R/Perl/SQL/LINQ; so far I have only released the R hybrid scripting API for Shoal Shell.

This example shows hybrid scripting with Shoal/R and your .NET program:

             

'Shoal Shell Script programming example

Dim ShoalShell As Microsoft.VisualBasic.Scripting.ShoalShell.Runtime.Objects.ShellScript = New Scripting.ShoalShell.Runtime.Objects.ShellScript()

Call ShoalShell.InstallModules(GetType(RDotNET.Extensions.ShellScriptAPI.Serialization).Assembly.Location)
Call ShoalShell.InstallModules(GetType(Wavelets).Assembly.Location)
Call ShoalShell.InstallModules(GetType(ShoalShell.PlugIns.Plot_Devices.DataSource).Assembly.Location)

Call ShoalShell.TypeLibraryRegistry.Save()

Dim Script As String =
<ShoalShell-Script>

imports wavelets
imports r.net
imports io_device.csv
imports system

chipdata &lt; (imports.csv) ../DM_1184.GeneChipDataSamples.csv
chipdata &lt;- $chipdata -> as.datasource
chipdata &lt;= $chipdata [0]
chipdata &lt;- $chipdata -> get.X

s4obj &lt;- $chipdata -> dwt.r.invoke filter haar n.levels 5
result.type &lt;- wavelets result.type.schema
result &lt;- ctype r.data $s4obj cast.type $result.type

call $result > ./Test.Result.ShoalInvoke.xml

return $result
</ShoalShell-Script>

Dim bResult = ShoalShell <= Script  'Execute the script and get the return value
MsgBox(DirectCast(bResult, Wavelets.Waveletmodwt).GetXml, MsgBoxStyle.Information)

First we instantiate a Shoal Shell scripting host in our code and then install the required module DLL files:

Dim ShoalShell As Microsoft.VisualBasic.Scripting.ShoalShell.Runtime.Objects.ShellScript = New Scripting.ShoalShell.Runtime.Objects.ShellScript()

To install an external dynamic API module DLL file, use

Call ShoalShell.InstallModules("<DLL_filepath>")

For example:

Call ShoalShell.InstallModules(GetType(RDotNET.Extensions.ShellScriptAPI.Serialization).Assembly.Location)

Then we run the script and get the result returned by

# Shoal shell statement
return $result

     

' The VB code gets the result from the Shoal Shell return value
Dim bResult = ShoalShell <= Script  'Execute the script and get the return value

      

3. Dynamic programming with Shoal Shell

Shoal Shell also offers a dynamic programming feature for your .NET program:

' Shoal Shell VB/C# dynamics programming example

Dim Dynamics As Object = New Microsoft.VisualBasic.Scripting.ShoalShell.Runtime.Objects.Dynamics(ShoalShell)
'  ---------------------------Translated version of the Shoal Shell script shown above---------------------------------

Dim ChipDataDy = Dynamics.Imports.Csv("../DM_1184.GeneChipDataSamples.csv")
ChipDataDy = Dynamics.As.DataSource(ChipDataDy)
ChipDataDy = ChipDataDy(0)
ChipDataDy = Dynamics.Get.X(ChipDataDy)

Dim s4obj = Dynamics.dwt.r.invoke(ChipDataDy)
Result = DirectCast(Dynamics.CType(s4obj, GetType(Wavelets.Waveletmodwt)), Wavelets.Waveletmodwt)

'  ---------------------------------------------------------------------------------------------------------------------

Result.GetXml.SaveTo("./Test.Result.ShoalInvoke.Dynamics.Programming.xml")
MsgBox(DirectCast(Result, Wavelets.Waveletmodwt).GetXml, MsgBoxStyle.Information)      

As you can see, the dynamic code shown above is a VB translation of the Shoal Shell script! Things are amazing!