Simias Data Model Proposal

This document is a proposal for the Simias Data Model. If you have comments about the proposal, please send them to the ifolder-dev (mailto:ifolder-dev@forge.novell.com) mailing list.

Introduction

Simias was developed to provide a collaborative base for applications. By storing application data in Simias, the data can be shared and synchronized between multiple people and multiple machines. Developers need a simple way to store and retrieve data from Simias that is cross-platform and available from various programming languages. This document explores a suggested solution for providing a universal API that fits the needs of most developers who want to use Simias as their data and meta-data repository.

Current API

Simias currently has a rich managed interface to search, store, and manipulate data and meta-data. This interface is only available within the Simias process, which limits the types of applications that are able to use it. To get around the need to run in the Simias process space, Simias included the Mono team's XSP, which provides an environment for building Web Services in the Simias application domain. iFolder took advantage of this solution and built an iFolder Web Service to communicate with Simias. The same approach could be used to build a generic Simias API, but because Web Services are static in nature, building a generic API that allows for schema-less data would be quite a challenge. SOAP and Web Services were not designed to be used as a generic query and store API. Because of this, a new solution needs to be put in place on which the Simias API can be built.

Proposed Simias API - RDF and SPARQL

The proposed solution is to adopt RDF (http://www.w3.org/RDF) and SPARQL (http://www.w3.org/TR/rdf-sparql-query) as a base for the Simias API. RDF is a W3C recommendation, while SPARQL is on its way to becoming a recommendation by the end of the year. Both RDF and SPARQL are used by many systems and are well documented. Adopting RDF will require no changes to the existing Simias infrastructure. Several components will be added to Simias to allow communication via RDF. The intent of these changes is not to transform Simias into an RDF triple store, but to model the Simias data in RDF. The following diagram shows a high-level view of the components that would make up the proposed Simias API.

Image:SimiasAPIModel.png

On top of the current Simias Store API, an RDF Translator would be built. The RDF Translator would include an RDF Serializer that would take objects returned from Simias and serialize them to RDF/XML. The RDF Translator would also include a SPARQL module that has specific knowledge of Simias. The SPARQL statements would be translated into Simias searches and executed on the Simias store. The results would be returned in SPARQL Query Results XML Format. More on the specifics of querying, creating, modifying, and deleting data via RDF will be discussed in detail later.

In order to reduce the amount of change needed to implement the new API, the existing web service architecture will be leveraged to exchange RDF data. A new RDF Web Service will be created to provide a way of handling the RDF data. The RDF Web Service would be a minimal web service consisting only of methods needed to send and receive query statements and RDF documents.

Today there are several methods in the Simias Web Service that do not make sense to model in RDF such as authentication. While RDF is rich enough that some mechanism could be invented to facilitate authentication, it doesn't fit the model. The existing Simias Web Service will still be used to provide authentication and other methods that are not part of the data model in Simias.

The existing Simias Event System is a very useful notification system for processes that react to changes that happen in the Simias Store. External solutions like iFolder can connect and register for events to know when to update the UI or notify the user of changes. Today the event registration can only be as granular as the base types in Simias. This level of filtering means that applications that need to be notified of only a small number of events must create their own event filtering system. A Filtered Event Service will be created on top of the current Simias Events to allow very granular event registration. The current Event Thread Service will be leveraged and changed to the Filtered Event Service to provide remote events.
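The kind of granular registration described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class name, the dictionary shape of an event, and the rule that a subscription matches when all of its criteria are equal to the event's fields. It is not the Simias event API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: a flat dict of strings, for example
# {"action": "NodeChanged", "collection": "<guid>", "node": "<guid>"}.
Event = Dict[str, str]


class FilteredEventService:
    """Toy in-process dispatcher that filters events per subscription."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Event, Callable[[Event], None]]] = []

    def subscribe(self, criteria: Event, callback: Callable[[Event], None]) -> None:
        # Keep a copy so later changes by the caller do not alter the filter.
        self._subscriptions.append((dict(criteria), callback))

    def publish(self, event: Event) -> None:
        for criteria, callback in self._subscriptions:
            # Deliver only when every criterion equals the event's field.
            if all(event.get(key) == value for key, value in criteria.items()):
                callback(event)
```

A client interested only in changes to one collection would subscribe with criteria such as {"action": "NodeChanged", "collection": "..."} and would never see events for other collections.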

A uniform Simias API will be created that wraps all of the services described above. Developers will only need to go to one Simias API Framework to log in, store and retrieve data in Simias, and utilize events. The details of the APIs will not be explored here. At some later time, an object API may be created on top of the Simias API to give developers simple access to objects stored in Simias.

Modeling Simias data in RDF

The first step in using RDF with Simias is to model the Simias data in RDF. We'll start by modeling a very simple call that gets all Collections on a new Simias store. The XML returned by Simias looks like this:

 <ObjectList>
   <Object name="Local" id="5193adc1-276a-4b0a-9c27-d442d6fe5f4f" type="Domain">
     <Property name="NodeCreate" type="DateTime">632628083564488370</Property>
     <Property name="ClntRev" type="UInt64">1</Property>
     <Property name="Types" type="String">Domain</Property>
     <Property name="Types" type="String">Node</Property>
     <Property name="Types" type="String">Collection</Property>
     <Property name="DomainID" type="String">5193adc1-276a-4b0a-9c27-d442d6fe5f4f</Property>
     <Property name="Description" type="String">Local Machine Domain</Property>
     <Property name="CollectionId" type="String">5193adc1-276a-4b0a-9c27-d442d6fe5f4f</Property>
     <Property name="Creator" type="String">f73180ad-b541-4638-8aba-c0a24efcb7e1</Property>
   </Object>
   <Object name="LocalDatabase" id="77cf310a-ef22-4e5a-8d07-f58fd5f005ee" type="LocalDatabase">
     <Property name="NodeCreate" type="DateTime">632628083554158790</Property>
     <Property name="ClntRev" type="UInt64">2</Property>
     <Property name="Types" type="String">LocalDatabase</Property>
     <Property name="Types" type="String">Node</Property>
     <Property name="Types" type="String">Collection</Property>
     <Property name="DomainID" type="String">5193adc1-276a-4b0a-9c27-d442d6fe5f4f</Property>
     <Property name="Version" type="String">1.0.1</Property>
     <Property name="CollectionId" type="String">77cf310a-ef22-4e5a-8d07-f58fd5f005ee</Property>
     <Property name="Creator" type="String">f73180ad-b541-4638-8aba-c0a24efcb7e1</Property>
     <Property name="LocalPwd" type="String">35088159-9d75-42a3-9f96-ffd9628f1bc9</Property>
   </Object>
 </ObjectList>

If we were to simply convert this XML to RDF/XML and place the properties in a simias namespace it might look like this:

 <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
          xmlns:simias="http://www.ifolder.com/">
   <rdf:Description rdf:about="simias:5193adc1-276a-4b0a-9c27-d442d6fe5f4f">
     <simias:NodeCreate rdf:datatype="&xsd;dateTime">632628083564488370</simias:NodeCreate>
     <simias:ClntRev rdf:datatype="&xsd;integer">1</simias:ClntRev>
     <simias:Types rdf:datatype="&xsd;string">Domain</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Node</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Collection</simias:Types>
     <simias:DomainID rdf:datatype="&xsd;string">5193adc1-276a-4b0a-9c27-d442d6fe5f4f</simias:DomainID>
     <simias:Description rdf:datatype="&xsd;string">Local Machine Domain</simias:Description>
     <simias:CollectionId rdf:datatype="&xsd;string">5193adc1-276a-4b0a-9c27-d442d6fe5f4f</simias:CollectionId>
     <simias:Creator rdf:datatype="&xsd;string">f73180ad-b541-4638-8aba-c0a24efcb7e1</simias:Creator>
   </rdf:Description>
   <rdf:Description rdf:about="simias:77cf310a-ef22-4e5a-8d07-f58fd5f005ee">
     <simias:NodeCreate rdf:datatype="&xsd;dateTime">632628083554158790</simias:NodeCreate>
     <simias:ClntRev rdf:datatype="&xsd;integer">2</simias:ClntRev>
     <simias:Types rdf:datatype="&xsd;string">LocalDatabase</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Node</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Collection</simias:Types>
     <simias:DomainID rdf:datatype="&xsd;string">5193adc1-276a-4b0a-9c27-d442d6fe5f4f</simias:DomainID>
     <simias:Version rdf:datatype="&xsd;string">1.0.1</simias:Version>
     <simias:CollectionId rdf:datatype="&xsd;string">77cf310a-ef22-4e5a-8d07-f58fd5f005ee</simias:CollectionId>
     <simias:Creator rdf:datatype="&xsd;string">f73180ad-b541-4638-8aba-c0a24efcb7e1</simias:Creator>
     <simias:LocalPwd rdf:datatype="&xsd;string">35088159-9d75-42a3-9f96-ffd9628f1bc9</simias:LocalPwd>
   </rdf:Description>
 </rdf:RDF>
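As a rough illustration of what the RDF Serializer could do for this naive mapping, the sketch below converts a Simias object list into RDF/XML with every property emitted as a typed literal. The function name and the fallback to xsd:string for unknown types are assumptions; the type mapping simply follows the example above (DateTime to dateTime, UInt64 to integer, String to string).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"
SIMIAS_NS = "http://www.ifolder.com/"  # namespace used in the example above

# Simias property type -> XSD type name, following the example above.
XSD_TYPES = {"DateTime": "dateTime", "UInt64": "integer", "String": "string"}


def simias_to_rdfxml(simias_xml: str) -> str:
    """Convert a Simias <ObjectList> into naive RDF/XML (all values literal)."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("simias", SIMIAS_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for obj in ET.fromstring(simias_xml).iter("Object"):
        # Each Simias object becomes one rdf:Description keyed by its GUID.
        desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                             {f"{{{RDF_NS}}}about": "simias:" + obj.get("id")})
        for prop in obj.iter("Property"):
            # Assumption: unknown Simias types fall back to xsd:string.
            xsd_name = XSD_TYPES.get(prop.get("type"), "string")
            elem = ET.SubElement(desc, f"{{{SIMIAS_NS}}}{prop.get('name')}",
                                 {f"{{{RDF_NS}}}datatype": XSD_NS + xsd_name})
            elem.text = prop.text
    return ET.tostring(root, encoding="unicode")
```

The output writes each datatype as a full URI rather than through the xsd entity, which avoids the need for a DOCTYPE declaration.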

The RDF graph would look like this:

Image:SimiasRDF-1.png

Although the RDF/XML may look like a good fit for Simias, the graph shows that not all of the relationships that exist in Simias are being expressed. In an RDF graph, circles represent resources (objects) identified by their URIs, and squares represent literal values. In this mapping of Simias to RDF, nodes are identified by URIs of the form <simias:GUID>. With the mapping shown, a developer would have to learn over time that some properties on a node are literal values while others should be interpreted as relationships. The DomainID property is one example: it is actually a reference to the Domain that a node belongs to. Neither the Simias XML nor our RDF/XML representation of the nodes expresses this relationship, and the RDF graph shows no connection between the node and a Domain. A more correct RDF graph of these nodes would look like this:

Image:SimiasRDF-2.png

From this new RDF graph, it is clear that the DomainID, CollectionId, and Creator properties all refer to other resources (objects). We can also add RDF types to declare the type of each object identified by a URI. Adding the types and relationships to the RDF/XML results in a document that looks like this:

 <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
 <rdf:RDF  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
           xmlns:simias="http://www.ifolder.com/">
   <simias:Collection rdf:about="simias:5193adc1-276a-4b0a-9c27-d442d6fe5f4f">
     <simias:NodeCreate rdf:datatype="&xsd;dateTime">632628083564488370</simias:NodeCreate>
     <simias:ClntRev rdf:datatype="&xsd;integer">1</simias:ClntRev>
     <simias:Types rdf:datatype="&xsd;string">Domain</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Node</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Collection</simias:Types>
     <simias:Description rdf:datatype="&xsd;string">Local Machine Domain</simias:Description>
     <simias:DomainID>
       <simias:Domain rdf:about="simias:5193adc1-276a-4b0a-9c27-d442d6fe5f4f"/>
     </simias:DomainID>
     <simias:CollectionId>
       <simias:Collection rdf:about="simias:5193adc1-276a-4b0a-9c27-d442d6fe5f4f"/>
     </simias:CollectionId>
     <simias:Creator>
       <simias:Member rdf:about="simias:f73180ad-b541-4638-8aba-c0a24efcb7e1"/>
     </simias:Creator>
   </simias:Collection>
   <simias:Collection rdf:about="simias:77cf310a-ef22-4e5a-8d07-f58fd5f005ee">
     <simias:NodeCreate rdf:datatype="&xsd;dateTime">632628083554158790</simias:NodeCreate>
     <simias:ClntRev rdf:datatype="&xsd;integer">2</simias:ClntRev>
     <simias:Types rdf:datatype="&xsd;string">LocalDatabase</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Node</simias:Types>
     <simias:Types rdf:datatype="&xsd;string">Collection</simias:Types>
     <simias:Version rdf:datatype="&xsd;string">1.0.1</simias:Version>
     <simias:LocalPwd rdf:datatype="&xsd;string">35088159-9d75-42a3-9f96-ffd9628f1bc9</simias:LocalPwd>
     <simias:DomainID>
       <simias:Domain rdf:about="simias:5193adc1-276a-4b0a-9c27-d442d6fe5f4f"/>
     </simias:DomainID>
     <simias:CollectionId>
       <simias:Collection rdf:about="simias:77cf310a-ef22-4e5a-8d07-f58fd5f005ee"/>
     </simias:CollectionId>
     <simias:Creator>
       <simias:Member rdf:about="simias:f73180ad-b541-4638-8aba-c0a24efcb7e1"/>
     </simias:Creator>
   </simias:Collection>
 </rdf:RDF>
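The difference between literal values and relationships can also be shown at the triple level. The sketch below is a minimal illustration, assuming that DomainID, CollectionId, and Creator are the only reference properties (as discussed above) and that every returned object is typed as a Collection, as in the example. The Triple type and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIMIAS = "http://www.ifolder.com/"

# Properties whose values are GUIDs of other nodes (from the discussion above).
REFERENCE_PROPERTIES = {"DomainID", "CollectionId", "Creator"}


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    is_resource: bool  # True when obj is a URI, False when it is a literal


def simias_to_triples(simias_xml: str, rdf_type: str = "Collection") -> List[Triple]:
    """Map a Simias <ObjectList> to triples, keeping references as resources."""
    triples = []
    for node in ET.fromstring(simias_xml).iter("Object"):
        subject = "simias:" + node.get("id")
        # Assumption: the caller knows the type of what it asked for.
        triples.append(Triple(subject, RDF_TYPE, SIMIAS + rdf_type, True))
        for prop in node.iter("Property"):
            name, value = prop.get("name"), prop.text or ""
            if name in REFERENCE_PROPERTIES:
                # The GUID points at another node, so emit a URI, not a string.
                triples.append(Triple(subject, SIMIAS + name, "simias:" + value, True))
            else:
                triples.append(Triple(subject, SIMIAS + name, value, False))
    return triples
```

A real translator would need some way to know which properties are references; a hard-coded set is only the simplest possible choice.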

Although not discussed in this document, the property names simias:DomainID and simias:CollectionId do not express the relationships well. It may be worth investigating more appropriate names for these properties when they are expressed in RDF.

Querying data in Simias

If RDF is adopted as the model for data in Simias, it follows that SPARQL should be adopted for all queries in Simias. There are three standards in the works to be considered:

SPARQL Query Language for RDF (http://www.w3.org/TR/rdf-sparql-query/)

Query Results XML Format (http://www.w3.org/2001/sw/DataAccess/rf1/SPARQL)

SPARQL Protocol for RDF (http://www.w3.org/TR/2005/WD-rdf-sparql-protocol-20050914/)

The first two are the most important as they are about to be approved as W3C recommendations. Using the first two standards, all queries into Simias would be done using the SPARQL syntax. The results of that query would be returned using the SPARQL Query Results XML Format. The last standard is not as close to being adopted, but may prove important as it defines a protocol for sending and receiving SPARQL. One of the specifics of this protocol is using SOAP/HTTP. For Simias, a very similar protocol will be used which is a simple query method that takes a string and returns a string. The sent string will contain the SPARQL statement and the returning string will contain the SPARQL results.

The following is an example of what a SPARQL query and the results would look like. The query in this example will ask the following:

 Properties to be returned: 
   URI  (same as the GUID but RDF uses URIs for everything)
   name
   owner
 Parameters to search for:
   rdf:type is Collection
   simias:Types has iFolder
   Is related to simias:Domain with a URI of "simias:5193adc1-276a-4b0a-9c27-d442d6fe5f4f"

The resulting SPARQL statement would look like this:

 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX simias: <http://www.ifolder.com/>
 SELECT $name $uri $owner
 WHERE
 {
   ?item  rdf:type             simias:Collection .
   ?item  simias:DomainID      <simias:5193adc1-276a-4b0a-9c27-d442d6fe5f4f> .
   ?item  simias:Types         "iFolder" .
   ?item  simias:Creator       $owner .
   ?item  simias:Description   $name .
   ?item  simias:CollectionId  $uri .
 }

The results returned might look like this:

 <sparql xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result2" xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
   <head>
     <variable name="name"/>
     <variable name="owner"/>
     <variable name="uri"/>
   </head>
   <results>
     <result>
       <binding name="name"><literal datatype="xsd:string">Work stuff</literal></binding>
       <binding name="owner"><uri>simias:f73180ad-b541-4638-8aba-c0a24efcb7e1</uri></binding>
       <binding name="uri"><uri>simias:l73180ad-b541-4638-8aba-c0a24efcb7e1</uri></binding>
     </result>
     <result>
       <binding name="name"><literal datatype="xsd:string">Calvin's photos</literal></binding>
       <binding name="owner"><uri>simias:f73180ad-b541-4638-8aba-c0a24efcb7e1</uri></binding>
       <binding name="uri"><uri>simias:a73180ad-b541-4638-8aba-c0a24efcb7e1</uri></binding>
     </result>
   </results>
 </sparql>
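On the client side, a result document like the one above could be turned into rows with a few lines of code. This is a minimal sketch: the function name is hypothetical, and it deliberately ignores the XML namespace so that it is not tied to the particular draft namespace shown in the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sparql_results(xml_text: str) -> List[Dict[str, str]]:
    """Return one {variable: value} dict per <result> element."""
    rows = []
    for elem in ET.fromstring(xml_text).iter():
        if _local(elem.tag) != "result":
            continue
        row = {}
        for binding in elem:
            if _local(binding.tag) == "binding" and len(binding):
                # The single child holds the value, e.g. <uri> or <literal>.
                row[binding.get("name")] = binding[0].text or ""
        rows.append(row)
    return rows
```

The sketch discards whether a value was a URI or a literal; a fuller client would keep that distinction along with any datatype attribute.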

The query abilities are quite extensive so a full discussion of what can be queried will not be covered here. It is recommended that a subset of all query capabilities be chosen to match those that are available in Simias.
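One possible subset is a set of triple patterns that all share a single subject variable, as in the example query above. The sketch below evaluates such patterns against an in-memory list of triples, which stands in for a Simias search; it is not how the SPARQL module would actually talk to the store. The function name and data shapes are assumptions, and each variable is assumed to appear only once in the patterns.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Pattern = Tuple[str, str]       # (predicate, object or variable)


def match_subjects(triples: Iterable[Triple],
                   patterns: List[Pattern]) -> List[Dict[str, str]]:
    """Evaluate patterns that all share one subject variable, ?item.

    Objects starting with '?' or '$' are variables; anything else must match
    exactly. Returns one binding dict per solution.
    """
    # Index the data as subject -> predicate -> list of objects.
    by_subject: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, []).append(o)

    solutions = []
    for subject, props in by_subject.items():
        names, choices, matched = [], [], True
        for predicate, obj in patterns:
            values = props.get(predicate, [])
            if obj[:1] in ("?", "$"):
                names.append(obj[1:])
                choices.append(values)   # an empty list yields no solutions
            elif obj not in values:
                matched = False
                break
        if matched:
            # Multi-valued properties produce one solution per combination.
            for combo in product(*choices):
                solutions.append({"item": subject, **dict(zip(names, combo))})
    return solutions
```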

Modifying data in Simias

There is no W3C standard for modifying RDF graphs in the store. A proposed solution could simply be to create an RDF/XML document and pass it to a method to modify the triples represented in the document. All operations would be available on multiple objects in Simias by specifying multiple resources in the RDF/XML. The Simias API could provide four methods that handle RDF/XML:

Create: This would create all of the triples specified and assign URIs to the resources (iFolders, Collections, etc.)

Update: This would add values to multi-valued properties and replace single-valued properties on resources

Store: This would replace the values specified and remove values not specified

Delete: An empty resource would remove the resource, a resource with properties would remove the properties
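The intended semantics of the four methods can be sketched against a toy in-memory store. All names here are hypothetical, resources are passed as plain dictionaries rather than RDF/XML, and the set of multi-valued properties is an assumption. Delete with properties is interpreted as removing the named properties entirely, which is one reading of the description above.

```python
import uuid
from typing import Dict, List, Optional

Resource = Dict[str, List[str]]   # property name -> list of values

# Assumption: the store knows which properties may hold several values.
MULTI_VALUED = {"Types"}


class ToyStore:
    """In-memory stand-in for the Simias store; all names are hypothetical."""

    def __init__(self) -> None:
        self.resources: Dict[str, Resource] = {}

    def create(self, props: Resource) -> str:
        uri = "simias:" + str(uuid.uuid4())   # the store assigns the URI
        self.resources[uri] = {k: list(v) for k, v in props.items()}
        return uri

    def update(self, uri: str, props: Resource) -> None:
        target = self.resources[uri]
        for name, values in props.items():
            if name in MULTI_VALUED:
                current = target.setdefault(name, [])
                for value in values:
                    if value not in current:   # add, skipping duplicates
                        current.append(value)
            else:
                target[name] = list(values)    # replace single-valued property

    def store(self, uri: str, props: Resource) -> None:
        # Replace everything: properties not mentioned are removed.
        self.resources[uri] = {k: list(v) for k, v in props.items()}

    def delete(self, uri: str, props: Optional[Resource] = None) -> None:
        if not props:
            del self.resources[uri]            # empty resource: remove it all
            return
        for name in props:
            self.resources[uri].pop(name, None)
```

The key contrast is between update, which leaves unmentioned properties alone, and store, which treats the supplied resource as the complete new state.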

Initially, only the RDF/XML representation of RDF graphs would be supported but it may be desirable to add support for other RDF serialized formats.