SemWeb is a .NET library for working with Resource Description Framework (RDF) data. It provides classes for reading, writing, manipulating, and querying RDF. The library does not (yet) provide any special tools for OWL ontologies. That is, the library is a general-purpose framework for dealing with RDF statements (i.e. triples).
The primary classes in the library are in the SemWeb namespace. In that namespace, four types provide the fundamentals for all aspects of the library: Resource, Statement, StatementSource, and StatementSink.
The MemoryStore class and the SelectableSource interface are also discussed below.
Resource is the abstract base class of the terms in RDF. The RDF formal model has two types of terms, nodes and literal values, and likewise Resource has two subclasses: Entity and Literal. Nodes, whether they be named (i.e. URIs) or blank (i.e. anonymous), are represented by the Entity class. Named nodes are represented by Entity objects directly, while blank nodes are represented by the BNode class, which is a subclass of Entity. Literal values are represented by the Literal class.
An RDF triple is represented by the Statement struct. This type is a struct, rather than a class, because the number of statements an application has to process is the most likely source of scalability concerns. That is, an RDF database can contain billions of triples, but usually won't have nearly so many unique resources. The Statement struct has three main fields: Subject, Predicate, and Object. Following the RDF specification, the subject and predicate of a triple cannot be literal values, and thus those fields are typed as Entity, while an object can be either an entity or a literal, so the Object field is typed as Resource. It is thus often necessary to cast the value of the Object field to get an Entity or Literal back from a statement.
To construct a triple, use Statement's three-arg constructor:
new Statement( new Entity("http://www.example.org/SemWeb"), new Entity("http://www.example.org/hasName"), new Literal("My Semantic Web library!") );
Note that constructing URI nodes involves calling Entity's constructor and passing a string. (Validation that the string is a legitimate URI is not performed. That is the responsibility of the caller.) To construct blank nodes, you must instantiate the BNode class:
new Statement( new Entity("http://www.example.org/SemWeb"), new Entity("http://www.example.org/relatedTo"), new BNode() );
When adding statements into a store, no fields of a Statement may be null.
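Because the Object field is typed as Resource, reading a statement back usually starts with a type test. The sketch below relies only on the class hierarchy described above; the class and method names (ObjectKinds, KindOf, LiteralOrNull) are made up for illustration and are not part of the library.

```csharp
using SemWeb;

class ObjectKinds {
    // Classifies the object of a statement by its runtime type.
    public static string KindOf(Statement statement) {
        Resource obj = statement.Object;
        if (obj is Literal) return "literal";
        // BNode derives from Entity, so it must be tested first.
        if (obj is BNode) return "blank node";
        if (obj is Entity) return "named node";
        return "unknown";
    }

    // Returns the object as a Literal, or null if it is an entity.
    public static Literal LiteralOrNull(Statement statement) {
        return statement.Object as Literal;
    }
}
```

Using the "as" operator avoids an InvalidCastException when the object turns out to be an Entity; a direct cast such as (Literal)statement.Object is fine when the kind is already known.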
While all instances of an Entity constructed by passing a URI are considered equal so long as their URIs are equal, no two BNode object instances are considered equal (mostly). You must create a single BNode instance with "new BNode()" and use that instance throughout to refer to the same BNode at different times.
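A minimal sketch of reusing one BNode instance so that two statements refer to the same anonymous node. The hasAuthor predicate URI and the literal value are hypothetical placeholders, as are the class and method names.

```csharp
using SemWeb;

class BNodeDemo {
    // Builds two statements that share one blank node.
    public static Statement[] DescribeAuthor() {
        Entity library = new Entity("http://www.example.org/SemWeb");
        Entity hasAuthor = new Entity("http://www.example.org/hasAuthor"); // hypothetical predicate
        Entity hasName = new Entity("http://www.example.org/hasName");

        // Create the blank node once and keep the reference.
        BNode author = new BNode();

        return new Statement[] {
            new Statement(library, hasAuthor, author),
            new Statement(author, hasName, new Literal("An example author"))
        };
    }
}
```

Calling new BNode() a second time in place of the author variable would produce two unrelated blank nodes, and the statements would no longer be connected.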
It is possible to create literal values with a language tag or datatype. To do so, use the three-arg Literal constructor:
new Literal("My Semantic Web library!", "en-US", null);
// XSDNS is assumed to hold the XML Schema namespace, "http://www.w3.org/2001/XMLSchema"
new Literal("1234", null, XSDNS + "#integer");
A literal value may not have both a language and a datatype, following the RDF spec.
Statements actually have a fourth field, called Meta, which may be used for any purpose. It is intended for attaching context information to a statement. The Meta field must be an Entity.
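A short sketch of attaching a context entity through Meta. It assumes Statement also has a four-argument constructor that takes the Meta entity last; that constructor is not shown in this document, so check it against your version of the library. The context URI and the class name are made-up examples.

```csharp
using SemWeb;

class MetaDemo {
    public static Statement Build() {
        // Hypothetical URI naming the context this statement came from.
        Entity context = new Entity("http://www.example.org/graphs/imported");

        // Assumption: a constructor taking (subject, predicate, object, meta).
        return new Statement(
            new Entity("http://www.example.org/SemWeb"),
            new Entity("http://www.example.org/hasName"),
            new Literal("My Semantic Web library!"),
            context);
    }
}
```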
The Resource class hierarchy has a fourth subclass. Variable is a subclass of BNode and represents a variable in a query. It is meant to be used only in queries.
Objects that you get statements from are StatementSources. StatementSource is an interface with a method called Select, whose purpose is to stream statements to another object (a StatementSink) that is equipped, via a method called Add, to receive them. This approach is an alternative to the iterator paradigm for scanning through a set of statements: it is a source/sink paradigm, if such a thing exists.
Let's start with the StatementSink. If you want to process a set of statements, you will need to write a class that implements this interface, by adding the method:
public bool Add(Statement statement) { ... }
You could create a private nested class to implement the interface, for instance. Inside this method, you place your code to process a single statement. If you need to see more than one statement at once, you could place code in there to put the statement into an ArrayList, and then later on process all of the statements in the ArrayList, for instance. (Or just use a MemoryStore in the first place: more on that later.) Return true at the end, or false to signal the caller to stop sending statements.
StatementSink is implemented by the file-writing classes, including the RDF/XML writer (RdfXmlWriter) and N3 writer (N3Writer).
The StatementSource, on the other hand, is the object with the statements that you want to access. File-reading classes like the RDF/XML and N3 readers (RdfXmlReader and N3Reader) implement this interface. It contains a method Select to which you pass your StatementSink to begin sending the statements from the source into the sink, with the sink's Add method called for each statement. (Select is named after the SQL command of the same name.)
source.Select(new MySink());
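To make the source/sink hand-off concrete, here is a sketch of a sink that counts statements and returns false once it has seen enough. LimitedCounter and SinkDemo are illustrative names, not library classes, and the sketch assumes the source honors a false return value by stopping, as described above.

```csharp
using SemWeb;

// Counts incoming statements and asks the source to stop at a limit.
// The limit is assumed to be at least 1.
class LimitedCounter : StatementSink {
    private readonly int limit;
    public int Count;

    public LimitedCounter(int limit) {
        this.limit = limit;
    }

    public bool Add(Statement statement) {
        Count++;
        // true means "keep sending"; false means "stop".
        return Count < limit;
    }
}

class SinkDemo {
    // Streams statements from any source into the counting sink.
    public static int CountUpTo(StatementSource source, int limit) {
        LimitedCounter sink = new LimitedCounter(limit);
        source.Select(sink);
        return sink.Count;
    }
}
```

Any StatementSource can be passed to CountUpTo, for example one of the file readers, without the sink knowing where the statements come from.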
The MemoryStore class is an in-memory storage object for statements. At its core, it is just an ArrayList of statements. This class (and in fact every Store class) is peculiar from the point of view of the class hierarchy discussed so far: it implements both StatementSource and StatementSink. Thus you can add statements to it by calling its Add method. You can also get statements out of it by calling its Select(StatementSink) method, which will stream statements into any StatementSink (including another MemoryStore, a file-writing class, or one of your own classes, for instance).
MemoryStore ms1 = new MemoryStore();
ms1.Add(new Statement(.....));
MemoryStore ms2 = new MemoryStore();
// copies every statement in ms1 into ms2
ms1.Select(ms2);
But the MemoryStore can also be used as a utility class for moving from the source-sink paradigm to an iterator paradigm. The class implements IEnumerable, and in the .NET 2.0 build IEnumerable<Statement>, which means you can foreach over a MemoryStore to iterate through the statements it contains. You need to keep in mind scalability issues here, though. Streaming statements into a MemoryStore means you are loading them all into memory, which may not be possible for large applications. And when using the .NET 1.1 build (no generics), using the iterator paradigm requires boxing and unboxing the Statements.
MemoryStore ms = new MemoryStore();
datasource.Select(ms);
foreach (Statement stmt in ms)
    Console.WriteLine(stmt);
As an alternative to foreach, mainly for the .NET 1.1 build, you can loop as follows, which doesn't require boxing/unboxing:
for (int i = 0; i < ms.StatementCount; i++) {
    Statement stmt = ms[i];
    Console.WriteLine(stmt);
}
The SelectableSource is another part of the core of the library. This interface extends StatementSource with two new Select methods. Recall the Select(StatementSink) method already introduced which streams all statements from the source into the sink. The MemoryStore and other data sources use the SelectableSource interface to provide a basic filtering mechanism on the statements that are streamed back. These methods are:
void Select(Statement template, StatementSink sink);
void Select(SelectFilter filter, StatementSink sink);
In the first method, the caller provides a "statement template". Unlike statements added into stores, these statements may have null fields for subject, predicate, or object. Those fields are then treated as wildcards, and the fields that have values are applied as filters. Filtering with new Statement(x, null, null) will stream to the sink only statements whose subject is x. While of course you could filter out the statements in your sink, this wouldn't scale when the data source has billions of other irrelevant triples:
// streams just statements that have mySubject as the subject
source.Select(new Statement(mySubject, null, null), new MySink());

// streams just statements that have myPredicate and myObject as the predicate and object
source.Select(new Statement(null, myPredicate, myObject), new MySink());
This template paradigm is useful when you want to get the statements that match some other statement. It is also used by the Contains(Statement) method.
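The sketch below puts the template form of Select and the Contains(Statement) method side by side on a small MemoryStore. It uses only the calls described in this document; the TemplateDemo class, its helper methods, and the example.org URIs are illustrative.

```csharp
using System;
using SemWeb;

class TemplateDemo {
    static readonly Entity SemWebLib = new Entity("http://www.example.org/SemWeb");
    static readonly Entity HasName = new Entity("http://www.example.org/hasName");
    static readonly Entity RelatedTo = new Entity("http://www.example.org/relatedTo");

    // Builds a store holding two statements about the same subject.
    public static MemoryStore BuildStore() {
        MemoryStore store = new MemoryStore();
        store.Add(new Statement(SemWebLib, HasName, new Literal("My Semantic Web library!")));
        store.Add(new Statement(SemWebLib, RelatedTo, new BNode()));
        return store;
    }

    // Only the predicate is fixed; subject and object are wildcards.
    public static bool HasAnyName(SelectableSource source) {
        return source.Contains(new Statement(null, HasName, null));
    }

    static void Main() {
        MemoryStore store = BuildStore();

        // Copy just the hasName statements about SemWebLib into a second store.
        MemoryStore names = new MemoryStore();
        store.Select(new Statement(SemWebLib, HasName, null), names);

        foreach (Statement stmt in names)
            Console.WriteLine(stmt);
        Console.WriteLine(HasAnyName(store));
    }
}
```

Select with a template is the choice when the matching statements themselves are needed, while Contains only answers whether at least one match exists.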
The SelectableSource interface's second new Select method takes a SelectFilter object as its first argument. An object of this class provides more control over the statements selected. In particular, it allows for statements in which the subject, predicate, or object can range over a list of values, rather than just a single value as with the template paradigm. Here is an example:
SelectFilter filter = new SelectFilter();
filter.Subjects = new Entity[] { entity1, entity2, entity3 };
filter.Predicates = new Entity[] { dc_title, rdfs_subject };

// streams statements that have any of the listed entities as
// the subject, and any of the listed predicates as the predicate
source.Select(filter, new MySink());
The primary advantage of this second Select method is efficiency: it is better to query an out-of-memory database as few times as possible, getting as many results as possible in one shot, than to query the data source repeatedly for different triples.