SemWeb Documentation

SemWeb: Using Databases

Besides the MemoryStore, SemWeb provides four SQL-backed persistent stores, one each for MySQL, PostgreSQL, SQL Server, and Sqlite. Since Sqlite is the simplest to get going, I'll use it as the example.

Overview of database storage

A brief note on performance: the MySQL store is several times faster than the Sqlite store, at least for adding statements to the store. Sqlite is easier to work with, so it's appropriate for small stores of data, maybe up to 100,000 statements. Using its Import method, the MySQL store can load over ten million statements in as little as 45 minutes on modest hardware, which is about 4,500 statements per second. The Sqlite store loads the same graph at 1,400 statements per second. The speed depends a lot on the structure of the data being loaded. I haven't tested the PostgreSQL store myself, so I don't know how fast it is.
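To put those two rates side by side, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes the quoted rates stay constant for the whole load, which real imports won't do exactly, and the class and method names are made up for this example.

```csharp
using System;

public static class LoadTimeEstimate {
    // Seconds needed to load `statements` at a steady rate of `perSecond`.
    public static double Seconds(long statements, double perSecond) {
        return statements / perSecond;
    }

    public static void Main() {
        // Rates quoted in the text; actual rates depend on the data.
        const double mysqlRate = 4500;
        const double sqliteRate = 1400;

        // Number of statements the MySQL rate covers in 45 minutes.
        long statements = (long)(mysqlRate * 45 * 60);
        Console.WriteLine("MySQL, 45 minutes: {0} statements", statements);

        // Time the Sqlite rate would need for the same number of statements.
        double sqliteMinutes = Seconds(statements, sqliteRate) / 60;
        Console.WriteLine("Sqlite, same graph: {0:F0} minutes", sqliteMinutes);
    }
}
```

At these rates, 45 minutes of MySQL loading corresponds to 12,150,000 statements, and the Sqlite store would need roughly 145 minutes for the same graph.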

The SQL stores keep the RDF data in three tables, which share a common prefix that you provide. The main table is the prefix_statements table, which has a row for each statement in the store. Its columns are numeric IDs for the subject, predicate, object, and meta. The IDs are linked to URIs (for entities) in the prefix_entities table and to values (for literals) in the prefix_literals table.

The _statements table is indexed in several ways to make Select fast. In the MySQL store, the table has a UNIQUE index over all of the columns to ensure that an assertion can be added only once. This speeds up later operations, which don't have to check for duplicate statements, and it doesn't significantly hurt the performance of adding statements.
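The layout described above can be pictured with a small in-memory model. This is not SemWeb code and not the real schema: the class, its members, and the simplifications (no meta column, objects are always literals, one ID counter shared by both lookup tables) are all assumptions made for illustration. The point is only to show URIs and literal values being swapped for numeric IDs, and a uniqueness check rejecting a repeated statement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the three-table layout; not part of SemWeb.
public class TableModel {
    // Stands in for prefix_entities: URI -> numeric ID.
    readonly Dictionary<string, int> entities = new Dictionary<string, int>();
    // Stands in for prefix_literals: literal value -> numeric ID.
    readonly Dictionary<string, int> literals = new Dictionary<string, int>();
    // Stands in for prefix_statements; the set acts like the UNIQUE index.
    readonly HashSet<string> statements = new HashSet<string>();
    int nextId = 1;

    // Returns the existing ID for a key, or assigns the next free one.
    int GetId(Dictionary<string, int> table, string key) {
        int id;
        if (!table.TryGetValue(key, out id)) {
            id = nextId++;
            table[key] = id;
        }
        return id;
    }

    // Adds one row of IDs; returns false if the same row is already present.
    public bool Add(string subjectUri, string predicateUri, string literalValue) {
        int s = GetId(entities, subjectUri);
        int p = GetId(entities, predicateUri);
        int o = GetId(literals, literalValue);
        return statements.Add(s + "," + p + "," + o);
    }

    public int StatementCount { get { return statements.Count; } }
    public int EntityCount { get { return entities.Count; } }

    public static void Main() {
        TableModel model = new TableModel();
        string me = "http://www.example.org/me";
        string name = "http://xmlns.com/foaf/0.1/name";
        Console.WriteLine(model.Add(me, name, "New Guy")); // first insert succeeds
        Console.WriteLine(model.Add(me, name, "New Guy")); // duplicate is rejected
        Console.WriteLine(model.StatementCount);
    }
}
```

Because each URI is stored once and referenced by ID, the statements rows stay small, and the uniqueness check is a comparison of numbers rather than of long strings.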

Getting set up

First, prepare your environment for using Sqlite. To do that, download SQLite and make it available to your program at run time. In Windows, put the DLL into your System32 directory. In Unix, install the RPM, or put the .so file in /usr/local/lib, or else make sure the LD_LIBRARY_PATH environment variable includes the directory containing the Sqlite library.

Second, make sure you have the Mono.Data.SqliteClient data adapter from Mono. If you're using Unix, be sure you have the mono-data-sqlite package installed. If you're using Windows and don't have the adapter already, you can get it out of the latest daily MonoCharge (either put it in the GAC, or put it alongside the other SemWeb DLLs).

Last, download an RDF file to your hard disk, such as http://www.mozilla.org/news.rdf.

Moving data into a database

We want to get going quickly, so we'll use SemWeb's rdfstorage.exe tool to move some data into a Sqlite database. Go to a command prompt in the bin directory of the SemWeb package. Then run:

$ mono rdfstorage.exe news.rdf --out "sqlite:rdf:Uri=file:news.sqlite;version=3"

(In Windows, you can leave off the 'mono' to use the Microsoft runtime.) You'll get the following output (or something like it):

news.rdf  0m0s, 81 statements, 597 st/sec
Total Time: 0m0s, 81 statements, 505 st/sec

This command read the RDF/XML file and created a Sqlite version 3 database named news.sqlite.

The "out" argument is what I call a "spec" string. It tells SemWeb how to open up a storage mechanism. This spec string has three parts. The first part is "sqlite" which tells SemWeb to use the Sqlite storage engine. The second part is "rdf" which is the prefix for the tables to use in the database. The third part, "Uri=file:news.sqlite;version=3", is a connection string used by Mono.Data.SqliteClient. (For more info on the connection string, see the Mono website.)
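To make the three parts concrete, here is a sketch of how such a string can be taken apart. It is not how SemWeb itself parses the spec (I'm not showing its internals here); the SpecString class is a hypothetical helper. The one thing to notice is that the split stops after the second colon, because the connection string may contain colons of its own.

```csharp
using System;

// Hypothetical helper for illustration; not part of SemWeb.
public static class SpecString {
    // Splits "type:prefix:connection-string" at the first two colons only,
    // so colons inside the connection string are left alone.
    public static string[] Parse(string spec) {
        string[] parts = spec.Split(new char[] { ':' }, 3);
        if (parts.Length != 3)
            throw new ArgumentException("Expected type:prefix:connection-string", "spec");
        return parts;
    }

    public static void Main() {
        string[] parts = Parse("sqlite:rdf:Uri=file:news.sqlite;version=3");
        Console.WriteLine("Store type:        " + parts[0]); // sqlite
        Console.WriteLine("Table prefix:      " + parts[1]); // rdf
        Console.WriteLine("Connection string: " + parts[2]); // Uri=file:news.sqlite;version=3
    }
}
```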

If you omit the "out" argument entirely, the program will dump the file to the console in Turtle format.

Getting data out of the database

To query the database, we'll write a program.

Create a store backed by the database using the following method:

Store store = Store.Create("sqlite:rdf:Uri=file:news.sqlite;version=3");

The argument to Create is the same "spec" string as before. Create is a factory method: depending on the spec string, it can create various types of data sources.

The actual implementation of the Sqlite store is in the SemWeb.SqliteStore.dll assembly, but by using Create you don't need to reference that assembly at compile time. Just make sure it's available at run time.

Now you can select statements from it just as before. It will be a little slower than the memory-backed store, but you haven't loaded the entire store into memory.

You can also add statements into the store, just as with the memory store.

Be sure to close the database at the end with Dispose.

Here's a complete program:

using System;
using SemWeb;

public class Sqlite {
    const string RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string FOAF = "http://xmlns.com/foaf/0.1/";
    
    static readonly Entity rdftype = RDF+"type";
    static readonly Entity foafPerson = FOAF+"Person";
    static readonly Entity foafname = FOAF+"name";

    public static void Main() {
        // Open the Sqlite-backed store described by the spec string.
        Store store = Store.Create("sqlite:rdf:Uri=file:foaf.sqlite;version=3");
        
        // Add two statements about a new person, just as with the memory store.
        Entity newPerson = new Entity("http://www.example.org/me");
        store.Add(new Statement(newPerson, rdftype, foafPerson));
        store.Add(new Statement(newPerson, foafname, (Literal)"New Guy"));
        
        // List every foaf:Person in the store along with its foaf:name values.
        Console.WriteLine("These are the people in the file:");
        foreach (Entity s in store.SelectSubjects(rdftype, foafPerson)) {
            foreach (Resource r in store.SelectObjects(s, foafname))
                Console.WriteLine(r);
        }
        Console.WriteLine();
        
        // Close the database.
        store.Dispose();
    }
}
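If something throws while the store is open, the Dispose call at the end of the program above is never reached. One way to guard against that is a try/finally block. The sketch below uses only the calls already shown (Store.Create, SelectSubjects, Dispose); the class name is arbitrary, and it assumes foaf.sqlite already holds some foaf:Person statements.

```csharp
using System;
using SemWeb;

public class SqliteQuery {
    const string RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string FOAF = "http://xmlns.com/foaf/0.1/";

    public static void Main() {
        Entity rdftype = RDF + "type";
        Entity foafPerson = FOAF + "Person";

        Store store = Store.Create("sqlite:rdf:Uri=file:foaf.sqlite;version=3");
        try {
            // Print each subject that is typed as foaf:Person.
            foreach (Entity person in store.SelectSubjects(rdftype, foafPerson))
                Console.WriteLine(person);
        } finally {
            // Runs whether or not the query above threw an exception.
            store.Dispose();
        }
    }
}
```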

Other Database Notes

Besides Sqlite, the MySQL, PostgreSQL, and SQL Server databases are also supported. To use one of them, just replace the "spec" string ("sqlite:rdf:...") with something like:

"mysql:rdf:Database=rdf; . . ."
"postgresql:rdf:Database=rdf; . . ."
"sqlserver:rdf:Database=rdf; . . ."

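Everything after the second colon is a connection string for the database in question, used to make the connection. For MySQL, you'll have to consult the MySQL Connector/NET documentation for more information about the connection string; for the other databases, consult the documentation of the corresponding data provider.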

For the SQL stores, the Import() method is optimized so that statements are added to the tables in batch, where possible. Further, the MySQL store can wrap the Import SQL statements with SQL commands that make the Import faster. By default, the Import is wrapped in a BEGIN/COMMIT transaction. Set the SEMWEB_MYSQL_IMPORT_MODE environment variable (or hack the source code) to change the behavior. Setting this environment variable to LOCK will use LOCK TABLES to create a write lock on the statements, entities, and literals tables for the store, which improves performance significantly. Setting the environment variable to DISABLEKEYS uses ALTER TABLE DISABLE KEYS on the statements table, which improves performance while it adds statements, but causes a lengthy rebuilding of the index at the end of the import.
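Here is a sketch of what an import with the LOCK mode might look like from code rather than from the shell. Several things are assumptions on my part: that the variable is read no earlier than when the store is created, so setting it inside the process first is enough; that RdfXmlReader can be constructed from a file name; and the class name, which is arbitrary. The spec string and file name come from the command line so that no connection details have to be invented here.

```csharp
using System;
using SemWeb;

public class MySqlImport {
    // Usage (hypothetical): MySqlImport.exe <spec-string> <rdf-xml-file>
    public static void Main(string[] args) {
        if (args.Length != 2) {
            Console.Error.WriteLine("Expected a spec string and an RDF/XML file name.");
            return;
        }

        // Assumption: setting the variable before the store is created is
        // early enough for the MySQL store to pick it up.
        Environment.SetEnvironmentVariable("SEMWEB_MYSQL_IMPORT_MODE", "LOCK");

        Store store = Store.Create(args[0]);
        try {
            // Import takes the batch-optimized path described above.
            store.Import(new RdfXmlReader(args[1]));
        } finally {
            store.Dispose();
        }
    }
}
```

Setting the variable in the shell before launching the program should have the same effect, and it avoids hard-coding the mode.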