Thursday, June 23, 2011

Bolsena #3: INSPIRE on the GeoCouch

In case anyone ever wondered what good it can do to put a couple of hackers in an Italian monastery for a week, here's an example. Talking to Volker Mische (author of GeoCouch) we've wondered whether the BLOB mode approach for traditional databases could not be applied to the couch as well. Rather than wondering we thought we might just as well try it out.

After an evening spent hacking we've finally proved that it can work, and created an initial version of a deegree-featurestore-geocouch. It creates spatial indexes automatically upon startup, and has the ability to insert features via WFS-T or the loader. Querying by ID or BBOX also works already.

We've tested it with INSPIRE Annex I data themes (Addresses and CadastralParcels). That's what the code sprint is all about, collaborating, coming up with new ideas and combining things in a new way.

Mind you, the store is not ready for production or anything, but it's a start, and only needs a little more care to be a viable alternative to traditional databases.

Bolsena #2: PostgreSQL news

Following up on the last post, I've implemented a small patch against PostgreSQL-JDBC, and sent it to on the PostgreSQL JDBC mailing list. Oliver Jowett then kindly wrote a small benchmark to test my changes, and it seems indeed to be quite a bit faster than the original version.

That said, I've also asked about using a binary protocol to connect to the database, and there has indeed been work on that. So we can hope to get a faster version sometime in the future, which does not require special handling of bytea fields any more.

Monday, June 20, 2011

Bolsena #1: Blogging deegree

There has been a considerable lack of blogging about deegree. Googling for it yields the results for 'degree blog' (note the missing e), and forcing the issue reveals only a blob that seems only suitable for adults. Being in Bolsena under the hot Italian sun participating at the Bolsena Code Sprint 2011 seems like a good spot to change that.

So what's happening in the deegree world? Just now we're profiling our INSPIRE services. Turns out that the PostgreSQL JDBC driver has some strange handling concerning bytea fields. We're using these in our so called BLOB storage (storing GML directly in the database with a couple of indexes). In theory fetching the BLOBs (usually only a couple of KB) should be fast enough, and the actual GML parsing/exporting/rendering etc. should slow things down eventually.

The nice thing about actually profiling things is that you know where you can replace the 'should' with 'does not'. For PostgreSQL 9 there seem to be two 'encodings' to fetch the actual bytes of the bytea field. Both include a string representation of the byte in question, one using an octal number, one using a hex number (this is a new 'feature' in PostgreSQL 9). Decoding the bytes involves a method call for each and every byte, where the string is decoded into the actual byte value.

Our test case was ~23000 features sized 1-2KB each. This results in something like 30 to 40 million calls to the method that decodes the bytes, and consumes a lot of time.

So what are the options? There are other options to store large objects in PostgreSQL, so maybe using one of these might be a better option. But since PostgreSQL is Open Source, one might also try to have a closer look at the driver.

Another option would be to try and enjoy the beautiful look at the Lago di Bolsena more often, and not dig into other peoples code...

For those people who want to know more about deegree, have a look at our wiki. Posts which are more concerned about deegree will follow.

Stay tuned for other Bolsena Code Sprint stories!