Tuesday, September 17, 2013

A personal matter

Many of you already know this, but I wanted to make an 'official' statement about my going into part-time retirement from deegree.

For personal reasons, I had to give up being self-employed. Fortunately, I quickly found a new home at the WhereGroup, where I'll be working (among other things) on Mapbender.

Besides switching sides from server to client, that also means I'll be switching from deegree to Mapbender. That doesn't necessarily mean I'll never use or develop something for deegree ever again. But it certainly means that I won't be opening new 100+ commit pull requests any time soon.

Since I'll only be making minor contributions from now on, it doesn't make sense for me to remain a member of the TMC, so I'll step down from that position.

I sincerely wish things had worked out differently. Having authored over 40 percent of the commits, deegree almost feels like my third child. I wish you all the best of luck!

See you around (perhaps at the FOSSGIS code sprint in Essen?)!

Friday, June 7, 2013

Bolsena #4 - More JAXB fun

Adding to the JAXB fun I had yesterday, I decided to fix related problems when binding our schemas at compile time, where the maven-jaxb2-plugin sometimes tried to load schemas from the net, although an XML catalog file was provided and configured.

The first thing I noticed was that only some of the modules involved had that problem. Very strange, so I ran Maven in debug mode (-X) for a module where it worked properly and for one where it failed.

The plugin was configured identically in both cases (we manage plugins in our parent pom.xml). The only difference I noticed was that in the working module all project dependencies were passed to the xjc call, whereas in the failing module only the plugin's own dependencies were added. So obviously the schemas could not be loaded from the classpath there.

I was not able to find the reason for this, but simply adding the project dependencies as plugin dependencies fortunately fixed the problem. This means that, apart from the unit tests, deegree can now also be compiled offline without problems.

If you want to know the details on xjc + catalogs, read this guide. It might be a little out of date, but most of it is still valid.
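To give an impression of what the catalog machinery does, here's a minimal sketch that drives the Apache xml-commons resolver (the library that xjc's catalog support is based on, as far as I know) by hand; the catalog path and the rewrite rule are made up for the example:

    import org.apache.xml.resolver.CatalogManager;
    import org.apache.xml.resolver.tools.CatalogResolver;
    import org.xml.sax.InputSource;

    public class CatalogDemo {

        public static void main(String[] args) throws Exception {
            // catalog.cat contains a rewrite rule, e.g. (TR9401 syntax):
            // REWRITE_SYSTEM "http://schemas.opengis.net/" "ogc-schemas/"
            CatalogManager manager = new CatalogManager();
            manager.setCatalogFiles("src/main/resources/catalog.cat");
            CatalogResolver resolver = new CatalogResolver(manager);

            // resolves to the local copy instead of hitting the network
            InputSource source = resolver.resolveEntity(null,
                    "http://schemas.opengis.net/wfs/1.1.0/wfs.xsd");
            System.out.println(source != null ? source.getSystemId() : "not resolved");
        }
    }

If the catalog contains a matching rewrite rule, the returned InputSource points at the local copy; otherwise you get null and resolution falls back to the usual (remote) lookup.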

Thursday, June 6, 2013

Bolsena #3 - JAXB fun

In deegree we make heavy use of JAXB for unmarshalling configuration files. That works quite well, but it had a drawback when schema inclusion was used: the included schemas were always loaded from the internet.

Using the SchemaFactory from JAXP (javax.xml.validation), I thought it would be pretty easy to work around loading schemas from the net and to use the ones included in our .jars instead. But somehow that didn't work out: the base schemas were still loaded from the internet.

Smart people found out that the order of the schema URLs passed to the SchemaFactory plays a role, and it turns out to be true! I've just opened a pull request that fixes the problem in deegree.

For it to work, the schemas need to be listed in reverse order of inclusion: put the schema without dependencies first, then the schemas that include it, and so on.
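A minimal sketch of what that looks like in code; the schema names and classpath locations are made up:

    import javax.xml.XMLConstants;
    import javax.xml.transform.Source;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;

    public class SchemaOrder {

        public static void main(String[] args) throws Exception {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Reverse order of inclusion: base.xsd (no dependencies) comes first,
            // derived.xsd (which includes base.xsd) comes last.
            Source[] schemas = {
                new StreamSource(SchemaOrder.class.getResourceAsStream("/schemas/base.xsd")),
                new StreamSource(SchemaOrder.class.getResourceAsStream("/schemas/derived.xsd"))
            };
            Schema schema = factory.newSchema(schemas);
            // the compiled Schema can then be set on a JAXB Unmarshaller:
            // unmarshaller.setSchema(schema);
        }
    }

Swap the two entries and the factory may try to resolve the included schema itself, which in our case meant going to the internet again.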

Bolsena #2 - coordinate system ramblings

Time for an update, even though there's not much actual progress to discuss.

First, some good news: the resource dependencies pull request is finally here! It would be great if many of you could test it out. With 189 commits it's the biggest change since our move to GitHub yet, and although the unit and integration tests pass, some details might still need tuning.

Since my last post, I've been thinking about and experimenting with the coordinate subsystem in deegree. The API is currently heavily based on the GML representation of coordinate systems, with references everywhere: for axes, datums, ellipsoids and so on.

While I'm a fan of models that are 'complete', I'm not sure that's the best approach for coordinate systems. There are two major use cases for the CRS package. One is to keep track of what system is being used under what identifier; the other is to transform coordinates from one system to another. Further use cases would be importing/exporting coordinate system definitions from/to GML, WKT, proj4 and so on.

A review of the code revealed that all identifiers are stored in lower case. So exporting to GML, or finding out exactly which identifiers exist for a given system, is impossible, because the proper identifiers are no longer available. The convenient use case of having, for example, your layer configuration pick up the correct CRS identifier from the data store also becomes impossible in some cases, namely when the underlying data store is not explicitly configured with a CRS identifier.

So to summarize, the current package has several shortcomings. First is the identifier mess, which is not easily fixed short of a complete reimport of the CRS definitions (which would override some manually modified definitions). Second is the model, which is just too complex and makes it hard to compare two definitions. Third, transformations are slow, sometimes not thread-safe, and sometimes synchronized and thus not scalable on multicore machines. Fourth, the whole system is statically initialized and makes heavy use of global static state.

I've tried, unsuccessfully, to fix some of these issues over the past days, but I fear a complete rewrite is the only thing that will do the trick.

Monday, June 3, 2013

Bolsena 2013 #1

It's that time of year again, when OSGeo hackers from around the globe meet in a former monastery to collaborate and code under the Italian sun.

Things are a little different this year. Compared with past years, we've got a record number of people attending this time! Also, sadly, the Italian sun is missing. I hope people are right when they tell me it's going to get better...

So, back to business. I'm in the process of completing the resource dependencies branch/pull request; I can probably create the pull request today or tomorrow. Things are looking good: the web console has already been adapted, no tests are failing, and it works.

The Mapbender people have installed deegree on their computers and are working on integrating a simple workflow that creates a new WMS/layer based on a shape file in a remote deegree instance using our REST API. That's already working, and we're currently trying to make it run in a more user-friendly way. Not a bad start!

On a related note, I've created a new pull request that adds some more features to the REST API, like querying all supported coordinate systems, checking whether a coordinate system is supported by deegree, and an experimental call to retrieve all known identifiers for a WKT-encoded coordinate system.

In theory, the equality relation is defined on coordinate systems (not taking identifiers into account, obviously), but in practice I was not able to compare the Utah UTM zone (EPSG:26912) to a WKT-encoded variant. I guess that's another reason why a rewrite/cleanup of the CRS package is needed.
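For what it's worth, here's a hypothetical sketch of what identifier-agnostic, structural equality should boil down to; these simplified types are made up for illustration and are not deegree's actual CRS API:

    import java.util.Set;

    // Hypothetical, deliberately simplified types -- not deegree's actual CRS API.
    class Ellipsoid {
        final double semiMajorAxis, inverseFlattening;

        Ellipsoid(double semiMajorAxis, double inverseFlattening) {
            this.semiMajorAxis = semiMajorAxis;
            this.inverseFlattening = inverseFlattening;
        }

        boolean sameShape(Ellipsoid other) {
            // compare with tolerances: EPSG and WKT definitions often differ
            // in the last decimal places
            return Math.abs(semiMajorAxis - other.semiMajorAxis) < 1e-3
                && Math.abs(inverseFlattening - other.inverseFlattening) < 1e-9;
        }
    }

    class ProjectedCrs {
        final Ellipsoid ellipsoid;
        final String projection; // e.g. "TransverseMercator"
        final double centralMeridian, scaleFactor, falseEasting, falseNorthing;
        final Set<String> identifiers; // e.g. "EPSG:26912", metadata only

        ProjectedCrs(Ellipsoid ellipsoid, String projection, double centralMeridian,
                     double scaleFactor, double falseEasting, double falseNorthing,
                     Set<String> identifiers) {
            this.ellipsoid = ellipsoid;
            this.projection = projection;
            this.centralMeridian = centralMeridian;
            this.scaleFactor = scaleFactor;
            this.falseEasting = falseEasting;
            this.falseNorthing = falseNorthing;
            this.identifiers = identifiers;
        }

        // Identifiers are deliberately ignored: a CRS parsed from WKT and the
        // same CRS looked up by its EPSG code should compare equal.
        boolean structurallyEquals(ProjectedCrs other) {
            return ellipsoid.sameShape(other.ellipsoid)
                && projection.equals(other.projection)
                && centralMeridian == other.centralMeridian
                && scaleFactor == other.scaleFactor
                && falseEasting == other.falseEasting
                && falseNorthing == other.falseNorthing;
        }
    }

The idea being that identifiers are mere metadata for lookup and export, while equality is decided by the defining parameters alone (with sensible tolerances).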

Stay tuned for more!

Wednesday, May 22, 2013

Hans Moleman issues

XML schema validation, especially GML schema validation, can be hard. The mysterious Xerces 'honor all schema locations' flag springs to mind (a mystery yet to be fully understood). Often, slow schema validation processes (which seem to fetch schemas from the web) can be traced to Hans Moleman. No, sorry, wrong link: to Hans Moleman.

So what's happening? And what does Hans Moleman have to do with it?

As the GML experts among you may know, GML application schemas depend on the GML schema, which in turn consists of many schemas (the exact number varies between versions), which depend on other schemas such as the W3C XLink schema, which in turn imports the W3C XML schema (the schema for the xml namespace itself: http://www.w3.org/XML/1998/namespace).

So even when validating a feature collection against a local version of a GML application schema, the schema parser might still get to a point where it needs to fetch dependent schemas from the internet. And since the xml.xsd is the last one in the chain, it's also the one that gets requested the most.

According to the W3C, they were seeing ~130 million requests for this file per day, and so at some point decided to completely block, for example, the default Java HTTP user agent, among others. Apparently they later had a change of heart and don't block it any more, but the xml.xsd URL now responds with a delay of several seconds (see http://www.w3.org/2001/xml.xsd).

So when validating multiple documents that all need xml.xsd, with all schemas freshly loaded every time, you'll get a delay of several seconds each time, during which your computer seems to do nothing at all.

We thought about the problem of remote schemas quite a while ago and made use of a custom Xerces entity resolver to load OGC and W3C schemas from a local artifact that we ship with deegree. There are other solutions as well; our JAXB schema generation, for example, uses standard XML catalog files to avoid fetching schemas from the web.

But unfortunately the CITE WFS 1.0.0 tests (and others) do not use local copies (although newer versions of the tests tend to load required schemas from the classpath as well).
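The core of such an entity resolver looks roughly like this; a minimal sketch with a made-up classpath layout, not deegree's (or the CITE tests') actual implementation:

    import java.io.InputStream;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.InputSource;

    public class LocalSchemaResolver implements EntityResolver {

        public InputSource resolveEntity(String publicId, String systemId) {
            if (systemId != null && systemId.startsWith("http://www.w3.org/")) {
                // e.g. http://www.w3.org/2001/xml.xsd -> /w3c/2001/xml.xsd
                String local = "/w3c" + systemId.substring("http://www.w3.org".length());
                InputStream in = LocalSchemaResolver.class.getResourceAsStream(local);
                if (in != null) {
                    InputSource source = new InputSource(in);
                    source.setSystemId(systemId);
                    return source;
                }
            }
            return null; // unknown URL: fall back to default (network) resolution
        }
    }

The resolver is then registered with the parser, e.g. via XMLReader.setEntityResolver(...).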

Using an Eclipse decompiler plugin for a bit of reverse engineering (see the other post from today), I was able to fix this (the tests were already using a custom entity resolver, but it loaded everything from the web every time). Now a complete deegree build including integration tests needs only 13 minutes on a fast machine!

For those interested, have a look at our deegree-compliance-tests module.

All the library sources

One of the nicer features of Eclipse is that it lets you browse not only your own sources, but library sources as well, if they're available. But what if they're not?

In that case, Eclipse shows the class in a bytecode view, with a button to attach a source .jar. Unfortunately, it is often the case that you don't have the sources, either because they were not uploaded to Maven Central, or because they're closed altogether.

In any case, it is obviously often desirable while debugging to see what a library function actually expects, or why it fails. Contrary to popular belief, the actual code is always the ultimate documentation, because it's up to date even when the human-language docs are not.

So recently, while chasing after a Hans Moleman issue (more on that later), I needed sources for binary classes. Of course my first thought was 'decompiler', so I searched the Eclipse Marketplace for 'decompiler'.

I found JadClipse for Eclipse 4.x, installed it, and voilà: when I double-click a class file with no sources attached, the decompiler automatically decompiles it and shows me the source. Now that's what I call a hassle-free plugin! It's the perfect counterpart to the -DdownloadSources flag of the Maven Eclipse mojo; now I never need to go without library sources again.