Hibernate Shards for data partitioning across databases

Google has donated Hibernate Shards, a Java 5 add-on to Hibernate that allows application-driven data partitioning with your own custom configuration or one of their pre-built ones. One thing that caught my eye was that when they covered partitioned native ID generation (aka Identity), they mentioned having Database A use the range 0-200000, Database B use the range 200001-400000, etc. I know that MySQL cluster suggests using something like this:

Database A: Starting ID 1, Increment ID by 3 (number of databases)
Database B: Starting ID 2, Increment ID by 3
Database C: Starting ID 3, Increment ID by 3

This allows a hands-off approach, and lets you easily divide the ID by 3 and use the remainder to map to a "Shard". Of course, if your data changes significantly, you may have to dump and reload to add another database. One way to avoid a dump and reload is to pick an increment that is larger than your actual database count, and just drop an extra database in at an available Starting ID. This can eat up your "keyspace" faster, but if you don't have a huge amount of data, you don't need partitioning that badly anyway.
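To make the arithmetic concrete, here is a rough sketch in Java of the Starting ID / increment idea. The class and method names (`ShardIdSketch`, `slotFor`, `firstIds`) are made up for illustration and are not part of Hibernate Shards or MySQL. It assumes positive IDs and that each database only ever hands out its Starting ID plus multiples of the increment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ShardIdSketch {

    // Remainder of id / increment. A database whose Starting ID is s owns
    // the slot s % increment (so Starting ID 3 with increment 3 owns slot 0).
    static int slotFor(long id, int increment) {
        return (int) Math.floorMod(id, (long) increment);
    }

    // The first `count` IDs a database would hand out for a given
    // Starting ID and increment.
    static List<Long> firstIds(long startingId, int increment, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(startingId + (long) i * increment);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Increment chosen larger than the 3 databases in use,
        // which leaves Starting IDs 4 and 5 free for later.
        int increment = 5;
        for (long startingId = 1; startingId <= 3; startingId++) {
            List<Long> ids = firstIds(startingId, increment, 4);
            System.out.println("Starting ID " + startingId + " -> IDs " + ids
                    + ", slot " + slotFor(ids.get(0), increment));
        }
    }
}
```

With an increment of 5 and only three databases, every ID still maps back to exactly one slot, and a fourth database can later be dropped in at Starting ID 4 without touching the existing rows. The cost is that each database skips two extra IDs per insert.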

If you have been reading my blog, you know that I use NHibernate, but I am confident that the techniques they are using are portable to .NET. I can see some value in using this for year-based partitioning, where archives are made available as read-only data, and the new database is created with the next available ID.

Comments

Max said…
You make an excellent point, Roy, and if you've read the section on virtual shards you saw that in order to use this feature we require you to determine the maximum number of physical shards your app will ever require up front. We really do think this is doable for applications that require sharding, since there's no penalty for overshooting. In the technique you describe there is, as you said, a penalty for overshooting (your keyspace gets eaten into), but if this is acceptable your technique sounds great.

Max (Hibernate Shards developer)
