
Hibernate Shards for data partitioning across databases

Google has donated Hibernate Shards - a Java 5 add-on to Hibernate that allows application-driven data partitioning, using either your own custom configuration or one of their pre-built ones. One thing that caught my eye was that when they covered partitioned Native ID generation (aka Identity), they mentioned having Database A use the range 0-200000, Database B use the range 200001-400000, etc. I know that MySQL Cluster suggests using something like this:

Database A: Starting ID 1, Increment ID by 3 (number of databases)
Database B: Starting ID 2, Increment ID by 3
Database C: Starting ID 3, Increment ID by 3

This allows a hands-off approach, and lets you easily divide the ID by 3 and use the remainder to map to a "Shard". Of course, if your data changes significantly, you may have to dump and reload to add another database. One way to avoid a dump and reload is to pick an increment that is larger than your actual database count, and just drop an extra database in at an unused Starting ID. This eats up your "keyspace" faster, but if you don't have a huge amount of data, you don't need partitioning that badly anyway.
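To make the remainder trick concrete, here is a minimal sketch in Java. The class and method names (`InterleavedIds`, `firstIds`, `startingIdFor`) are made up for illustration and are not part of Hibernate Shards or MySQL. It assumes each database is given a Starting ID between 1 and the increment, as in the list above, so a remainder of 0 maps back to the database whose Starting ID equals the increment (Database C here).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not a Hibernate Shards API. */
public final class InterleavedIds {
    private InterleavedIds() {}

    /** The first n IDs a database hands out, given its Starting ID and the shared increment. */
    public static List<Long> firstIds(long start, long increment, int n) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(start + i * increment);
        }
        return ids;
    }

    /** Maps an ID back to the Starting ID (1..increment) of the database that generated it. */
    public static long startingIdFor(long id, long increment) {
        long remainder = id % increment;
        // A remainder of 0 belongs to the database whose Starting ID equals the increment.
        return remainder == 0 ? increment : remainder;
    }

    public static void main(String[] args) {
        // Three databases, increment 3, as in the list above.
        System.out.println(firstIds(1, 3, 4)); // [1, 4, 7, 10]
        System.out.println(firstIds(3, 3, 4)); // [3, 6, 9, 12]
        System.out.println(startingIdFor(7, 3)); // 1, i.e. Database A
        System.out.println(startingIdFor(9, 3)); // 3, i.e. Database C

        // Headroom: three databases today, but an increment of 5.
        // A fourth database can later start at 4 without renumbering anything.
        System.out.println(firstIds(4, 5, 3)); // [4, 9, 14]
        System.out.println(startingIdFor(14, 5)); // 4
    }
}
```

With an increment of 5 and only three databases, two out of every five IDs go unused until more databases are added, which is the "keyspace" cost mentioned above.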

If you have been reading my blog, you know that I use NHibernate, but I am confident that the techniques they are using are portable to .NET. I can see some value in using this for year-based partitioning, where archives are made available as read-only data, and the new database is created with the next available ID.

Comments

Max said…
You make an excellent point Roy, and if you've read the section on virtual shards you saw that in order to use this feature we require you to determine the maximum number of physical shards your app will ever require up front. We really do think this is doable for applications that require sharding, since there's no penalty for overshooting. In the technique you describe there is, as you said, a penalty for overshooting (your keyspace gets eaten into), but if this is acceptable your technique sounds great.

Max (Hibernate Shards developer)
