30 Oct 2011
Nowadays it's easy to reach millions users for a startup website due to sns success, that means the tables on production server are huge, if you want to alter table structure, like add a new column, it may takes tens of minutes or several hours to execute.
If you want to add a new column and drop an existing column to one existing table, by default rails migration will run two alter sql statements, one for add and the other for drop, so it takes double time, but if we merge/optimize the two alter sql statements into one, it saves the migration time dramatically.
The followings two are the typical cases to optimize the db migrations.
Case 1: add a new column but drop it later
class AddStatusToUsers < ActiveRecord::Migration def self.up add_column :users, :status, :string end end
class RemoveStatusToUsers < ActiveRecord::Migration def self.up remove_column :users, :status end end
In development we add the column status to table users and remove it later because of requirement changed, if we deploy by capistrano, these two migrations will take tens of minutes (if you have million users), at the mean time, the users table will be locked. But actually there is no necessary to execute these two migrations when deploying to production server.
You may think to simply delete these two migrations, please don't do that, it may break other developers' local database structure. What we should do is to tell rails migration to skip these two migrations when running "rake db:migration" on production server. Rails uses schema_migrations to indicate which migrations are executed or not, so we can manually update the schema_migrations indexes, in this case, we should run
INSERT INTO `schema_migrations` VALUES ('20110930100808') INSERT INTO `schema_migrations` VALUES ('20111010100909')
on production database. After that, when we deploy the codes to production server, these two migrations will be skipped.
Case 2: multiple alter table sql statements
class RemoveLocationFromUsers < ActiveRecord::Migration def self.up remove_column :users, :location end end
class AddGenderToUsers < ActiveRecord::Migration def self.up add_column :users, :gender, :string end end
ok, before next deployment, we remove a column location and add a column gender to users table, they will execute two table alter sql statements
ALTER TABLE `users` DROP COLUMN location ALTER TABLE `users` ADD COLUMN gender varchar(255) DEFAULT NULL
Let's assume add a column and remove a column to users table take 1 hour for each, so it will take totally 2 * 1 = 2 hours to execute these two migrations, but if we can merge these two table alter sql statements into one, it only takes about 1 hour to execute. So we should create two additional migrations to save production migration time.
class AddLocationAndRemoveGenderToUsers < ActiveRecord::Migration def self.up add_column :users, :location, :string remove_column :users, :gender end end
class RemoveLocationAndAddGenderToUsers < ActiveRecord::Migration def self.up connection.execute("ALTER TABLE `users` DROP COLUMN location, ADD COLUMN gender varchar(255) DEFAULT NULL") end end
When running "rake db:migrate" during production deployment, we should skip first 3 migrations and execute only the last migration, so manually update the schema_migrations index as follows
insert into `schema_migrations` VALUES ('20111002101010') insert into `schema_migrations` VALUES ('20111003101010') insert into `schema_migrations` VALUES ('20111011040404')
if you are using rails 3.1, you can use change_table with :bulk => true option to execute bulk alter
class RemoveLocationAndAddGenderToUsers < ActiveRecord::Migration def self.up change_table :users, :bulk => true do |t| t.remove :location t.string :gender end end end
And finally, don't forget to add comments to these optimized migrations so that other developers will not be confused with them.