Synching Your Amazon S3 Asset Host using Capistrano

written by Scott on November 6th, 2007 @ 03:16 PM

Note: This article is out of date. The latest version of this article is on the new permanent page in the projects section.

So you’ve got multiple asset hosts running in your Rails application, and you’re using Amazon’s S3 to host your assets. Now you want to make sure that your assets are kept up to date. This plugin is a Capistrano recipe that keeps the asset hosts synchronized with the public directory in your Subversion repository.

Usage

After you get everything set up and do your first deploy, just run cap deploy as normal, and all changed files in RAILS_ROOT/public will be uploaded to all of your asset host buckets before the final deploy:symlink task.

The following tasks are also available:

  • cap s3_asset_host:get_s3_revision
  • cap s3_asset_host:find_changed
  • cap s3_asset_host:list_changed
  • cap s3_asset_host:find_all
  • cap s3_asset_host:upload_changed
  • cap s3_asset_host:upload_all
  • cap s3_asset_host:upload
  • cap s3_asset_host:reset_and_upload
  • cap s3_asset_host:setup
  • cap s3_asset_host:create_buckets
  • cap s3_asset_host:delete_all
  • cap s3_asset_host:connect

You can get documentation on these tasks by running cap -T.
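
To make "changed files in RAILS_ROOT/public" concrete, here is a minimal Ruby sketch of the filtering step. It is not the plugin's code: changed_public_files is a hypothetical helper, and it assumes status lines in the style of svn diff --summarize (a status letter, whitespace, then a path relative to the application root). It skips deleted files and strips stray whitespace from each path, two details that are easy to miss.

```ruby
# Hypothetical helper for illustration; not taken from the plugin.
# Input: text in the style of `svn diff --summarize`, one "STATUS  path" per line.
# Output: paths under public/ that were added or modified.
def changed_public_files(status_output, public_prefix = "public/")
  status_output.each_line.each_with_object([]) do |line, paths|
    # strip removes the newline and any spaces around the line
    status, path = line.strip.split(/\s+/, 2)
    next if path.nil?                 # blank or malformed line
    next if status.start_with?("D")   # deleted in the repository, nothing to upload
    paths << path if path.start_with?(public_prefix)
  end
end

sample = [
  "M       public/stylesheets/app.css",
  "A       public/images/logo.png ",   # note the trailing space
  "D       public/images/old.gif",
  "M       app/models/user.rb"
].join("\n")

puts changed_public_files(sample)
```

Only the two surviving paths under public/ are printed. A real recipe would still want to confirm that each file exists on disk before uploading it.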

Requirements

This plug-in is a Capistrano extension. It requires Capistrano 2.0.0 or greater.

You will also need the aws-s3 gem.

So far, this plug-in:
  • assumes that you are using the ‘checkout’ method of deployment.
  • only works with svn.

If you are using another version control system, I think all you’ll have to change are the two methods in lib/scm.rb. If you do get something other than svn working, please let me know.

If you want to use more than one asset host, you have to either install the multiple asset hosts plugin or upgrade to Rails 2.0 (see setting up multiple asset hosts in Rails).

Setup

To set up, you need to do the following:

  • Install the plug-in.
  • Install the AWS-S3 gem.
  • Set up your Rails application to use asset hosts.
  • Set up your asset hosts.
  • Configure Capistrano.

Installing the plug-in

From RAILS_ROOT, run:

script/plugin install svn://svn.spattendesign.com/svn/plugins/synch_s3_asset_host

Installing the AWS-S3 gem

You need to do this on both your local computer and the computer that is defined as the asset_host_syncher (see Capistrano Configuration, below).

$> sudo gem install aws-s3

Setting up your Rails app to use asset hosts

Single asset host

For a single asset host, simply add the following line to RAILS_ROOT/config/environments/production.rb:

config.action_controller.asset_host = "http://assets.example.com"

Multiple asset hosts

Follow the instructions in setting up multiple asset hosts in Rails.

Setting up your asset hosts

Set up a CNAME entry for each asset host pointing to s3.amazonaws.com. How you do this depends on your domain host. Here’s what it looks like on easydns

You may need to wait up to 24 hours for the DNS entries for these new hosts to propagate.
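
While you wait, you can check whether a CNAME has become visible from your machine. The following is a minimal sketch using Ruby's standard resolv library. cname_points_to_s3? is a hypothetical helper, and the host names in the usage note are placeholders.

```ruby
require "resolv"

S3_ENDPOINT = "s3.amazonaws.com"

# Hypothetical helper: true if the host has a CNAME record pointing at S3.
# dns can be any object that responds to getresources the way Resolv::DNS does,
# which makes the method easy to exercise without a network connection.
def cname_points_to_s3?(host, dns = Resolv::DNS.new)
  records = dns.getresources(host, Resolv::DNS::Resource::IN::CNAME)
  records.any? { |record| record.name.to_s.downcase.chomp(".") == S3_ENDPOINT }
end
```

For example, cname_points_to_s3?("assets0.example.com") should return true once the record for that host has propagated to the resolver your machine uses.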

Configuring Capistrano

Capistrano installation

This plugin requires Capistrano 2.0.0 or greater.

To upgrade to the latest version (currently 2.1.0):

$> gem install capistrano

Once the plug-in is installed, make sure that the recipes are seen by Capistrano:

$> cap -T | grep s3_asset_host

This should list the s3_asset_host tasks. If you don’t see anything listed, then you need to update your Capfile by doing the following (this is from Jamis Buck):

In Capistrano 2.1.0 or above:
$> cd RAILS_ROOT
$> rm Capfile
$> capify .

If you do not want to delete your Capfile, or if you are using Capistrano 2.0.0, add the following line to your Capfile:
Dir['vendor/plugins/*/recipes/*.rb'].each { |plugin| load(plugin) }

Capistrano configuration

Create a new file in RAILS_ROOT/config called synch_s3_asset_host.rb. Add the following lines to it, and edit to suit:

# =============================================================================
# S3 ASSET HOST OPTIONS
# =============================================================================
set :asset_host_name, "assets%d.example.com" 
set :aws_access_key, "your Amazon AWS access key" # You can also set this in your environment as AMAZON_ACCESS_KEY_ID
set :amazon_secret_access_key, "your Amazon AWS secret" # You can also set this in your environment as AMAZON_SECRET_ACCESS_KEY
# set :dry_run, false # Set to true if you want to test the asset_host uploading without doing anything on Amazon S3
before "deploy:symlink", "s3_asset_host:upload_changed" 
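
The %d in :asset_host_name is a placeholder for the host number. As a rough illustration (not the plugin's code), the sketch below expands the template into the individual host names. With a CNAME to s3.amazonaws.com, each host name is also the name of the bucket that serves it. The count of four and the 0-to-3 numbering are assumptions based on the "%d" asset host convention in Rails 2.0, and asset_host_names is a hypothetical helper.

```ruby
# Assumption: four hosts numbered 0 to 3, matching Rails 2.0's handling of "%d".
NUM_ASSET_HOSTS = 4

# Hypothetical helper: expand an asset host template into concrete host names.
# A template without "%d" is treated as a single asset host.
def asset_host_names(template, count = NUM_ASSET_HOSTS)
  return [template] unless template.include?("%d")
  (0...count).map { |n| template % n }
end

puts asset_host_names("assets%d.example.com")
```

This prints assets0.example.com through assets3.example.com, one per line.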

You have to do one more thing: in RAILS_ROOT/config/deploy.rb, specify one of your web hosts as an “asset_host_syncher”, like this:

role :web, webserver1, :asset_host_syncher => true

The first deploy

Commit all changes to your Rails application and do the initial bucket setup:

$> cap s3_asset_host:setup
$> svn commit -m "Adding synch_s3_asset_host plugin" 
$> cap deploy

This will do the following:
  • Create your Amazon S3 buckets.
  • Upload everything in RAILS_ROOT/public (in your svn repository) to each bucket.
  • Set the revision in each bucket to the latest revision in your repository.

This could take a while if you have lots of images or other big files.
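
To get a feel for how much will be uploaded, you can list the files first. This is a minimal sketch, not the plugin's implementation: asset_keys is a hypothetical helper that walks a directory and returns each file's path relative to it, which is a natural choice for the S3 object key. Skipping hidden entries such as .svn directories and .DS_Store files is an assumption.

```ruby
require "find"
require "pathname"

# Hypothetical helper: list every file under public_dir as a path relative to
# public_dir. Hidden files and directories (names starting with ".") are skipped.
def asset_keys(public_dir)
  root = Pathname.new(public_dir)
  keys = []
  Find.find(public_dir) do |path|
    if path != public_dir && File.basename(path).start_with?(".")
      Find.prune if File.directory?(path)  # do not descend into .svn and friends
      next
    end
    keys << Pathname.new(path).relative_path_from(root).to_s if File.file?(path)
  end
  keys.sort
end
```

Calling asset_keys("public") from RAILS_ROOT returns entries such as images/logo.png, and the length of the array gives the number of files a full upload would send to each bucket.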

You’re done!

That should do it. Now, every time you run cap deploy, your asset hosts should be updated with any changes to files in RAILS_ROOT/public.

Let me know if you have any problems, suggestions or comments.

Comments

  • Manik on 07 Nov 06:15

    Scott, The plugin looks awesome. I plan to try it out on my project. But before I use this, I need to upgrade the project to Capistrano 2. The project is still using capistrano 1.4.1 Thanks for the plugin. Will post my experiences here.
  • Scott on 08 Nov 08:53

    Manik, That's great! Please do let me know how it goes. Hopefully the Capistrano upgrade goes smoothly. If you run into snags, feel free to send me an e-mail; I just did it a month ago, so it's still pretty fresh. Scott
  • Chris on 21 Nov 23:22

    Hi, I'm having some problems with the plugin. When it generates the FILES_TO_UPLOAD file, it lists some directories that don't exist in my public directory, such as: /var/www/myapp/releases/20071121005146/public/trunk/public /var/www/myapp/releases/20071121005146/public/trunk/vendor/rails/actionpack/test/fixtures/public /var/www/myapp/releases/20071121005146/public/trunk/myapp/trunk/public and so on.... So I get errors from the upload_to_s3 script that these paths don't exist. Any idea why I'm getting this or how to fix it? I'm using svn 1.4.4 and cap 2.1.0. Thanks, Chris
  • Chris on 22 Nov 00:48

    Well, I edited the code that generates the FILES_TO_UPLOAD file and had it ignore any file with the word trunk in it. That fixed some of the errors. However, now I'm still getting file not found errors for other files in the list. I notice that it's listing files that were in my repository years ago but were deleted a long time ago. This also causes file not found errors when it tries to upload to s3. My guess is that it has something to do with lib/scm.rb file in the plugin? -Chris
  • Scott on 22 Nov 15:57

    Hi Chris, Thanks for the bug reports. I made a change to the upload_to_multiple_buckets script that should fix both problems. Can you try it out and let me know how it goes? You can update the plugin by running script/plugin install --force svn://svn.spattendesign.com/svn/plugins/synch_s3_asset_host
  • Chris on 24 Nov 02:19

    Hi Scott, We're getting there ;-). All the file not found errors are gone now, except now I've noticed a few of my files aren't being uploaded to s3. Upon more investigation, I noticed that some of the files in the FILES_TO_UPLOAD file have a trailing space at the end of the name. With your new fix that checks for file existence, the trailing space in the name causes it to return false, i.e.

    >> File.exist?('/Users/chris/test.log')
    => true
    >> File.exist?('/Users/chris/test.log ')
    => false

    Anyways, everything should be good if you trim the trailing spaces in the file name before checking if it exists. Best, Chris
  • Scott on 24 Nov 20:45

    Chris, thanks again for your help here. I've made a change that I think should squash that bug, but can't really test it. Can you try it again and let me know? Scott
  • Chris on 25 Nov 16:57

    Hi Scott, Perfect, works like a charm now!! One other suggestion I had: have you considered gziping text files like the css, js, etc. before uploading it to s3? You can serve them from S3 compressed (cheaper and faster), and then the client's browser will decompress it. Here's an example of someone doing it on s3: http://devblog.famundo.com/articles/2007/03/02/serving-compressed-content-from-amazons-s3 I tried it out and it seems to work well. Only downside I guess is that it doesn't work in really old browsers. -Chris
  • Scott on 26 Nov 13:14

    Chris, Great! Glad to hear it works for you. I took a look at the compression idea. It looks pretty simple to do, but I'm going to have to think about it a bit. I was thinking of migrating to using s3sync.rb to do the uploading, and I'd need to figure out an elegant way to do the compression with s3sync.rb. Thanks, Scott
  • Chris on 26 Nov 15:14

    Hi Scott, Actually I spoke too soon. I was going through my site with s3 and noticed that there were some images that it didn't upload. I'm not sure why unfortunately. The files don't appear in the FILES_TO_UPLOAD file, so it must have something to do with the information its getting from svn. What I can tell you is that these images once resided in 'public/images/folder'. Over the course of the project, I issued an svn move command so they now reside in public/images/folder/another_folder. The directory public/images/folder/another_folder/ does appear in the FILES_TO_UPLOAD but none of the files inside the folder appear in the list...
  • Chris on 26 Nov 16:55

    Hi Scott, Still trying to figure out why those files aren't being included. My best guess is that it has something to do with issuing svn move commands on directories. It looks like you have code to support moves of files but not directories... I had it print out the name of the files on line 34 on scm.rb and it never printed out those missing files. What does appear though is this: A /trunk/myapp/trunk/public/images/folder/another_folder (from /trunk/myapp/trunk/public/images/another_folder:591) Hopefully this helps a bit. -Chris
  • Scott on 26 Nov 22:07

    Hi Chris, Hmm. I was never happy with the way I was finding the files to upload, and it looks like you're paying the price. Sorry about that. I'm going to spend some time over the next couple of days updating the code to use s3sync.rb. This should take care of your problems, and make it work with version control systems other than svn as well. I'll post something when I get it worked out. Thanks for all of your help. Scott
  • Chris on 26 Nov 22:35

    Hi Scott, Ya, you're right, it seems like svn isn't the best route for finding files to upload afterall. I'm not sure there's any clean solution to the svn move directory problem. There were some other problems I ran into as well if you're curious. Since I use the asset packager plugin for rails, the plugin generates all the compressed css/js in another capistrano recipe before deploying, so the assets it creates are never committed to svn (thus never caught by your plugin). I ended up modifying the packager plugin to upload its assets to S3 as well. If you haven't checked it out: http://synthesis.sbecker.net/pages/asset_packager the packager plugin is pretty handy. Anyways, the s3sync sounds like a better route. As I understand it, it's like rsync for s3, right? Look forward to the update! Best, -Chris
  • Chris on 27 Nov 02:57

    Hey Scott, I was playing around with s3sync. Pretty nifty and easy. I whipped up a quick capistrano task using it. Thought I'd paste it below, might save you some time if you haven't delved into it yet. I'm having it ignore files/directories that start with ., or contain .svn, and .DS_Store (ya i'm a mac guy ;-). Anyways hope it helps. Seems like line breaks don't work in your blog, so hopefully the code below is readable after copy&pasting. Unfortunately s3sync doesn't do gziping of assets, so I'm gonna look into how to add it to s3sync.

    NUM_ASSET_HOSTS = 4
    ASSET_HOSTS = "assets%d.mysite.com"

    namespace :s3 do
      desc "Sync S3"
      task :sync, :roles => :web, :only => {:asset_host_syncher => true} do
        (0...NUM_ASSET_HOSTS).each do |n|
          run "cd #{release_path}/vendor/gems/s3sync/ && ./s3sync.rb -sprv --exclude='(\\.svn)|(\\.DS_Store)|(^\\.)' --cache-control='max-age=604800' #{release_path}/public/ #{ASSET_HOSTS % n}:"
        end
      end
    end
  • Scott on 28 Nov 13:40

    Chris, thanks for the code. That looks like a nice way to do it. I may just steal it verbatim if that's okay with you. Let me know if you get it zipping assets, too. -- Scott
  • Chris on 28 Nov 20:13

    Hey Scott, Sure, feel free to use it! I'll let you know once I look into the s3sync.rb code some more about the gziping. I don't see anyway to really do it without hacking s3sync.rb up unfortunately. But I will post my findings once I look into it more. Best, Chris
  • Scott on 03 Dec 16:46

    Just in case you didn't notice at the top, I've updated this article and put it in a new, permanent location at http://spattendesign.com/projects/synching-your-amazon-s3-asset-host-using-capistrano. The latest version uses s3sync.rb and should work much better. -- Scott
  • Matt Harvey on 23 Dec 18:50

    This did everything perfect until I tried to load my site. The ACLs on all the files did not allow them to be world-readable. I don't know why. I just went in with irb and AWS::S3 to fix the ACLs after the fact, but I get the feeling that I must have done something wrong.
  • Matt Harvey on 23 Dec 20:37

    Fixed my problem about the ACL not set to public access by adding the --public-read flag to the options to s3sync.rb on line 183 of recipes/synch_s3_asset_host.rb. The plugin still saved me lots of time. Thanks.
  • Scott on 24 Dec 09:54

    Oops. Matt, thanks for pointing that out. I made the same change to the plugin in the svn repository. Stuff like this makes me wish I had found a better way to unit test this plugin. -- Scott
