Download a large amount of data in CSV from Rails

I’ve just had to re-write an export script because the application now needed to be able to download 14000 records in CSV format.
Originally I was reading all the records (with multiple joins) and then creating a csv string and then writing it to the response using send_data.

The original code

This is a simplified version of my code

accounts = Account.find(:all)
csv_string = FasterCSV.generate do |csv|
  accounts.each do |account|
    csv << [account.id, account.name]
  end
end
send_data csv_string, :filename => 'accounts.csv',
                      :type => 'text/csv',
                      :disposition => 'attachment'

My first thought was to stream the data using

send_file_headers!  :filename => 'accounts.csv',
                    :type => 'text/csv',
                    :disposition => 'attachment'
render :text => Proc.new {|response, output|
  ...
}

The problem with this is that send_file_headers! requires that the :length option be set. I also was running into the problem of loading 14000 records into memory before iterating over them.

For the memory problem, I considered using the new Model.find_in_batches but I needed to join tables and use some complicated conditions.

The solution

I finally settled on chunked reading of the data from the database and a more explicit use of render :text => Proc…

require 'fastercsv'
def download
  filename = 'accounts.csv'
  headers.merge!(
    'Content-Type' => 'text/csv',
    'Content-Disposition' => "attachment; filename=\"#{filename}\"",
    'Content-Transfer-Encoding' => 'binary'
  )
  @performed_render = false

  render :status => 200, :text => Proc.new { |response, output|
    headings = ["ID", "Name"]
    output.write FasterCSV.generate_line(headings)

    last_account_id = 0
    while last_account_id do
      accounts = Account.find(:all,
                              :conditions => ["accounts.id > ?", last_account_id],
                              :order => 'accounts.id',
                              :limit => 1000)

      last_account_id = accounts.size > 0 ? accounts[-1].id : nil

      accounts.each { |account|
        data = [account.id, account.name]
        output.write FasterCSV.generate_line(data)
      }
    end
  }
end

I hope this useful for you, if you have any questions, please ask in the comments.

Comments (9)

PeterJune 3rd, 2009 at 19:14

Interesting; what does the @performed_render do? I think I get the idea, but can you explain a little more explicitly (or link to) how the Proc approach allows you to stream the results?

You could probably use some of the pagination plugins to handle the batch loading nicely. E.g., with will_paginate:

accounts = Account.paginated_each(:conditions => {…}, :per_page => 1000) do |account|

Scott TadmanJune 3rd, 2009 at 19:54

Loading all the models and then extracting two fields seems like an enormous waste of resources. It may be more efficient to simply load the columns using raw SQL and the select_rowss method, which can be significantly faster on any table that has BLOB columns.

Account.connection.select_rows(”SELECT id, name FROM #{Account.table_name} ORDER id”).each do |row|
output.write(FasterCSV.generate_line(row))
end

Eric LitwinJune 3rd, 2009 at 23:57

I have been having this same problem with CSV exports - I have a case where there are 30000+ records of 60+ fields per record. Needless to say, it takes a long time and usually ends up timing out in production (usually after Nginx has spawned it multiple times eating up all my Mongrel instances).

I’m going to try this out - it seems much simpler than using the queing/scheduling solution I was investigating.

Mason JonesJune 4th, 2009 at 01:15

I had to do something similar to generate an Excel file using a CSV input, and what I ended up doing was to generate the file to disk under an Apache doc root, and then do a redirect to the URL for the file so I could let Apache handle getting the actual file to the browser instead of having Rails wait (some of the users had very slow connections).

Eric LitwinJune 4th, 2009 at 02:23

This improves the memory usage quite a bit, but I should have realized from reading the code that it wouldn’t really do anything to improve the overall processing time.

When I combined this with the connection.select_rows suggestion (thanks Scott), I saw both a big memory and processing time reduction.

BlytheJune 4th, 2009 at 19:41

Yah I ran into some trouble with sending and building large files (around 250k records) and ended up using nginx version of X-Send-File. You basically create a file and point (nginx or apache) to it. http://spongetech.wordpress.com/2007/11/13/the-complete-nginx-solution-to-sending-flowers-and-files-with-rails/
Also for exporting files, try ArDumper. http://github.com/blythedunham/ar_dumper
There is a paginated version that writes to a file to avoid running out of memory which I then sent off to nginx. Much slower than direct DB export but super customizable.
Book.dump :csv, :except => [:topic_id], :methods [:foo, :bar], :proc => {:superproc => Proc.new {|dump_options| “HI!” }

Kamil KukuraJune 7th, 2009 at 20:56

For chunked processing you may use http://afreshcup.com/2009/02/23/rails-23-batch-finding/

DavidJuly 10th, 2009 at 02:06

Anyone else having problems with IE (6 and 7 at least) borking on such downloads?

Andrew TimberlakeJuly 10th, 2009 at 10:21

Hi David
I haven’t tested this on anything but Firefox because the app I wrote it for is used in a closed environment.
One thing I did notice is that if you use a download manager, the server gets asked for the data twice which can be a problem.
Not sure if IE might be doing something similar