Download a large amount of data in CSV from Rails
I’ve just had to re-write an export script because the application now needed to be able to download 14000 records in CSV format.
Originally I was reading all the records (with multiple joins) and then creating a csv string and then writing it to the response using send_data.
The original code
This is a simplified version of my code
csv_string = FasterCSV.generate do |csv|
accounts.each do |account|
csv << [account.id, account.name]
end
end
send_data csv_string, :filename => 'accounts.csv',
:type => 'text/csv',
:disposition => 'attachment'
My first thought was to stream the data using
:type => 'text/csv',
:disposition => 'attachment'
render :text => Proc.new {|response, output|
...
}
The problem with this is that send_file_headers! requires that the :length option be set. I also was running into the problem of loading 14000 records into memory before iterating over them.
For the memory problem, I considered using the new Model.find_in_batches but I needed to join tables and use some complicated conditions.
The solution
I finally settled on chunked reading of the data from the database and a more explicit use of render :text => Proc…
def download
filename = 'accounts.csv'
headers.merge!(
'Content-Type' => 'text/csv',
'Content-Disposition' => "attachment; filename=\"#{filename}\"",
'Content-Transfer-Encoding' => 'binary'
)
@performed_render = false
render :status => 200, :text => Proc.new { |response, output|
headings = ["ID", "Name"]
output.write FasterCSV.generate_line(headings)
last_account_id = 0
while last_account_id do
accounts = Account.find(:all,
:conditions => ["accounts.id > ?", last_account_id],
:order => 'accounts.id',
:limit => 1000)
last_account_id = accounts.size > 0 ? accounts[-1].id : nil
accounts.each { |account|
data = [account.id, account.name]
output.write FasterCSV.generate_line(data)
}
end
}
end
I hope this useful for you, if you have any questions, please ask in the comments.


Interesting; what does the @performed_render do? I think I get the idea, but can you explain a little more explicitly (or link to) how the Proc approach allows you to stream the results?
You could probably use some of the pagination plugins to handle the batch loading nicely. E.g., with will_paginate:
accounts = Account.paginated_each(:conditions => {…}, :per_page => 1000) do |account|
Loading all the models and then extracting two fields seems like an enormous waste of resources. It may be more efficient to simply load the columns using raw SQL and the select_rowss method, which can be significantly faster on any table that has BLOB columns.
Account.connection.select_rows(”SELECT id, name FROM #{Account.table_name} ORDER id”).each do |row|
output.write(FasterCSV.generate_line(row))
end
I have been having this same problem with CSV exports - I have a case where there are 30000+ records of 60+ fields per record. Needless to say, it takes a long time and usually ends up timing out in production (usually after Nginx has spawned it multiple times eating up all my Mongrel instances).
I’m going to try this out - it seems much simpler than using the queing/scheduling solution I was investigating.
I had to do something similar to generate an Excel file using a CSV input, and what I ended up doing was to generate the file to disk under an Apache doc root, and then do a redirect to the URL for the file so I could let Apache handle getting the actual file to the browser instead of having Rails wait (some of the users had very slow connections).
This improves the memory usage quite a bit, but I should have realized from reading the code that it wouldn’t really do anything to improve the overall processing time.
When I combined this with the connection.select_rows suggestion (thanks Scott), I saw both a big memory and processing time reduction.
Yah I ran into some trouble with sending and building large files (around 250k records) and ended up using nginx version of X-Send-File. You basically create a file and point (nginx or apache) to it. http://spongetech.wordpress.com/2007/11/13/the-complete-nginx-solution-to-sending-flowers-and-files-with-rails/
Also for exporting files, try ArDumper. http://github.com/blythedunham/ar_dumper
There is a paginated version that writes to a file to avoid running out of memory which I then sent off to nginx. Much slower than direct DB export but super customizable.
Book.dump :csv, :except => [:topic_id], :methods [:foo, :bar], :proc => {:superproc => Proc.new {|dump_options| “HI!” }
For chunked processing you may use http://afreshcup.com/2009/02/23/rails-23-batch-finding/
Anyone else having problems with IE (6 and 7 at least) borking on such downloads?
Hi David
I haven’t tested this on anything but Firefox because the app I wrote it for is used in a closed environment.
One thing I did notice is that if you use a download manager, the server gets asked for the data twice which can be a problem.
Not sure if IE might be doing something similar