woensdag 25 mei 2011

Exporting CSV with special characters to Excel

For a Django web application I want to offer the possibility of exporting tables to Excel. This seems pretty straightforward, and thankfully the folks at Django have some excellent examples.

I decide to make use of the template system with the render_to_response method. The Django handler:
def handleExport(request):
    # retrieve a query_set
    ...
    response = render_to_response("/export/export.csv", 
                                  { 'data': query_set })
   
    filename = "export_%s.csv" % (datetime.now().strftime("%Y%m%d%H%M%S"))
    response['Content-Disposition'] = 'attachment; filename='+filename
    response['Content-Type'] = 'text/csv; charset=utf-8'
    
    return response
The template file export.csv looks like this:
{% if data %}header1;header2;etc
{% for row in data  %}
{{ row.column1 }};{{ row.column2 }}{{ etc }}{% endfor %}
{% endif %}
All I need to do now is add a link pointing to the handler, and the browser will pop up a dialog, asking me whether to open the file in Excel or to save it.

But when I open the file in Excel, I notice that special characters are displayed improperly. For instance the 'ë' is displayed as 'ë'. Reason: Encoding. Because the default encoding in Django is UTF-8, the CSV file is in UTF-8. But when opening it, Excel assumes it to be encoded in ANSI. It is possible to import it into Excel by first saving the file and then opening it from Excel - you specify the encoding in the process. You can read all about it at stackoverflow. But we would really like it to open from the browser.

The strange thing is, the solution is mentioned in the thread, but it's not getting the votes! To have Excel properly handle the UTF-8 encoded CSV file, you have to add a so called 'byte order mark' (BOM) to the file. You can read about BOMs at wikipedia, but I wouldn't bother. It comes down to writing a few bytes to the beginning of the file: EF FF BF. In Django it looks like this:
def handlePersoonListExport(request):
    # Retrieve a query_set
    ...
   
    template = loader.get_template("export.csv")
    context = Context({
        'data': query_set,
    })
    
    response = HttpResponse() 
    filename = "export_%s.csv" % (datetime.now().strftime("%Y%m%d%H%M%S"))
    response['Content-Disposition'] = 'attachment; filename='+filename
    response['Content-Type'] = 'text/csv; charset=utf-8'
    # Add UTF-8 'BOM' signature, otherwise Excel will assume the CSV file
    # encoding is ANSI and special characters will be mangled
    response.write("\xEF\xBB\xBF")
    response.write(template.render(context))
    
    return response
We now create a HttpResponse and write the necessary bytes before appending the rendered template. It works like a charm!