Chunked Upload With Nginx And NodeJS

Uploads are an integral part of our audio/video transcription service; they are usually the first step for our customers. And since we work with audio/video files, the sizes can run into gigabytes. Sending gigabytes of data in a single HTTP POST is not very reliable: unless you have a very good internet connection, chances are that such a POST will time out. We therefore decided to implement chunked uploads on our website. In this post we'll go through the steps we followed.

We use Nginx 1.5 as a reverse proxy in front of NodeJS, which is our application server. Nginx has supported chunked request bodies natively since version 1.3.9 (earlier versions required the upload module); the piece that makes this useful is the client_body_in_file_only directive. Its documentation is not very helpful, but as the name implies, the body of the POST is written to a file. For chunked uploads each chunk ends up in a separate file, and the application then has to reassemble them. We used the following Nginx configuration.

    location = /files/upload/xhr {
        # write the request body to a temporary file instead of buffering it in memory
        client_body_temp_path /tmp;
        client_body_in_file_only on;
        # pass the path of that temporary file to the application
        proxy_pass_request_headers on;
        proxy_set_header X-FILE $request_body_file;
        proxy_redirect off;
        # the body is already on disk, so don't forward it to the upstream
        proxy_set_body "";
        proxy_http_version 1.1;
        # nodejs is an upstream defined elsewhere in the configuration
        proxy_pass http://nodejs;
    }

The key piece here is the X-FILE header, which carries the path of the temporary file the POST body was written to. The application then has to process these individual chunks and reassemble them.
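
The snippet that follows is the body of our upload route handler. For context, here is a minimal sketch of how it could be wired up; the module setup, the streams map, and the assumption that the original filename arrives as a query parameter are illustrative rather than our exact code.

    var express = require('express');
    var crypto = require('crypto');
    var fs = require('fs');
    var path = require('path');

    var app = express();

    /* open write streams, keyed by the hash of the target file */
    var streams = {};

    app.post('/files/upload/xhr', function(req, res) {
        /* hypothetical: the original filename is assumed to be sent by the
           client, e.g. as a query string parameter */
        var filename = req.query.fn;

        /* chunk handling goes here (shown below) */
    });

Inside that handler we process the individual chunks as follows.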

    
    /* Chunked upload sessions will have the content-range header */
    if(req.headers['content-range']) {
        /* the format of the content-range header is 'Content-Range: bytes start-end/total' */
        var match = req.headers['content-range'].match(/(\d+)-(\d+)\/(\d+)/);
        if(!match || !match[1] || !match[2] || !match[3]) {
            /* malformed content-range header */
            res.send('Bad Request', 400);
            return;
        }

        var start = parseInt(match[1], 10);
        var end = parseInt(match[2], 10);
        var total = parseInt(match[3], 10);

        /*
         * The filename and the file size are used for the hash since filenames are not
         * always unique for our customers.
         */

        var hash = crypto.createHash('sha1').update(filename + total).digest('hex');
        var target_file = "app/uploads/" + hash + path.extname(filename);

        /* The individual chunks are concatenated using a stream */  
        var stream = streams[hash];
        if(!stream) {
            stream = fs.createWriteStream(target_file, {flags: 'a+'});
            streams[hash] = stream;
        }

        var size = 0;
        if(fs.existsSync(target_file)) {
            size = fs.statSync(target_file).size;
        }

        /* 
         * basic sanity checks for content range
         */
        if((end + 1) == size) {
            /* duplicate chunk */
            res.send('Created', 201);
            return;
        }

        if(start != size) {
            /* missing chunk */
            res.send('Bad Request', 400);
            return;
        }

        /* if everything looks good then read this chunk and append it to the target */
        fs.readFile(req.headers['x-file'], function(error, data) {
            if(error) {
                res.send('Internal Server Error', 500);
                return;
            }

            stream.write(data);
            /* the temporary chunk file written by Nginx is no longer needed */
            fs.unlink(req.headers['x-file'], function() {});

            if(start + data.length >= total) {
                /* all chunks have been received */
                stream.on('finish', function() {
                    delete streams[hash];
                    /* hand the assembled file over to the rest of the pipeline */
                    process_upload(target_file);
                });

                stream.end();
            } else {
                /* this chunk has been processed successfully */
                res.send('Created', 201);
            }
        });
    } else {
        /* this is a normal upload session */
        process_upload(req.headers['x-file']);
    }

The target file hash plays an important role in resumable uploads. If the same file is being re-uploaded, we detect that via the hash and tell the client to start from the last byte we received. The same logic covers error handling: on failure, the client simply uploads the file again from the last byte received.
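
Here is a minimal sketch of what that lookup could look like on the server, assuming an Express route at /files/upload/bytes-received and the same hash and target-file convention as the upload handler; the fn/sz parameters and the response shape are taken from the client code below, everything else is illustrative.

    app.get('/files/upload/bytes-received', function(req, res) {
        var filename = req.query.fn;
        var total = parseInt(req.query.sz, 10);

        /* same hash scheme as the upload handler: filename + total size */
        var hash = crypto.createHash('sha1').update(filename + total).digest('hex');
        var target_file = "app/uploads/" + hash + path.extname(filename);

        /* report how many bytes of this file we already have on disk */
        var size = 0;
        if(fs.existsSync(target_file)) {
            size = fs.statSync(target_file).size;
        }

        /* e = 0 signals success; s.sz is the byte count the client should resume from */
        res.json({e: 0, s: {sz: size}});
    });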

On the client side we use the excellent jQuery File Upload plugin by BlueImp, which supports chunked XHR file uploads. Resumable uploads are implemented via the add callback, where we send a GET request with the filename and the file size. The application responds with the number of bytes it has already received, which we then set as uploadedBytes; the upload resumes from the next byte.

    $(".file-input").fileupload
        singleFileUploads: true
        multipart: false
        maxChunkSize: 1024 * 1024 * 10
        retries: 0
        maxRetries: 300

        add: (e, data) ->
            $.ajax
                type: "GET"
                url: "/files/upload/bytes-received"
                data:
                    fn: data.files[0].name
                    sz: data.files[0].size
                dataType: "json"
                success: (res) ->
                    data.uploadedBytes = parseInt(res.s.sz, 10) if res.e is 0
                    # if the whole file is already on the server, start again from byte 0
                    data.uploadedBytes = 0 if data.files[0].size is data.uploadedBytes
                    data.submit()

We use a chunk size of 10 MB. singleFileUploads is necessary so that each file is sent via a separate POST, since our implementation supports only one file per POST. multipart is set to false, as suggested in the plugin documentation.
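
To make the chunking concrete, here is roughly the sequence of requests the plugin ends up sending for a 25 MB (26214400 byte) file with the 10 MB chunk size above; the values are illustrative, but this is the Content-Range format the server-side regex expects.

    POST /files/upload/xhr HTTP/1.1
    Content-Range: bytes 0-10485759/26214400

    POST /files/upload/xhr HTTP/1.1
    Content-Range: bytes 10485760-20971519/26214400

    POST /files/upload/xhr HTTP/1.1
    Content-Range: bytes 20971520-26214399/26214400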

This implementation has been live for the past few months and we have not seen any upload failures since. Some uploads have run for 12+ hours and still succeeded in the end! Imagine the frustration it would cause if an upload failed after all that time.
