Recipes

This page collects recipes from the urllib3 cookbook.

Decode HTTP Response Data in Concatenated Gzip Format

By default, urllib3 checks the Content-Encoding header of the HTTP response and transparently decodes data encoded with gzip or deflate. If Content-Encoding is neither of these, however, you have to decode the data in your application.

This recipe shows how to decode data in the concatenated gzip format, where several independently gzipped chunks are concatenated in a single HTTP response.
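The following minimal sketch, which uses only the standard library and no network access, illustrates why a single decompressor is not enough. It builds two gzipped chunks with gzip.compress and concatenates them. A zlib.decompressobj created with wbits = 16 + zlib.MAX_WBITS (the added 16 selects the gzip header and trailer format) stops at the end of the first chunk and leaves the second chunk's bytes in unused_data.

```python
import gzip
import zlib

# Two independently gzipped chunks, concatenated back to back.
first = gzip.compress(b"hello, ")
second = gzip.compress(b"world\n")
payload = first + second

# wbits = 16 + MAX_WBITS tells zlib to expect the gzip container format.
d = zlib.decompressobj(16 + zlib.MAX_WBITS)
decoded = d.decompress(payload)

print(decoded)                  # b'hello, '
print(d.eof)                    # True: the first gzip chunk is complete
print(d.unused_data == second)  # True: the second chunk was not decoded
```

Only the first chunk comes back, so anything left in unused_data has to be passed to a new decompressor.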

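import zlib

import urllib3

CHUNK_SIZE = 1024


def decode_gzip_raw_content(raw_data_fd):
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    output = []
    d = raw_data_fd.read(CHUNK_SIZE)
    while d:
        output.append(obj.decompress(d))
        # Bytes past the end of the current gzip chunk start the next
        # chunk, so hand them to a fresh decompressor.
        while obj.unused_data:
            unused_data = obj.unused_data
            obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
            output.append(obj.decompress(unused_data))
        d = raw_data_fd.read(CHUNK_SIZE)
    return b''.join(output)


def test_urllib3_concatenated_gzip_in_http_response():
    # example for urllib3
    http = urllib3.PoolManager()
    # decode_content=False keeps the body compressed; preload_content=False
    # lets decode_gzip_raw_content read it incrementally with r.read().
    r = http.request('GET', 'http://example.com/abc.txt',
                     decode_content=False, preload_content=False)
    try:
        content = decode_gzip_raw_content(r).decode('utf-8')
    finally:
        # Return the connection to the pool once the body has been read.
        r.release_conn()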

After obj.decompress returns, obj.unused_data holds any bytes that follow the end of the gzip chunk that obj has just finished. When it is non-empty, a new zlib.decompressobj is created to decode the next gzipped chunk, and this repeats until no further data is read.
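To check the loop without a server, the sketch below exercises it against an in-memory stream. decode_concatenated_gzip is a hypothetical restatement of decode_gzip_raw_content with the chunk size made a parameter, and io.BytesIO stands in for the urllib3 response, since both provide read(n). A very small chunk size makes the boundaries between gzipped chunks fall in the middle of a read, which is the case the inner loop handles.

```python
import gzip
import io
import zlib


def decode_concatenated_gzip(fd, chunk_size=1024):
    """Decode every gzipped chunk read from fd (any object with read(n))."""
    obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    output = []
    data = fd.read(chunk_size)
    while data:
        output.append(obj.decompress(data))
        # Leftover bytes belong to the next gzipped chunk: restart on them.
        while obj.unused_data:
            leftover = obj.unused_data
            obj = zlib.decompressobj(16 + zlib.MAX_WBITS)
            output.append(obj.decompress(leftover))
        data = fd.read(chunk_size)
    return b"".join(output)


parts = [b"alpha\n", b"beta\n", b"gamma\n"]
payload = b"".join(gzip.compress(p) for p in parts)

# A 7-byte read size splits every gzip header across reads.
result = decode_concatenated_gzip(io.BytesIO(payload), chunk_size=7)
print(result)                              # b'alpha\nbeta\ngamma\n'
print(result == gzip.decompress(payload))  # True
```

When the whole body is already in memory, the standard library's gzip.decompress also handles concatenated gzip data, as the comparison above shows. The incremental loop is useful when the body is streamed with preload_content=False and should not be buffered in full before decoding.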