I’ve been putting more work into the revised version of
The repository now includes a simple README which is starting to outline how the interface looks…
import httpcore

response = httpcore.request("GET", "https://www.example.com/")

print(response)
# <Response >
print(response.status)
# 200
print(response.headers)
# [(b'Accept-Ranges', b'bytes'), (b'Age', b'557328'), (b'Cache-Control', b'max-age=604800'), ...]
print(response.content)
# b'<!doctype html>\n<html>\n<head>\n<title>Example Domain</title>\n\n<meta charset="utf-8"/>\n ...'
I’ve also opened an issue which sketches out my plans for the documentation and API.
What I’m particularly excited about here is the way that the design will allow you to drop into the network stack at various layers. I think the fundamentals of the design are going to make it much easier for developers to properly dig into and understand how the connection pooling works, and what’s happening at each layer.
I’ve started drawing up a potential Kickstarter to launch alongside the HTTPX 1.0 release. This has also helped me start to clarify what the roadmap for HTTPX ought to look like.
I’ve made a tentative start on re-committing to a little more time on REST framework, by working through a couple of existing issues and discussions.
Really the project could do with having 1-day-a-week of my time dedicated to it.
I made the mistake this week of earmarking Friday for that work. In coming weeks I think it’d work better to have it be at the start of the week, since I think that’d help keep it prioritised.
The most significant piece of work this week has been re-assessing the automatic charset decoding policy in HTTPX.
When an HTTP response is returned it’ll generally have a Content-Type header indicating whether the response is a text document, such as text/html, or a binary file, such as image/jpeg. For textual content we need to pick a character set with which to decode the raw binary content into a unicode string.
Ideally the server will indicate which encoding is being used within the Content-Type header, with a value such as text/html; charset=utf-8. However, the charset parameter isn’t always present, and we need a policy to determine what to do in these cases.
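To make the header handling concrete, here is a minimal sketch of pulling the charset parameter out of a Content-Type value using only the Python standard library. The function name charset_from_content_type is hypothetical, and this is not how HTTPX itself parses the header; it only illustrates the two cases described above.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the declared charset (lowercased), or None if absent.

    Hypothetical helper for illustration; it leans on the stdlib
    email package to parse the header parameters.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# The server declared an encoding, so we can use it directly.
print(charset_from_content_type("text/html; charset=utf-8"))

# No charset parameter: this is the case that needs a policy.
print(charset_from_content_type("text/html"))
```

The second call returns None, which is the situation the rest of this section is concerned with.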
Previously we’d adopted a keep-it-simple policy in httpx, and attempted utf-8 with fallbacks to other common encodings. Having been prompted to re-assess this, it seemed worth spending some time on an evidence-led approach to determining what decoding policy to use.
In this repository I’ve taken a list of the 1000 most accessed websites, and saved the downloaded content and response headers. Of these sites…
Based on these results we’ve decided to reintroduce automatic charset detection for cases that don’t include a charset parameter. The results also demonstrated sufficiently that the newer charset_normalizer package performed as well as or better than chardet at detection, while being significantly faster.
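The overall shape of such a policy can be sketched as follows. This is an illustration under assumptions rather than the HTTPX implementation: the names declared_charset and decode_text are hypothetical, the detector is passed in as a plain callable so the sketch stays standard-library only, and the final utf-8 fallback with replacement characters is my own choice for the example.

```python
import codecs
from email.message import Message
from typing import Callable, Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Charset parameter from a Content-Type value, or None (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_text(
    content: bytes,
    content_type: str,
    detect: Callable[[bytes], Optional[str]],
) -> str:
    """Decode a response body: declared charset first, then detection."""
    encoding = declared_charset(content_type)
    if encoding is None:
        # No charset parameter, so fall back to automatic detection.
        encoding = detect(content)
    if encoding is not None:
        try:
            codecs.lookup(encoding)
        except LookupError:
            # Unknown or misspelled encoding name: ignore it.
            encoding = None
    # Assumed last resort for this sketch: utf-8 with replacement characters.
    return content.decode(encoding or "utf-8", errors="replace")


# Example with a stand-in detector that always guesses latin-1.
body = "caf\u00e9".encode("latin-1")
print(decode_text(body, "text/html", lambda data: "latin-1"))
```

With charset_normalizer installed, the detector could plausibly be written around from_bytes(data).best(), returning the match's encoding attribute when a match is found and None otherwise.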
Released HTTPX 0.19
charset is included in the response