QR Code Series: The Basics (Part 1)
For one of our offline problem-solving functionalities, we've been looking into QR codes. During this process we started wondering: how much data can you actually fit into a QR code in practice? With practicality in mind, we set out to answer the following question: how can we get the maximum amount of data into a single QR code that any mobile phone's native/built-in scanner can scan and parse as a URL? In this first instalment, we will explore the QR code variations and their quirks and features. Read the second instalment here!
A 21x21, 25x25, and 29x29 QR code. Source: Wikipedia
The standards
QR codes come in a number of variants, each denoted by an integer and a letter, 25-H for example. The integer is the version and determines the size of the QR code: a version $V$ code measures $4 \times V + 17$ modules per side. A version 25 QR code therefore has dimensions of 117x117. The letter denotes the amount of error correction that is built into the QR code. Level H lets you restore roughly 30% of the data bytes. For this project we've selected the 40-L QR code, giving us a QR code of 177x177 with roughly 7% data byte restoration. According to Wikipedia, the available character storage capacity of a 40-L QR code falls into four categories:
Maximum QR Storage capacity (version 40-L QR code)
Input mode | Max. characters | Bits/char. | Possible characters with default encoding |
---|---|---|---|
Numeric only | 7,089 | $3\frac{1}{3}$ | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
Alphanumeric | 4,296 | $5\frac{1}{2}$ | 0–9, A–Z (upper-case only), space, $, %, *, +, -, ., /, : |
Binary/byte | 2,953 | $8$ | ISO 8859-1 |
Kanji/kana | 1,817 | $13$ | Shift JIS X 0208 |
From: https://en.wikipedia.org/wiki/QR_code on 2021-11-10
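The size formula from the start of this section is easy to check in code. The following is a minimal sketch; the function name `qr_modules_per_side` is our own and not part of any QR library.

```python
def qr_modules_per_side(version: int) -> int:
    """Side length, in modules, of a QR code with the given version (1-40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR code versions range from 1 to 40")
    return 4 * version + 17


# Versions 1-3 correspond to the three codes in the figure above;
# 25 and 40 are the versions discussed in the text.
for v in (1, 2, 3, 25, 40):
    side = qr_modules_per_side(v)
    print(f"version {v:>2}: {side}x{side}")
```

The letter for the error correction level does not affect the size, only how much of the code is left over for data.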
Looking at the table above, some interesting values present themselves, and it becomes evident that Wikipedia was not entirely accurate. What is a third of a bit, or half a bit, and what can we do with it? Some digging and calculating resulted in the following table for 40-L QR codes (kanji/kana have been omitted):
Input mode | Max. characters | Bits/char. | Base | Total bits | Actual total bits | Group size | Bits per group | Values available | Values used | Values not used |
---|---|---|---|---|---|---|---|---|---|---|
Numeric only | 7,089 | $3\frac{1}{3}$ | 10 | 23,630 | 23,549 | 3 | 10 | 1,024 | 1,000 | 24 |
Alphanumeric | 4,296 | $5\frac{1}{2}$ | 45 | 23,628 | 23,593 | 2 | 11 | 2,048 | 2,025 | 23 |
Binary/byte | 2,953 | $8$ | 256 | 23,624 | 23,624 | 1 | 8 | 256 | 256 | 0 |
The group size explains the fractional bits. Characters are encoded in groups so that each group fills a whole number of bits: 3 digits go into 10 bits for numeric input, and 2 characters go into 11 bits for alphanumeric input. Dividing the bits per group by the group size gives the $3\frac{1}{3}$ and $5\frac{1}{2}$ bits per character from the Wikipedia table. It also becomes evident that some storage is lost on numeric and alphanumeric input, because a character carries slightly less information than the bits it occupies. The information content of one character from an alphabet of size $b$ is $\log_{2} b$ bits (we are expressing a value in another base as bits). This results in $\log_{2}10 \approx 3.321928$ for numeric and $\log_{2}45 \approx 5.491853$ for alphanumeric. The binary/byte input mode wins out automatically here, because $\log_{2}256 = 8$ exactly, resulting in 0 lost bits.
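The numbers in this section can be reproduced with a few lines of Python. This is a sketch under our own naming (`MODES`, `mode_stats`, and `pack_numeric_group` are illustrative helpers, not part of a QR library), and it only packs full groups of three digits.

```python
import math

# (alphabet size, characters per group, bits per group), as discussed above
MODES = {
    "numeric": (10, 3, 10),
    "alphanumeric": (45, 2, 11),
    "byte": (256, 1, 8),
}


def mode_stats(base: int, group_size: int, group_bits: int) -> dict:
    """Compare the bits a mode spends per character with the information it carries."""
    values_used = base ** group_size      # distinct groups that can actually occur
    values_available = 2 ** group_bits    # bit patterns the group occupies
    return {
        "nominal_bits_per_char": group_bits / group_size,
        "information_bits_per_char": math.log2(base),
        "unused_values": values_available - values_used,
    }


def pack_numeric_group(digits: str) -> str:
    """Pack one full group of three decimal digits into a 10-bit string."""
    if len(digits) != 3 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly three decimal digits")
    return format(int(digits), "010b")


for name, params in MODES.items():
    print(name, mode_stats(*params))

# Even the largest group, "999", fits in 10 bits with values to spare.
print(pack_numeric_group("999"))
```

A real encoder also has to handle a trailing group that is shorter than the group size, which this sketch leaves out.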
Getting something in there
Okay, so we've done some pretty boring maths and established that the binary/byte input mode is the way to go to get the maximum out of our QR code, with 23,624 bits. But wait, at the start of this article we mentioned something about practicality. As it turns out, iOS doesn't recognize QR codes with more than 4000 characters in the URL. This means that even though the QR standard would allow us to use 7,089 numeric characters, an iPhone couldn't even read the result. Taking the iPhone's limitation into account, we get the following available bits:
- $4000 \times \log_{2}10 \approx 13288$ bits of numeric input
- $4000 \times \log_{2}45 \approx 21967$ bits of alphanumeric input
- $2953 \times \log_{2}256 = 23624$ bits of binary/byte input
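The same comparison can be written as a short calculation. This is a sketch: the 4000-character limit is the figure we observed on iOS rather than a documented constant, and the names `IOS_URL_CHAR_LIMIT`, `CAPACITY_40L`, and `usable_bits` are our own.

```python
import math

# Assumption: the limit observed in this article, not an official iOS figure.
IOS_URL_CHAR_LIMIT = 4000

# mode -> (alphabet size, maximum characters in a 40-L QR code)
CAPACITY_40L = {
    "numeric": (10, 7089),
    "alphanumeric": (45, 4296),
    "byte": (256, 2953),
}


def usable_bits(base: int, max_chars: int, char_limit: int = IOS_URL_CHAR_LIMIT) -> float:
    """Information, in bits, that fits once the character limit is applied."""
    chars = min(max_chars, char_limit)
    return chars * math.log2(base)


results = {mode: usable_bits(*spec) for mode, spec in CAPACITY_40L.items()}
for mode, bits in results.items():
    print(f"{mode:>12}: {round(bits)} bits")

best_mode = max(results, key=results.get)
print("best mode:", best_mode)
```

Changing `char_limit` makes it easy to see how the ranking would shift if a scanner turned out to have a different limit.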
Even with the character limit, binary wins again! Byte mode holds at most 2,953 characters, which is below the limit, so it loses nothing. It's a bad day to be an alphanumeric character :^( So now all we have to do is encode a lot of data into a URL and then encode that URL into a binary QR code. In the next instalment we find out whether that is as easy as it sounds.