Harry Otten, 10 November 2021

QR Code Series: The Basics (Part 1)

For one of our offline problem-solving functionalities, we've been looking into QR codes. During this process we started wondering: how much data can you actually fit into a QR code in practice? With that practicality in mind, we set out with the following question: how can we get the maximum amount of data into a single QR code that can be scanned by any mobile phone's native (built-in) scanner, with the scanner still able to parse it as a URL? In this first instalment, we will explore the various QR code variants and their quirks and features.

QR Code Versions
A 21x21, 25x25 and 29x29 QR code (versions 1, 2 and 3). Source: Wikipedia

The standards

QR codes come in 40 versions and four error correction levels, and a specific variant is denoted by an integer and a letter, 25-H for example. The integer is the version $V$ (1 to 40), which determines the size of the QR code: each side measures $4 \times V + 17$ modules. A version 25 QR code therefore has dimensions of 117x117. The letter denotes the error correction level, one of L, M, Q or H, which can restore roughly 7%, 15%, 25% and 30% of the data bytes respectively. For this project we've selected the 40-L QR code, giving us a QR code of 177x177 modules with 7% data byte restoration: the largest size with the least space spent on error correction. According to Wikipedia, the character storage capacity of a 40-L QR code depends on which of four input modes is used:
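As a quick sanity check on the size formula, here is a minimal Python sketch that turns a label such as 25-H into its dimensions. The helper names (`side_length`, `describe`) are our own, and the recovery percentages are the nominal values per error correction level rather than exact guarantees.

```python
# Sketch: side length of a QR code from its version V, plus the nominal
# recovery capacity per error correction level. Helper names are hypothetical.

EC_RECOVERY = {"L": 0.07, "M": 0.15, "Q": 0.25, "H": 0.30}


def side_length(version: int) -> int:
    """Number of modules per side for a QR code of the given version."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 4 * version + 17


def describe(variant: str) -> str:
    """Turn a label such as '25-H' into a short human-readable description."""
    version_str, level = variant.split("-")
    size = side_length(int(version_str))
    recovery = EC_RECOVERY[level.upper()]
    return f"{variant}: {size}x{size} modules, ~{recovery:.0%} recoverable"


if __name__ == "__main__":
    for label in ("1-L", "25-H", "40-L"):
        print(describe(label))
```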

Maximum QR storage capacity (version 40-L QR code)

| Input mode | Max. characters | Bits/char. | Possible characters with default encoding |
|---|---|---|---|
| Numeric only | 7,089 | $3\frac{1}{3}$ | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 |
| Alphanumeric | 4,296 | $5\frac{1}{2}$ | 0–9, A–Z (upper-case only), space, $, %, *, +, -, ., /, : |
| Binary/byte | 2,953 | $8$ | ISO 8859-1 |
| Kanji/kana | 1,817 | $13$ | Shift JIS X 0208 |

From: https://en.wikipedia.org/wiki/QR_code on 2021-11-10
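To see which mode a given string would fall into, a small Python sketch can test each character set from the table, from most to least compact. The function `smallest_mode` is hypothetical, kanji mode is ignored, and real encoders can also split a string into several segments with different modes, which this sketch does not attempt.

```python
# Character sets from the table (default encodings); kanji mode is left out.
NUMERIC_CHARS = set("0123456789")
ALPHANUMERIC_CHARS = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def smallest_mode(text: str) -> str:
    """Return the most compact single input mode that can hold `text`."""
    if all(c in NUMERIC_CHARS for c in text):
        return "numeric"
    if all(c in ALPHANUMERIC_CHARS for c in text):
        return "alphanumeric"
    try:
        text.encode("iso-8859-1")  # default byte-mode encoding per the table
    except UnicodeEncodeError:
        raise ValueError("text is outside the default ISO 8859-1 byte encoding") from None
    return "byte"


if __name__ == "__main__":
    for sample in ("0123456789", "HTTPS://EXAMPLE.COM/QR", "https://example.com/qr"):
        print(sample, "->", smallest_mode(sample))
```

Note that a lowercase URL such as `https://example.com/qr` already drops to byte mode, since lowercase letters are not in the alphanumeric set.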

Looking at the table above, some interesting values present themselves, and it becomes evident that the Wikipedia figures do not tell the whole story. What are 1/3 or half bits, and what can we do with them? Some digging and calculating resulted in the following table for 40-L QR codes (kanji/kana have been omitted):

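| Input mode | Max. characters | Bits/char. | Base | Total bits (nominal) | Actual total bits ($\log_2$) | Group size (chars) | Bits per group | Max value ($2^{\text{bits}}$) | Max alphabet (base$^{\text{group}}$) | Values not used |
|---|---|---|---|---|---|---|---|---|---|---|
| Numeric only | 7,089 | $3\frac{1}{3}$ | 10 | 23,630 | 23,549 | 3 | 10 | 1,024 | 1,000 | 24 |
| Alphanumeric | 4,296 | $5\frac{1}{2}$ | 45 | 23,628 | 23,593 | 2 | 11 | 2,048 | 2,025 | 23 |
| Binary/byte | 2,953 | $8$ | 256 | 23,624 | 23,624 | 1 | 8 | 256 | 256 | 0 |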
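The columns of this table can be reproduced with a few lines of Python. This is a sketch under the assumption that the group sizes (3, 2 and 1) are taken as given; the bits per group are then the fewest whole bits that can hold every combination within a group, and the "actual" figure uses $\log_2$ of the base. The helper name `mode_row` is our own.

```python
import math

# Input mode -> (max characters in a 40-L QR code, alphabet size, characters per group)
MODES = {
    "numeric": (7089, 10, 3),
    "alphanumeric": (4296, 45, 2),
    "binary/byte": (2953, 256, 1),
}


def mode_row(max_chars: int, base: int, group: int) -> dict:
    """Recompute one row of the table for a given input mode."""
    max_alphabet = base ** group                  # combinations a group must represent
    group_bits = (max_alphabet - 1).bit_length()  # fewest whole bits that can hold them
    max_value = 2 ** group_bits                   # values those bits can represent
    return {
        # bits spent on the characters (the counts divide evenly into groups here)
        "nominal_bits": round(max_chars * group_bits / group),
        # information the characters actually carry
        "actual_bits": round(max_chars * math.log2(base)),
        "group_bits": group_bits,
        "max_value": max_value,
        "max_alphabet": max_alphabet,
        "unused": max_value - max_alphabet,
    }


if __name__ == "__main__":
    for mode, params in MODES.items():
        print(mode, mode_row(*params))
```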

The group size explains the fractional bits. To end up with a whole number of bits, numeric input is encoded in groups of 3 characters (10 bits) and alphanumeric input in groups of 2 characters (11 bits). This costs some storage on numeric and alphanumeric input: 81 and 35 bits respectively in a full 40-L code. The reason is that a character does not actually carry $3\frac{1}{3}$ or $5\frac{1}{2}$ bits of information, but slightly less: a numeric character carries $3.321928$ bits. To get this actual bits per character value, we take the base-2 logarithm of the alphabet size, since we are expressing base-10 or base-45 symbols in bits. This results in $\log_{2}10 \approx 3.321928$ for numeric and $\log_{2}45 \approx 5.491853$ for alphanumeric input. The binary/byte input mode wins out automatically here, because $\log_{2}256 = 8$ results in 0 lost bits.
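To make the grouping concrete, the sketch below packs digits and alphanumeric characters into bit strings the way we understand the QR specification to do it: three digits become a 10-bit number (a leftover two or one digit take 7 or 4 bits), and a pair of alphanumeric characters becomes the value $45 \times a + b$ in 11 bits (a leftover character takes 6 bits). Mode indicators, length fields and error correction are left out, and the function names are our own.

```python
ALPHANUMERIC_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def numeric_bits(digits: str) -> str:
    """Pack a digit string: 3 digits -> 10 bits, leftover 2 -> 7 bits, 1 -> 4 bits."""
    if not all(c in "0123456789" for c in digits):
        raise ValueError("numeric mode only accepts the digits 0-9")
    widths = {3: 10, 2: 7, 1: 4}
    out = []
    for i in range(0, len(digits), 3):
        chunk = digits[i:i + 3]
        out.append(format(int(chunk), f"0{widths[len(chunk)]}b"))
    return "".join(out)


def alphanumeric_bits(text: str) -> str:
    """Pack alphanumeric text: pairs -> 45*a + b in 11 bits, a leftover char -> 6 bits."""
    values = [ALPHANUMERIC_CHARS.index(c) for c in text]  # ValueError if not allowed
    out = []
    for i in range(0, len(values), 2):
        pair = values[i:i + 2]
        if len(pair) == 2:
            out.append(format(45 * pair[0] + pair[1], "011b"))
        else:
            out.append(format(pair[0], "06b"))
    return "".join(out)


if __name__ == "__main__":
    print(numeric_bits("01234567"))
    print(alphanumeric_bits("AC-42"))
```

Packing 7,089 digits this way gives 23,630 bits and 4,296 alphanumeric characters give 23,628 bits, matching the nominal totals in the table. The largest pair value, $45 \times 44 + 44 = 2024$, shows why 23 of the 2,048 possible 11-bit values are never used.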

Quick Maths

Getting something in there

Okay, so we've done some pretty boring maths and established that the binary/byte input mode is the way to get the maximum out of our QR code, with 23,624 bits. But wait, at the start of this article we mentioned something about practicality. As it turns out, iOS doesn't recognize QR codes with more than 4000 characters in the URL. This means that even though the QR standard would allow us to use 7,089 numeric characters, an iPhone couldn't even read such a code. Taking the iPhone's limitation into account, we get the following available bits:

  • $4000 \times \log_{2}10 \approx 13287$ bits of numeric input
  • $4000 \times \log_{2}45 \approx 21967$ bits of alphanumeric input
  • $2953 \times \log_{2}256 = 23624$ bits of binary/byte input (2,953 characters is already below the iOS limit)
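The list above boils down to one small calculation: cap the character count at the iOS limit and multiply by the bits per character, rounding down since a fraction of a bit cannot be used. The 4000-character limit is the iOS observation mentioned above rather than a value from the QR standard, and `usable_bits` is a hypothetical helper.

```python
import math

# Observed iOS limit on URL length in a scanned QR code (see text); not a spec value.
IOS_URL_LIMIT = 4000

# Input mode -> (maximum characters in a 40-L QR code, alphabet size)
MODES = {
    "numeric": (7089, 10),
    "alphanumeric": (4296, 45),
    "binary/byte": (2953, 256),
}


def usable_bits(max_chars: int, base: int, limit: int = IOS_URL_LIMIT) -> int:
    """Information (in whole bits) that fits when the character count is capped."""
    chars = min(max_chars, limit)
    return math.floor(chars * math.log2(base))


if __name__ == "__main__":
    for mode, (max_chars, base) in MODES.items():
        print(f"{mode}: {usable_bits(max_chars, base)} bits")
```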

With the limited characters, binary wins again! It's a bad day to be an alphanumeric character :^( So now all we have to do is encode a lot of data into a URL and then encode that URL into a binary QR code. In the next instalment we find out whether that is as easy as it sounds. Right?