QR Code Series: Encoding the data (Part 2)
In the previous post of this series, we covered the basics of QR codes and their limits. In this second instalment, we dive into the nitty-gritty of fitting as much data as possible into a QR code. We look at the relation between the QR input method and URL encoding, the limits on the amount of data, and some tricks to squeeze every bit out of a QR code.
Encoding a URL
In the first instalment, we established that binary data seems the most efficient way of inputting data for QR codes. The logical next step would be to encode it into a URL. Putting the data inside a URL is no problem, but keeping the URL valid presents a challenge. We can safely use `A-Z`, `a-z`, `0-9`, and some special symbols. All in all, 71 characters are available inside a valid URL.
Calculating the fit
In summary, binary input gives us 256 unique values per byte to work with, while a URL can consist of at least 71 unique characters (10 digits + 26 uppercase + 26 lowercase letters + 3 unreserved and 6 reserved symbols). We choose base64 to encode the data within the URL because its 64-character alphabet fits within those 71 characters. With 2953 bytes of binary input available, the usable payload for each alphabet size is:
- Raw bytes (256 values): $\log_{256}256 \times 2953 = 2953$
- Base64 (64 characters): $\log_{256}64 \times 2953 \approx 2214$
- All 71 URL characters: $\log_{256}71 \times 2953 \approx 2270$
This results in a whopping 23% loss due to the 71-character constraint (2270 of 2953 bytes remain), plus another 56 bytes left unused by base64 ($2270 - 2214$). Let's compare this to alphanumeric input and see if we can do any better. The complete charset for alphanumeric QR input is:
Code | Decoded | Code | Decoded | Code | Decoded | Code | Decoded |
---|---|---|---|---|---|---|---|
00 | 0 | 12 | C | 24 | O | 36 | Space |
01 | 1 | 13 | D | 25 | P | 37 | $ |
02 | 2 | 14 | E | 26 | Q | 38 | % |
03 | 3 | 15 | F | 27 | R | 39 | * |
04 | 4 | 16 | G | 28 | S | 40 | + |
05 | 5 | 17 | H | 29 | T | 41 | - |
06 | 6 | 18 | I | 30 | U | 42 | . |
07 | 7 | 19 | J | 31 | V | 43 | / |
08 | 8 | 20 | K | 32 | W | 44 | : |
09 | 9 | 21 | L | 33 | X | | |
10 | A | 22 | M | 34 | Y | | |
11 | B | 23 | N | 35 | Z | | |
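Luckily we have base45, which uses (you guessed it) 45 characters and maps every 2 bytes to 3 alphanumeric characters. Within a 4000-character URL (the maximum that iOS devices can scan), this yields $4000 \div 3 \times 2 \approx 2666$ bytes, while the theoretical ceiling for a 45-character alphabet is $\log_{256}45 \times 4000 \approx 2745$ bytes. Compared to the 2270 bytes of binary input, alphanumeric input has the clear advantage when encoding URLs.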
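To make the "3 characters per 2 bytes" rule concrete, here is a minimal sketch of a base45 encoder. It assumes the base45 meant here is the scheme specified in RFC 9285 and uses the alphanumeric charset from the table above as its alphabet; the function name `base45_encode` is ours and purely illustrative.

```python
# Alphanumeric QR charset; the index of each character is its code in the table above.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Map every 2 bytes to 3 characters; a trailing single byte becomes 2 characters."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        value = int.from_bytes(chunk, "big")
        # Two bytes need three base-45 digits, a single byte needs two.
        digits = 3 if len(chunk) == 2 else 2
        for _ in range(digits):
            value, remainder = divmod(value, 45)
            out.append(ALPHABET[remainder])  # least significant digit first
    return "".join(out)


print(base45_encode(b"Th"))  # 8UA
```

Because every pair of bytes is encoded independently, the scheme is simple to stream, but each 3-character group can hold $45^3 = 91125$ values while only $2^{16} = 65536$ are ever used.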
While testing, we quickly discovered that `%` and spaces break the URL quite easily on Android phones and should therefore not be used. To solve this in our project, we removed `%` and space from the character set, leaving us with $\log_{256}43 \times 4000 \approx 2713$ bytes. The maths behind base45 can easily be altered to work with fewer characters, even with 4 fewer: because $2^{16} = 65536$ and $41^3 = 68921$, 3 characters from a 41-character alphabet can still hold any 2-byte value, so the encoding will fit anyway. However, a fixed scheme of 3 characters per 2 bytes again loses some bytes in the process, and that is not what we want!
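The capacity figures quoted so far can be re-derived with exact integer arithmetic. The sketch below is only an illustration: the helper name `max_payload_bytes` is hypothetical, and it assumes "capacity" means the largest number of whole bytes whose every value fits into `length` characters of the given alphabet, which is $\lfloor \log_{256}(\text{size}) \times \text{length} \rfloor$.

```python
def max_payload_bytes(alphabet_size: int, length: int) -> int:
    """Largest k with 256**k <= alphabet_size**length."""
    combinations = alphabet_size ** length
    # bit_length() - 1 is floor(log2(combinations)); dividing by 8 converts bits to bytes.
    return (combinations.bit_length() - 1) // 8


for label, size, length in [
    ("binary, raw bytes", 256, 2953),
    ("binary, base64", 64, 2953),
    ("binary, 71 URL characters", 71, 2953),
    ("alphanumeric, 45 characters", 45, 4000),
    ("alphanumeric, 43 characters", 43, 4000),
]:
    print(f"{label}: {max_payload_bytes(size, length)} bytes")

# Smallest alphabet for which 3 characters can still hold any 2-byte value.
smallest = next(n for n in range(2, 256) if n ** 3 >= 2 ** 16)
print(smallest)  # 41
```

Using Python's arbitrary-precision integers avoids floating-point rounding near the integer boundaries, for example in the 71-character case, which lands very close to 2270.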
The 0-waste solution
We can imagine our QR code as one big number in base 43. We generate this number from the bit stream of the data we want to encode into the QR code. As an example:
Binary data | Base45 | Integer (base10) | ZEM Base43 |
---|---|---|---|
The quick brown fox jumps over the lazy dog. | 8UADZCKFEOEDJOD2KC54EM-DX.CH8FSKDQ\$D.OE44E5\$CS44+8DK44OEC3EFGVC:1D | 3024830571690175283291907639196436031967763819210983988162282536502237781693262640684650930677706176554798 | 6OC.//EZ4H84U17UVJ*P4VLQUFH:D./WVMFISF7X8PRJTGB.CCJMV0NVOQTK-UHT8 |
44 bytes | 66 alphanumeric characters | 106 digits | 65 alphanumeric characters |
Every column contains the same data, but represented in a different way. Our base43 encoding outperforms default base45 by 1 character in this example.
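The big-number idea can be sketched in a few lines of Python. This is not our production code: the alphabet order (the base45 charset with space and `%` removed), the least-significant-digit-first output, and the function names are assumptions made for illustration. Note that a big integer forgets leading zero bytes, so this sketch passes the original length to the decoder.

```python
# Base45 charset without space and '%'; the ordering of the symbols is an assumption.
ALPHABET43 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ$*+-./:"


def base43_encode(data: bytes) -> str:
    """Treat the whole payload as one big-endian integer and write it in base 43."""
    number = int.from_bytes(data, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 43)
        digits.append(ALPHABET43[remainder])  # least significant digit first
    return "".join(digits) or ALPHABET43[0]


def base43_decode(text: str, length: int) -> bytes:
    """Inverse of base43_encode; `length` restores any leading zero bytes."""
    number = 0
    for char in reversed(text):
        number = number * 43 + ALPHABET43.index(char)
    return number.to_bytes(length, "big")


message = b"The quick brown fox jumps over the lazy dog."
encoded = base43_encode(message)
print(len(encoded))  # 65
assert base43_decode(encoded, len(message)) == message
```

The trade-off compared to base45 is that the payload can no longer be encoded in independent 2-byte chunks: the whole message has to be in memory as a single integer, which is perfectly fine at QR code sizes.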
Results
In summary, the results of encoding data into a URL are the following for each of the bases:
QR Input | Encoding | Formula | URL Length | Real Data |
---|---|---|---|---|
Binary | Base64 | $2953 \div 4 \times 3$ | 2953 | 2214 |
Alphanumerical | Base45 | $4000 \div 3 \times 2$ | 4000** | 2666* |
Alphanumerical | ZEM Base45 | $\log_{256}45 \times 4000$ | 4000** | 2745* |
Alphanumerical | ZEM Base43 | $\log_{256}43 \times 4000$ | 4000** | 2713 |
*Contains characters that break URLs
**Maximum characters scannable by iOS devices
This shows that ZEM Base43 maintains a clear advantage within the practical constraints of this exercise, gaining 47 bytes over standard base45 encoding. With these rules we can fit an absolute maximum of 2713 bytes into a QR code while maintaining error correction and readability on all mobile devices. This might seem like a small improvement, but we must remember that off-the-shelf base45 is actually unsuited for URL composition due to the illegal characters. When comparing base64 (2214 bytes in a QR code) to ZEM Base43 (2713 bytes in a QR code), our base43 implementation provides a clear and worthwhile improvement. If you would like to know more or want to get in touch with us, don't hesitate to mail us at info@zem.com.