Hello, PNG!

By David Buchanan, 16th January 2023

PNG is my favourite file format of all time. Version 1.0 of the specification was released in 1996 (before I was born!) and the format remains widely used to this day. I think the main reasons it stuck around for so long are:

  • It's "Good enough" at lossless image compression.
  • It builds on existing technologies (zlib/DEFLATE compression).
  • It's simple to implement (helped by the above point).
  • It supports a variety of modes and bit-depths, including "true color" (24-bit RGB) and transparency.
  • It isn't patented.

There are other similarly-old and similarly-ubiquitous formats (cough ZIP cough) that are disgusting to deal with due to legacy cruft, ad-hoc extensions, spec ambiguities, and mutually incompatible implementations. On the whole, PNG is not like that at all, and it's mostly due to its well-thought-out design and careful updates over the years.

I'm writing this article to fulfil my role as a PNG evangelist, spreading the joy of good-enough lossless image compression to every corner of the internet. Similar articles already exist, but this one is mine.

I'll be referencing the Working Draft of the PNG Specification (Third Edition) released in October 2022 (!), but every feature I mention here should still be present in the 1.0 spec. I'll aim to update this article once the Third Edition releases officially.

Writing a PNG File

I think the best way to get to grips with a file format is to write code for reading or writing it. In this instance we're going to write a PNG, because we can choose to focus on the simplest subset of PNG features.

A minimum-viable PNG file has the following structure:

PNG signature || "IHDR" chunk || "IDAT" chunk || "IEND" chunk

The PNG signature (aka "magic bytes") is defined as:

"89 50 4E 47 0D 0A 1A 0A" (hexadecimal bytes)

Or, expressed as a Python bytes literal:

b'\x89PNG\r\n\x1a\n'

These magic bytes must be present at the start of every PNG file, allowing programs to easily detect the presence of a PNG.
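
As a small aside, here's a minimal sketch of how a program might perform that check in Python (the function name looks_like_png is just something I've made up for illustration):

def looks_like_png(path):
	# A file "looks like" a PNG if its first 8 bytes match the signature
	with open(path, "rb") as f:
		return f.read(8) == b'\x89PNG\r\n\x1a\n'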

PNG Chunks

After the signature, the rest of the PNG is just a sequence of Chunks. They each have the same overall structure:

Length      - A 31-bit unsigned integer (the number of bytes in the Chunk Data field)
Chunk Type  - 4 bytes of ASCII upper or lower-case characters
Chunk Data  - "Length" bytes of raw data
CRC         - A CRC-32 checksum of the Chunk Type + Chunk Data

PNG uses Network Byte Order (aka "big-endian") to encode integers as bytes. "31-bit" is not a typo - PNG defines a "PNG four byte integer", which is limited to the range 0 to 2³¹-1, to defend against the existence of C programmers.

If you're not familiar with these concepts, don't worry - Python will handle all the encoding for us.
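
As a quick illustration, here's what big-endian encoding looks like using Python's built-in int.to_bytes (the same call the scripts below use):

# the length 13, encoded as a big-endian "PNG four byte integer"
print((13).to_bytes(4, "big"))  # b'\x00\x00\x00\r'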

The Chunk Type, in our instance, will be one of IHDR, IDAT, or IEND (more on these later).

The CRC field is a CRC-32 checksum. The spec gives a terse mathematical definition, but we can ignore all those details and use a library to handle it for us.

The meaning of data within a chunk depends on the chunk's type, and potentially, context from prior chunks.

Putting all that together, here's a Python script that generates a vaguely PNG-shaped file:

import zlib

# https://www.w3.org/TR/2022/WD-png-3-20221025/#5PNG-file-signature
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

# https://www.w3.org/TR/2022/WD-png-3-20221025/#5Chunk-layout
def write_png_chunk(stream, chunk_type, chunk_data):
	# https://www.w3.org/TR/2022/WD-png-3-20221025/#dfn-png-four-byte-unsigned-integer
	chunk_length = len(chunk_data)
	if chunk_length > 2**31 - 1:  # This is unlikely to ever happen!
		raise ValueError("This chunk has too much chonk!")
	
	# https://www.w3.org/TR/2022/WD-png-3-20221025/#5CRC-algorithm
	# Fortunately, zlib's CRC32 implementation is compatible with PNG's spec:
	crc = zlib.crc32(chunk_type + chunk_data)

	stream.write(chunk_length.to_bytes(4, "big"))
	stream.write(chunk_type)
	stream.write(chunk_data)
	stream.write(crc.to_bytes(4, "big"))


if __name__ == "__main__":
	"""
	This is not going to result in a valid PNG file, but it's a start
	"""

	ihdr = b"\0" * 13  # TODO: populate real values!
	idat = b""  # ditto

	with open("samples/out_0.png", "wb") as f: # open file for writing
		f.write(PNG_SIGNATURE)
		write_png_chunk(f, b"IHDR", ihdr)
		write_png_chunk(f, b"IDAT", idat)
		write_png_chunk(f, b"IEND", b"")

The write_png_chunk() function is complete and fully functional. However, we don't have any real data to put in the chunks yet, so the script's output is not a valid PNG.

Running the unix file tool against it gives the following output:

$ file samples/out_0.png 
samples/out_0.png: PNG image data, 0 x 0, 0-bit grayscale, non-interlaced

It correctly recognises a PNG file (due to the magic bytes), and the rest of the summary corresponds to the 13 zeroes I packed into the IHDR chunk as a placeholder. Since we haven't populated the chunks with any meaningful data yet, image viewers will refuse to load it and give an error (there is nothing to load!).

Image Input

Before we continue, we're going to need some actual image data to put inside our PNG. Here's an example image I came up with:

Funnily enough, it's already a PNG file, but we don't have a way to read PNGs yet - how can we get the pixel data into our script? One simple method is to convert it into a raw bitmap, which is something ImageMagick can help us with. I used the following command:

$ convert ./samples/hello_png_original.png ./samples/hello_png.rgb

hello_png.rgb now contains the raw uncompressed RGB pixel data, which we can trivially read as-is from Python. For every pixel in every row, it stores a 3-byte value corresponding to the colour of that pixel. Each byte is in the range 0-255, corresponding to the brightness of each RGB sub-pixel respectively. To be pedantic, these values represent coordinates in the sRGB colourspace, but that detail is not strictly necessary to understand.

This .rgb file isn't a "real" image file format, and we need to remember certain properties to be able to make sense of it. We need to know the width and height (in this case 320x180), the pixel format (24-bit RGB, as described above), and the colourspace (sRGB). The PNG file that we generate will contain all this metadata in its headers, but since the input file doesn't contain them, we will hardcode the values in our Python script.
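
As a quick sketch, here's how we could pull a single pixel's values out of that file, with the dimensions hardcoded (the coordinates are arbitrary, and the full script below wraps this arithmetic up in a helper function):

WIDTH, HEIGHT = 320, 180  # hardcoded, since the .rgb "format" has no header

with open("./samples/hello_png.rgb", "rb") as f:
	rgb_data = f.read()

# pixels are stored row by row, 3 bytes each
x, y = 10, 20  # an arbitrary example coordinate
offset = 3 * (WIDTH * y + x)
r, g, b = rgb_data[offset:offset + 3]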

The IHDR (Image Header) Chunk

The IHDR Chunk contains the most important metadata in a PNG - and in our simplified case, all the metadata of the PNG. It encodes the width and height of the image, the pixel format, and a couple of other details:

Name                Size

Width               4 bytes
Height              4 bytes
Bit depth           1 byte
Colour type         1 byte
Compression method  1 byte
Filter method       1 byte
Interlace method    1 byte

There isn't much to say about it, but here's the relevant section of the spec.
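
As a sketch of how those fields pack into the 13-byte chunk body, here's the same thing expressed with struct (the full script later builds identical bytes with a small helper of its own):

import struct

# 4 bytes each for width and height, then five single-byte fields
ihdr = struct.pack(">IIBBBBB", 320, 180, 8, 2, 0, 0, 0)
assert len(ihdr) == 13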

I mentioned earlier that our RGB values are in the sRGB colourspace. PNG has ways to signal this information explicitly (through "Ancillary Chunks"), but in practice, sRGB is assumed to be the default, so for our minimum-viable PNG implementation we can just leave it out. Colour spaces are a complex topic, and if you want to learn more I recommend watching this talk as an introduction: Guy Davidson - Everything you know about colour is wrong

The IDAT (Image Data) Chunk

The IDAT chunk contains the image data itself, after it's been Filtered and then Compressed (to be explained shortly).

The data may be split over multiple consecutive IDAT chunks, but for our purposes, it can just go in one big chunk.
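
If we did want to split it, a sketch like this would work (reusing the write_png_chunk helper from earlier, with an arbitrary split size), since decoders simply concatenate the data from consecutive IDAT chunks:

CHUNK_SIZE = 8192  # arbitrary, purely illustrative
for offset in range(0, len(idat), CHUNK_SIZE):
	write_png_chunk(f, b"IDAT", idat[offset:offset + CHUNK_SIZE])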

The IEND (Image Trailer) Chunk

This chunk has length 0, and marks the end of the PNG file. Note that a zero-length chunk must still have all the same fields as any other chunk, including the CRC.
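
Since the data is empty, an IEND chunk always works out to the same 12 bytes:

import zlib

iend = (0).to_bytes(4, "big") + b"IEND" + zlib.crc32(b"IEND").to_bytes(4, "big")
print(iend.hex())  # 0000000049454e44ae426082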

Filtering

The idea of filtering is to make the image data more readily compressible.

You may recall that the IHDR chunk has a "Filter method" field. The only specified filter method is method 0, called "adaptive filtering" (the others are reserved for future revisions of the PNG format).

In Adaptive Filtering, each row of pixels is prefixed by a single byte that describes the Filter Type used for that particular row. There are 5 possible Filter Types, but for now, we're only going to care about type 0, which means "None".

If we had a tiny 3x2 pixel image comprised of all-white pixels, the filtered image data would look something like this: (byte values expressed in decimal)

0   255 255 255  255 255 255  255 255 255
0   255 255 255  255 255 255  255 255 255

I've added whitespace and a newline to make it more legible. The two zeroes at the start of each row encode the filter type, and the "255 255 255"s each encode a white RGB pixel (with each sub-pixel at maximum brightness).

This is the simplest possible way of "filtering" PNG image data. Of course, it doesn't do anything especially useful since we're only using the "None" filter, but it's still a requirement to have a valid PNG file. I've implemented it in Python like so:

# This is all the code required to read subpixel values from an ".rgb" file.
# subpixel 0=R, 1=G, 2=B
def read_rgb_subpixel(rgb_data, width, x, y, subpixel):
	return rgb_data[3 * ((width * y) + x) + subpixel]

# Note: This function assumes RGB pixel format!
# Note: This function could be written more concisely by simply concatenating
# slices of rgb_data, but I want to use approachable syntax and keep things
# abstracted neatly.
def apply_png_filters(rgb_data, width, height):
	# we'll work with an array of ints, and convert to bytes at the end
	filtered = []
	for y in range(height):
		filtered.append(0) # Always filter type 0 (none!)
		for x in range(width):
			filtered += [
				read_rgb_subpixel(rgb_data, width, x, y, 0), # R
				read_rgb_subpixel(rgb_data, width, x, y, 1), # G
				read_rgb_subpixel(rgb_data, width, x, y, 2)  # B
			]
	return bytes(filtered)

Compression

Once the image data has been filtered, it needs to be compressed. You may recall that the IHDR chunk has a "Compression method" field. The only compression method specified is method 0 - a similar situation to the Filter Method field. Method 0 corresponds to DEFLATE-compressed data stored in the "zlib" format. The zlib format adds a small header and a checksum (adler32), but the details of this are outside the scope of this article - we're just going to use the zlib library (part of the Python standard library) to handle it for us.

If you do want to understand the intricacies of zlib and DEFLATE, check out this article.

Implementing this in Python is dead simple:

idat = zlib.compress(filtered, level=9) # level 9 is maximum compression!

As noted, level 9 is the maximum compression level offered by the zlib library (and also the slowest). Other tools such as zopfli can offer even better compression ratios, while still conforming to the zlib format.

Putting it all Together

Here's what our minimum-viable PNG writer looks like in full:

import zlib

# https://www.w3.org/TR/2022/WD-png-3-20221025/#5PNG-file-signature
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

# https://www.w3.org/TR/2022/WD-png-3-20221025/#dfn-png-four-byte-unsigned-integer
# Helper function to pack an int into a "PNG 4-byte unsigned integer"
def encode_png_uint31(value):
	if value > 2**31 - 1:  # This is unlikely to ever happen!
		raise ValueError("Too big!")
	return value.to_bytes(4, "big")

# https://www.w3.org/TR/2022/WD-png-3-20221025/#5Chunk-layout
def write_png_chunk(stream, chunk_type, chunk_data):
	# https://www.w3.org/TR/2022/WD-png-3-20221025/#5CRC-algorithm
	# Fortunately, zlib's CRC32 implementation is compatible with PNG's spec:
	crc = zlib.crc32(chunk_type + chunk_data)

	stream.write(encode_png_uint31(len(chunk_data)))
	stream.write(chunk_type)
	stream.write(chunk_data)
	stream.write(crc.to_bytes(4, "big"))

def encode_png_ihdr(
		width,
		height,
		bit_depth=8,           # bits per sample
		colour_type=2,         # 2 = "Truecolour" (RGB)
		compression_method=0,  # 0 = zlib/DEFLATE (only specified value)
		filter_method=0,       # 0 = "adaptive filtering" (only specified value)
		interlace_method=0):   # 0 = no interlacing (1 = Adam7 interlacing)

	ihdr = b""
	ihdr += encode_png_uint31(width)
	ihdr += encode_png_uint31(height)
	ihdr += bytes([
		bit_depth,
		colour_type,
		compression_method,
		filter_method,
		interlace_method
	])

	return ihdr

# This is all the code required to read subpixel values from an ".rgb" file.
# subpixel 0=R, 1=G, 2=B
def read_rgb_subpixel(rgb_data, width, x, y, subpixel):
	return rgb_data[3 * ((width * y) + x) + subpixel]

# Note: This function assumes RGB pixel format!
# Note: This function could be written more concisely by simply concatenating
# slices of rgb_data, but I want to use approachable syntax and keep things
# abstracted neatly.
def apply_png_filters(rgb_data, width, height):
	# we'll work with an array of ints, and convert to bytes at the end
	filtered = []
	for y in range(height):
		filtered.append(0) # Always filter type 0 (none!)
		for x in range(width):
			filtered += [
				read_rgb_subpixel(rgb_data, width, x, y, 0), # R
				read_rgb_subpixel(rgb_data, width, x, y, 1), # G
				read_rgb_subpixel(rgb_data, width, x, y, 2)  # B
			]
	return bytes(filtered)


if __name__ == "__main__":
	# These values are hardcoded because the .rgb "format" has no metadata
	INPUT_WIDTH = 320
	INPUT_HEIGHT = 180
	# read entire file as bytes
	input_rgb_data = open("./samples/hello_png.rgb", "rb").read()

	ihdr = encode_png_ihdr(INPUT_WIDTH, INPUT_HEIGHT)

	filtered = apply_png_filters(input_rgb_data, INPUT_WIDTH, INPUT_HEIGHT)

	# Apply zlib compression
	idat = zlib.compress(filtered, level=9) # level 9 is maximum compression!

	with open("samples/out_1.png", "wb") as f: # open file for writing
		f.write(PNG_SIGNATURE)
		write_png_chunk(f, b"IHDR", ihdr)
		write_png_chunk(f, b"IDAT", idat)
		write_png_chunk(f, b"IEND", b"")

That's only 87 lines of liberally commented and spaced-out Python code. If we run it, we get this output:

It's... exactly the same as the one I showed earlier, which means it worked! We made a PNG from scratch! (Well, not quite from scratch - we used zlib as a dependency).

Verifying it using the pngcheck utility results in the following:

$ pngcheck ./samples/out_1.png 
OK: ./samples/out_1.png (320x180, 24-bit RGB, non-interlaced, 15.6%).

Looks good! Now let's have a look at some file sizes:

hello_png_original.png       128286 bytes
hello_png.rgb                172800 bytes
out_1.png                    145787 bytes

We started off with a 128286-byte PNG file, exported from GIMP using the default settings.

We converted it to a raw RGB bitmap using ImageMagick, resulting in 172800 bytes of data. Taking this as the "original" image size, that means GIMP's PNG encoder was able to compress it to 74% of its original size.

Our own PNG encoder only managed to compress it down to 145787 bytes, which is 84% of the original size. How did we end up 10% worse?

It's because we cheaped out on our Filtering implementation. GIMP's encoder chooses a filter type for each row adaptively, probably based on heuristics (I haven't bothered looking at the specifics). If we implemented the other filter types, and used heuristics to pick between them, we'd probably get results as good as GIMP's, or better. This is an exercise left to the reader - or maybe a future blog post from me!

As a quick example, Adaptive Filter type 2 subtracts the byte values of the pixel above from those of the "current" pixel. If one row was identical (or similar) to the row above it, the filtered version of that row would compress very efficiently (because it would be all or mostly zeroes).
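
Here's a rough sketch of that filter (byte-wise subtraction against the row above, modulo 256; the function name filter_up is just for illustration):

def filter_up(current_row, previous_row):
	# filter type 2: each byte minus the corresponding byte of the previous row
	return bytes((cur - up) % 256 for cur, up in zip(current_row, previous_row))

row = bytes([200, 150, 100] * 4)
print(list(filter_up(row, row)))  # all zeroes, which DEFLATE compresses very well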

Full source code and example files are available on my Git repo: https://github.com/DavidBuchanan314/hello_png

Things I Didn't Mention

Things I didn't mention in this article, which you may still want to know, include:

  • Support for other bit-depths.
  • Indexed colour (i.e. using a palette)
  • Further metadata, and other chunk types.
  • Interlacing.
  • The other filter types.
  • APNG.

...and probably a few other things I forgot. I might update this list when I remember them.

PNG Debugging Tips

If you're trying to generate or parse your own PNGs and running into opaque errors, here are a couple of tips.

Try using ImageMagick to convert the PNG into another format (the destination format doesn't matter). This is useful because it gives specific errors about what went wrong. For example, if we try to convert the initial out_0.png image we generated (which had the basic file structure but no data), we get this:

$ convert samples/out_0.png /tmp/bla.png
convert: insufficient image data in file `samples/out_0.png' @ error/png.c/ReadPNGImage/4270.

This error makes sense, because IDAT was empty. You could probably track down the specific line of png.c if you wanted even more details.

My next tip is to try using an advanced hex-editor like ImHex to inspect the file. ImHex supports a "pattern" for PNG, which effectively gives you byte-level syntax highlighting, as well as letting you view the parsed structures of the file.

Related Materials and PNG Tricks

I recently made a PNG/MD5 hashquine which various people wrote about and discussed, including myself (I do plan on writing a proper blog post on it, eventually).

I also found a bug in Apple's PNG decoder, due to a poorly thought-out proprietary extension they made to the format. They've since fixed that instance of the bug, although it's still possible to trigger it using a slightly different approach. There were also related discussions and articles.

I made a proposal for a backwards-compatible extension to the PNG file format that enables the PNG decoding process to be parallelised. Others have made similar proposals, and it is likely that some variation will make it into a future version of the official PNG specification.

I found an edge-case in Twitter's image upload pipeline that allows PNG/ZIP polyglot files to be hosted on their CDN. Related article. I abused the same trick to upload web-streamable 4K 60fps video (a feature Twitter is yet to officially support!).

PNG also supports "Adam7" interlacing, which I abused to create a crude form of animated PNG (without using APNG, heh). Related discussion.

Maybe now you believe me when I say it's my favourite file format?