Taking SimCity 2000 to pieces

SimCity 2000 (Maxis, 1993) is one of my superfavourite games, ever. I’ve been playing it for 20 years and it’s partially responsible for my terrible grades in high school. I have always liked modifying games, but until now I had never got serious about decoding the data files of this city simulator. And I have found some quite interesting things!

It was ported to a great number of platforms, from the Macintosh (the original) to the Game Boy Advance, but my favourite is the MS-DOS version, and that’s what this article is about. There are two interesting files: the executable (SC2000.EXE) and the data file (SC2000.DAT). Unfortunately, the Windows version didn’t come out in Spanish (my mother tongue), and the Network Edition works awfully badly (and is available only in English).

SC2000.EXE

The game executable does not seem to have many resources, but it does have some interface texts. In the hexadecimal editor we can see some fixed-width labels.

There are some texts representing variable-width labels, too. There are also some embedded files, in which for example game scenarios are described.

At the moment, disassembling an executable from the early 90s is not one of my specialities, so I have not found out much. Pointers are not evident in the executable, so I left it alone. The most interesting part is the data file described below.

SC2000.DAT

This is the main data file. It does not have a header, but a quick look in the hexadecimal editor gives some hints about its structure.

It turns out that, starting at byte 0, the file begins with 16-byte blocks that describe the files contained in the pack. The first field is evident: 12 bytes with the file name (the 8 + 3 characters of an MS-DOS file name, plus the period). If there is leftover space, the rest of the bytes are 00h.

The other four bytes are not so evident, and byte order matters here. The game originally comes from the Macintosh, which at that time used Motorola processors. Those processors are big-endian, while Intel’s are little-endian. Little-endian means that numbers with more than one byte are stored from the least significant byte to the most significant, instead of the “natural” ordering. In this file the field turns out to be a 32-bit unsigned integer in little-endian format, which encodes the offset of the file named just before it.
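To make the layout concrete, here is a minimal sketch of reading one 16-byte index entry. The class and method names are made up for illustration (they are not part of the game or of the program further down), and it assumes the whole DAT file is already in a byte array. The offset is widened to a long because Java has no unsigned 32-bit type.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class DatEntrySketch {
    static final int ENTRY_SIZE = 16; // 12 name bytes + 4 offset bytes
    static final int NAME_SIZE = 12;

    /** File name of the given entry: the bytes up to the first 00h, read as ASCII. */
    static String readName(byte[] dat, int index) {
        int start = index * ENTRY_SIZE;
        int len = 0;
        while (len < NAME_SIZE && dat[start + len] != 0) {
            len++;
        }
        return new String(dat, start, len, StandardCharsets.US_ASCII);
    }

    /** Offset of the given entry, read as little-endian and widened to an unsigned value. */
    static long readOffset(byte[] dat, int index) {
        int raw = ByteBuffer.wrap(dat, index * ENTRY_SIZE + NAME_SIZE, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return Integer.toUnsignedLong(raw);
    }
}
```

Letting ByteBuffer handle the byte order avoids swapping the four bytes by hand.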

I owe the happy idea about the offset to Brett Lajzer, an Albany software engineer who began researching this file before me. I wrote to him to swap information and he tipped me off about this point, which in the end he didn’t describe in his article.

Unpacking and packing, knowing this, is relatively simple. I’ve written a small Java program that makes this operation easy:

package sce2000;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.io.FileUtils;

public class Sce2000 {
	
	public static final String SC2000DAT = "G:\\dos\\sce2000\\sc2000.dat";
	public static final int NUMFILES = 399;
	
	public static class Filestrut implements Serializable {
		private static final long serialVersionUID = 1L;
		public String filename;
		public int offset;
		public int targetOffset;
	}

	public static void main(String[] args) throws IOException, ClassNotFoundException {
		
		if (args.length == 0) {
			info();
		} else if ("x".equals(args[0])) {
			extract();
		} else if ("c".equals(args[0])) {
			create();
		} else {
			info();
		}

	}
	
	public static void info() {
		System.out.println("Usage: x to eXtract or c to Create (after eXtract) + sc2000.dat file");
	}

	public static void create() throws IOException, ClassNotFoundException {
		
		File sc2000dat = new File(SC2000DAT);
		File metafile = new File(SC2000DAT + "!/meta");
		List<Filestrut> files;
		// try-with-resources closes the stream even if reading fails
		try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(metafile))) {
			files = (List<Filestrut>) ois.readObject();
		}
		
		List<Byte> targetFile = new ArrayList<Byte>();

		for (Filestrut file : files) {
			byte[] fileNameBytes = file.filename.getBytes(StandardCharsets.US_ASCII);
			for (Byte myByte : fileNameBytes) {
				targetFile.add(myByte);
			}
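			// Pad the entry to 16 bytes. The last 4 bytes are a placeholder
			// for the offset, which is patched in by the loop below.
			for (int i = 0; i < 16 - fileNameBytes.length; i++) {
				targetFile.add((byte) 0);
			}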
		}
		
		int i = 0;
		for (Filestrut file : files) {
			byte[] binary = Files.readAllBytes(Paths.get(SC2000DAT + "!/" + file.filename));
			int fileOffset = targetFile.size();
			for (Byte myByte : binary) {
				targetFile.add(myByte);
			}
			int filePointer = 12 + (16 * i);
			
			byte[] offsetBytes = fromInt(fileOffset);
			targetFile.set(filePointer, offsetBytes[0]);
			targetFile.set(filePointer + 1, offsetBytes[1]);
			targetFile.set(filePointer + 2, offsetBytes[2]);
			targetFile.set(filePointer + 3, offsetBytes[3]);
			
			i++;
		}
		
		byte[] binaryFile = new byte[targetFile.size()];
		for (i = 0; i < targetFile.size(); i++) {
			binaryFile[i] = targetFile.get(i);
		}
		
		FileUtils.writeByteArrayToFile(sc2000dat, binaryFile);

		
		System.out.println("OK!");
		
	}
		

	public static void extract() throws IOException {
		
		Path sc2000dat = Paths.get(SC2000DAT);
		byte[] data = Files.readAllBytes(sc2000dat);
		
		int vector = 0;
		List<Filestrut> files = new ArrayList<Filestrut>();
		for (int i = 0; i < NUMFILES; i++) {
			
			Filestrut currfile = new Filestrut();
			
			currfile.filename =  convertFilename(Arrays.copyOfRange(data, vector, vector + 12));
			vector += 12;
			
			currfile.offset = fromByteArray(Arrays.copyOfRange(data, vector, vector + 4));
			vector += 4;
			
			System.out.println("Found file " + currfile.filename + " at " + currfile.offset);
			
			files.add(currfile);
			
		}
		
		File dir = new File(SC2000DAT + "!");
		if (dir.exists()) {
			FileUtils.deleteDirectory(dir);
		}
		
		dir.mkdir();
		
		Iterator<Filestrut> fileIt = files.iterator();
		
		Filestrut file = fileIt.next();
		Filestrut fileNext = null;
		
		boolean stop = false;
		while (!stop) {
			
			File extracted = new File(SC2000DAT + "!/" + file.filename);
			
			int init = file.offset;
			int end = -1;
			if (fileIt.hasNext()) {
				fileNext = fileIt.next();
				end = fileNext.offset;
			} else {
				end = data.length;
				stop = true;
			}

			System.out.println("Writing: " + extracted.getAbsolutePath());
			FileUtils.writeByteArrayToFile(extracted, Arrays.copyOfRange(data, init, end));
			
			file = fileNext;
			
		}
		
		File metafile = new File(SC2000DAT + "!/meta");
		
		// try-with-resources closes the stream even if writing fails
		try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(metafile))) {
			oos.writeObject(files);
		}
		
		System.out.println("OK!");
	}
	
	public static byte[] fromInt(int number) {
		byte[] bytes = ByteBuffer.allocate(4).putInt(number).array();
		swapEndianess(bytes);
		return bytes;
	}
	
	public static void swapEndianess(byte [] bytes) {
		byte temp1 = bytes[0];
		byte temp2 = bytes[1];
		bytes[0] = bytes[3];
		bytes[1] = bytes[2];
		bytes[2] = temp2;
		bytes[3] = temp1;
	}
	
	public static int fromByteArray(byte[] bytes) {
		// Change endianness
		swapEndianess(bytes);
		return ByteBuffer.wrap(bytes).getInt();
	}
	
	public static String convertFilename(byte[] data) {
	    StringBuilder sb = new StringBuilder(data.length);
	    for (int i = 0; i < data.length; ++ i) {
	        if (data[i] < 0) { 
	        	throw new IllegalArgumentException();
	        }
	        if (data[i] == 0) {
	        	break;
	        }
	        sb.append((char) data[i]);
	    }
	    return sb.toString();
	}


}

This program was written quick and dirty, and it’s not what we would call optimized. I indulged myself by loading whole files into memory, as the complete DAT file is only a few megabytes in size. It also has the file count and the DAT path hardcoded. Making it better and more efficient is left as an exercise for the reader.
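As a starting point for that exercise, the hardcoded file count could perhaps be derived from the file itself. The sketch below is only a guess under a loud assumption: that the data of the first file starts immediately after the index, which is how the repacker above lays things out but is not verified for every copy of SC2000.DAT. Under that assumption, the first stored offset divided by 16 is the number of entries (399 entries would give a first offset of 6384). The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only.
final class DatCountSketch {
    static final int ENTRY_SIZE = 16;

    /** Guesses the number of index entries from the offset stored in the first entry. */
    static int guessEntryCount(byte[] dat) {
        if (dat.length < ENTRY_SIZE) {
            throw new IllegalArgumentException("too short to hold one index entry");
        }
        // ASSUMPTION: the first file's data begins right where the index ends.
        long firstOffset = Integer.toUnsignedLong(
                ByteBuffer.wrap(dat, 12, 4).order(ByteOrder.LITTLE_ENDIAN).getInt());
        if (firstOffset == 0 || firstOffset % ENTRY_SIZE != 0 || firstOffset > dat.length) {
            throw new IllegalArgumentException(
                    "first offset " + firstOffset + " does not look like the end of an index");
        }
        return (int) (firstOffset / ENTRY_SIZE);
    }
}
```

The sanity checks are there so that a file which breaks the assumption fails loudly instead of producing a wrong count.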

Why in Java? It’s not the most appropriate language for handling binaries, and its lack of unsigned types doesn’t exactly help. I simply did it in Java and not in C because it’s the language I use for a living, and the one I’m most fluent in. Also, I haven’t programmed anything in C for 10 years. 🙂

Using the above code (or any other the intrepid reader can code) we can see the following file types:

  • RAW, the practically header-less image files.
  • PAL, the colour palette used in the game.
  • Text files, some named TXT* without an extension and others with the RAW extension.
  • XMI, the music.
  • VOC, the sound effects.
  • FNT, the fonts.
  • Etc, etc…
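A quick way to get an overview like the list above is to tally the extracted file names by extension. This is only a small sketch with a hypothetical class name. It takes the list of names as input, however they were obtained.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
final class ExtensionTallySketch {
    /** Counts file names per extension; names without a period are grouped as "(none)". */
    static Map<String, Integer> tally(List<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "(none)" : name.substring(dot + 1).toUpperCase(Locale.ROOT);
            counts.merge(ext, 1, Integer::sum);
        }
        return counts;
    }
}
```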

Let’s talk about them a bit one by one.

VOC files

The game sound files are stored in VOC files. Well, that one is easy: it’s not a common file format nowadays, but it is broadly supported by sound editing programs. It’s a Creative Labs format that chiefly stores PCM and ADPCM audio, although it has sometimes been used for other encodings.

These files can be opened and saved perfectly well with Adobe Audition. The sampling frequency varies from one file to another, but it is recognized without much trouble.
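If you want to double-check that an extracted file really is a VOC before opening it, a simple signature test is enough. The detail used here does not come from my own research on the game. It is a general property of the Creative Voice format: files start with the ASCII text “Creative Voice File” followed by the byte 1Ah. The class name is hypothetical and the rest of the header is not parsed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class VocSniffSketch {
    private static final byte[] MAGIC =
            "Creative Voice File".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the data starts with the VOC signature followed by 1Ah. */
    static boolean looksLikeVoc(byte[] data) {
        if (data.length < MAGIC.length + 1) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return data[MAGIC.length] == 0x1A;
    }
}
```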

Text files

There are three kinds of text files. The simplest are the TXT* files without an extension, where * is a number. They can be opened directly with Notepad++ and edited without problems.

Well, not quite. The encoding is not ASCII, nor Unicode, which wasn’t common at the time. Which one, then? Well, since the game was programmed for the Macintosh, the encoding is Macintosh Roman. This is a problem because Notepad++ does not support it. It’s been requested, but it doesn’t seem to be a priority, so you might want to convert the files somehow to edit them more comfortably.

Addendum: It supports it after all! The option was buried in the menus: Encoding > Character Sets > Cyrillic > Macintosh.
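Another option is to do the conversion in code. The sketch below decodes the bytes as Macintosh Roman using the charset that Java knows as x-MacRoman. That charset is an optional extra, so its presence is checked first. The fallback to ISO-8859-1 is only there so the sketch never fails, and it will garble accented characters. The class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class MacRomanSketch {
    /** Decodes game text as Macintosh Roman when the JVM provides that charset. */
    static String decode(byte[] raw) {
        Charset charset = Charset.isSupported("x-MacRoman")
                ? Charset.forName("x-MacRoman")
                : StandardCharsets.ISO_8859_1; // fallback: accented letters come out wrong
        return new String(raw, charset);
    }
}
```

The resulting String can then be written back out as UTF-8 for editing, and re-encoded to Macintosh Roman before repacking.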

The other text files are STR*.RAW and *.RAW, where * is a number. Their format is not so nice.

The first byte is always 00h, and the second seems to store the number of strings in the file. Strings, unlike in C, aren’t terminated with 00h. Instead, each one is preceded by its length, encoded in 1 byte. Ideally we would use a utility program for this, which could easily be coded (but I don’t feel like it).
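For the curious, a reader for this layout could look roughly like the sketch below. It takes the description above at face value (a 00h byte, a one-byte string count, then strings each preceded by a one-byte length), which is only what the files seem to contain, not a confirmed specification. The class name is hypothetical, and the charset is a parameter so that Macintosh Roman can be passed in where the JVM supports it.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class StrRawSketch {
    /** Reads the length-prefixed strings of a STR*.RAW style file. */
    static List<String> parse(byte[] data, Charset charset) {
        if (data.length < 2) {
            throw new IllegalArgumentException("file too short");
        }
        // ASSUMPTION: byte 0 is always 00h and byte 1 is the number of strings.
        int count = data[1] & 0xFF;
        List<String> strings = new ArrayList<>(count);
        int pos = 2;
        for (int i = 0; i < count; i++) {
            if (pos >= data.length) {
                throw new IllegalArgumentException("truncated before string " + i);
            }
            int len = data[pos++] & 0xFF; // one unsigned length byte
            if (pos + len > data.length) {
                throw new IllegalArgumentException("string " + i + " runs past the end");
            }
            strings.add(new String(data, pos, len, charset));
            pos += len;
        }
        return strings;
    }
}
```

Writing the file back would be the mirror image: the 00h byte, the count, and then each string preceded by its encoded length, which limits every string to 255 bytes.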

The third kind of text file corresponds only to the PPDT1003.RAW file, which holds one of the funniest features in all of SimCity 2000: the newspapers.

Unfortunately this format is a nightmare. It begins with a dictionary of terms separated by 00h, followed by a clump of texts used to generate the articles. This custom encoding is variable-width (for example, 5C96h represents the ñ character) and uses the previously defined terms and, of course, placeholders for the article subjects. The articles are procedurally generated, so an article denouncing the disappearance of an animal could star a cat or a rhinoceros, owned by Mrs. Dwight or Mr. Martínez.

Back in the day we had editors like Thingy, which supported TBL files. You could specify a term dictionary there and easily edit files like this one. Anyway, it would only be of use if we kept the string lengths unchanged, and even then we would need to decipher the preceding number (and its encoding) before we could edit it properly.

Addendum: It seems the pointers for the newspaper texts are in the PPDT1004.RAW file. They consist of 4-byte blocks in big-endian order. What a mess.
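Reading such a table is straightforward, since big-endian is the default byte order of Java’s ByteBuffer. The sketch below just splits the file into unsigned 32-bit values. What those values point at, and whether the file holds anything besides pointers, is not something I have confirmed, so treat it as a starting point. The class name is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only.
final class PointerTableSketch {
    /** Splits the data into consecutive big-endian 32-bit values; trailing bytes are ignored. */
    static long[] readPointers(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        long[] pointers = new long[data.length / 4];
        for (int i = 0; i < pointers.length; i++) {
            pointers[i] = Integer.toUnsignedLong(buf.getInt());
        }
        return pointers;
    }
}
```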

Image files

Excluding some text files (described above), RAW files store the images used in the game interface. Buildings and other elements are in some DAT files that are outside the scope of this article, because they can be edited much more easily with the SimCity Urban Renewal Kit (SCURK), a tool included in some later versions that could edit cities “by hand” and modify the look of the buildings.

As for the files in question, these RAW files are 8-bit (256 colours) bitmaps with a 4-byte header. The header doesn’t seem to hold relevant information about resolution or colours; for the colours we have another file, MINE.PAL, which specifies the palette used throughout the game. It’s not a Microsoft Palette file, as the extension suggests, but a raw palette of the ACT type. Interestingly, that’s the preferred palette format of Adobe Photoshop.
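Put together, turning one of these images into true colour is just a table lookup. The sketch below assumes the usual ACT layout of 256 consecutive red, green and blue byte triplets (768 bytes), and simply skips the 4-byte RAW header without interpreting it. The class and method names are hypothetical, and the output is a flat array of 0xRRGGBB values that still needs a width and height to become a picture.

```java
// Hypothetical helper for illustration only.
final class PaletteSketch {
    static final int RAW_HEADER_SIZE = 4;

    /** Reads 256 RGB triplets into 0xRRGGBB values (ASSUMPTION: plain 768-byte ACT layout). */
    static int[] readPalette(byte[] pal) {
        if (pal.length < 768) {
            throw new IllegalArgumentException("palette needs at least 768 bytes");
        }
        int[] rgb = new int[256];
        for (int i = 0; i < 256; i++) {
            int r = pal[i * 3] & 0xFF;
            int g = pal[i * 3 + 1] & 0xFF;
            int b = pal[i * 3 + 2] & 0xFF;
            rgb[i] = (r << 16) | (g << 8) | b;
        }
        return rgb;
    }

    /** Maps every pixel index after the 4-byte header to its palette colour. */
    static int[] toRgb(byte[] raw, int[] palette) {
        if (raw.length < RAW_HEADER_SIZE) {
            throw new IllegalArgumentException("RAW file shorter than its header");
        }
        int[] out = new int[raw.length - RAW_HEADER_SIZE];
        for (int i = 0; i < out.length; i++) {
            out[i] = palette[raw[RAW_HEADER_SIZE + i] & 0xFF];
        }
        return out;
    }
}
```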

As for the resolution, there doesn’t seem to be an easy way to find it out just by looking at the file. The easiest method is to factorize the file size (minus the 4 header bytes) into two factors, something Photoshop can help us with when opening the file, at least in the most recent version. From the file name we can more or less tell what the file is for, and if you have played the game (you really should!), you can more or less recognize its width/height ratio. Some are easy: the title screen (TITLE.RAW) is a 640×480 image. All the images use the same palette, as the game uses the same 256 colours all of the time.
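The factorizing can also be done without Photoshop. This sketch lists every width and height pair whose product equals the file size minus the 4 header bytes. It assumes one byte per pixel and no padding between rows, and it only lists landscape candidates (width at least as large as height), so a portrait image is the same pair swapped. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class RawSizeSketch {
    /** Returns {width, height} pairs with width >= height whose product is fileSize - 4. */
    static List<int[]> candidateSizes(long fileSize) {
        long pixels = fileSize - 4; // skip the 4-byte header
        List<int[]> sizes = new ArrayList<>();
        if (pixels <= 0) {
            return sizes;
        }
        for (long h = 1; h * h <= pixels; h++) {
            if (pixels % h == 0) {
                sizes.add(new int[] { (int) (pixels / h), (int) h });
            }
        }
        return sizes;
    }
}
```

For a file the size of the title screen, 640×480 shows up among the candidates, next to plenty of implausible ones that the aspect ratio quickly rules out.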

I always use the same technique for editing 8-bit indexed colour images: I convert them to 24-bit colour (plus an 8-bit alpha channel) and edit them comfortably. After that, before saving, I change them back to indexed colour, loading the game palette again. Surprisingly enough, when the files are saved in Photoshop’s RAW format the game loads them correctly, because Photoshop lets us preserve the 4-byte header. What a relief!

Please note: needless to say, this RAW format has nothing to do with the raw image formats that today’s digital cameras use.

XMI files

Game music tracks are stored in the XMI files. This format supports many tracks per file, but that doesn’t seem to be the case here. The structure is very different from that of MIDI files, but they do about the same thing: store musical notes and events. They can be played easily in Windows with Foobar2000, and it is possible to convert MIDI to XMI (and vice versa). MIDIPLEX seems to be able to do it from Windows, but no compiled binaries are offered, so I did it with older MS-DOS tools, using DOSBox. Anyway, SimCity 2000 itself needs DOSBox to work on modern systems.

FNT files

They are presumably the game fonts, to judge from the numbers and the extension they come with. It doesn’t seem to be the same format that Windows uses (bitmap FNT), so I’ve been unable to open the files with any editor.

Others

Other files present in the package are General MIDI sources for OPL chipsets, some indexes and headers for the building graphic sets… nothing very interesting when modifying the game. Everything else seems to be embedded in the executable file, which is out of my reach at the moment.

In Windows

For the record, the Windows version can be modified much more easily. The image and sound resources come in BMP and WAV format, which are much less obscure than those described previously. The rest of the resources are embedded in the game executable, but with a resource editor they can be extracted and modified with ease, and in a more accessible format than their MS-DOS counterparts. (I use Resource Hacker, which is free and works quite well.)

The newspapers, however, seem to have the same calamitous format as in the MS-DOS version. It’s a pity, because it would be very interesting to translate this version into Spanish.

So this is where I leave it. I hope it’s been educational, although after all this time there seems to be little interest in modifying this game. I’ve always wanted to meddle with it and create my own mod, Sim City 2000 effect, a bit mischievous, but maybe that will be for another time.

2 thoughts on “Taking Sim City 2000 into pieces”

  1. Hey man, really nice investigation here. Did you ever continue this and perhaps make a nice unpacker-repacker tool with GUI and one click everything? 🙂 Was thinking it would be nice to change the gfx of the road tiles somehow.

    1. Thanks man, though I don’t think it is worth anyone’s time unless someone figures the newspapers out. For editing tiles (roads, buildings, etc) you are better of with the Urban Renewal Kit.
