Extracting iMessages from Backups – Part 1

The iOS Backup Challenge

Recently a friend offered up an iOS Backup challenge. She wanted to be able to preserve her iPhone SMS/iMessage conversations and she also wanted to be able to get a printed version of selected message streams. In addition, she wanted to be able to accomplish this on her Windows laptop. I took the challenge.

Since I knew little to nothing about iPhone message storage, I started to dig in. I came across several good resources on the Internet. The best is Rich Infante’s write-up on iPhone backup reverse engineering. More about the structure of iPhone backups later.

Tools Needed
iTunes – Backing up an iPhone to your desktop or laptop requires iTunes. The format of the backup produced by iTunes has morphed over versions. At the time of this writing, version 12.10.3 is the latest version and the code and procedures fully support that version.
iTunes – https://itunes.apple.com

Notepad++ – Notepad++ is my go-to text editor. The version used is v7.7.1. While I didn’t use any plugins for this project, there are some good HTML integrations that would be beneficial. On the other hand, any good text editor will do, but the demands of this project are beyond the capabilities of Windows Notepad, for example. If you’ve already programmed in Python, you’ll use whatever editor you’re most comfortable with.
Notepad++ can be downloaded from https://notepad-plus-plus.org

Python 3.7.4 – I developed several versions using C, C#, and Perl, but I liked the ease and flexibility of Python for this project. The portability of Python makes it especially attractive for a solution like this; it will run on Windows or Mac OSX. It should run on Linux as well, although iTunes is not available there to create the backup.

I’m sure there are going to be a lot of discussions around the version of Python to use. I chose version 3 because it is the latest GA version and I’m more comfortable with Python 3 than I am with 2.
Python – https://www.python.org

SQLiteStudio v3.2.1 – Apple stores all of the data I needed in one of several SQLite databases. The SQLiteStudio utility is the best tool I found for exploring and testing the various databases and tables. While I installed SQLite on my laptop, I’m 90% sure that you can accomplish everything discussed here without the SQLite binaries installed.
SQLiteStudio – https://www.sqlitestudio.pl

Chrome – My preferred browser is Chrome, because of the excellent development tools provided. I would not have been able to make the progress necessary to complete this without Chrome. I used Version 78.0.3904.87 (Official Build) (64-bit).
Chrome – https://www.chrome.com

Setting Up – Requirements

Install all five of the applications discussed above. Launch and validate each one, then launch iTunes and connect your iPhone once it loads.

Create a backup on your local PC/Mac. The backup must be on the local machine and must be unencrypted (no password). I won’t go into the details of creating the backup; that process is well documented on the Internet if you need help.

There is no option for targeting a specific location to store the backup. On Windows, iTunes writes the backup to your profile and on Mac OSX it’s written to the home directory. The exact locations are as follows:

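Windows: C:\Users\{username}\Apple\MobileSync\Backup\ (iTunes installed from the Microsoft Store) or C:\Users\{username}\AppData\Roaming\Apple Computer\MobileSync\Backup\ (iTunes installed from Apple’s desktop installer)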
Mac OSX: /Users/{username}/Library/Application Support/MobileSync/Backup
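
If you would rather have Python find the backup for you, here is a minimal sketch. It assumes the backup folders sit in the locations described above, relative to the current user's profile or home directory; the extra "Apple Computer" folder under %APPDATA% is an assumption for iTunes installed from the desktop installer, and the function names are my own.

```python
import os
from pathlib import Path

def candidate_backup_roots():
    """Return the MobileSync/Backup folders that actually exist on this machine."""
    home = Path(os.path.expanduser("~"))
    candidates = [
        # Windows, profile-relative location
        home / "Apple" / "MobileSync" / "Backup",
        # Mac OSX
        home / "Library" / "Application Support" / "MobileSync" / "Backup",
    ]
    # Assumption: desktop-installer iTunes on Windows writes under %APPDATA%
    appdata = os.environ.get("APPDATA")
    if appdata:
        candidates.append(Path(appdata) / "Apple Computer" / "MobileSync" / "Backup")
    return [p for p in candidates if p.is_dir()]

def list_backups(backup_root):
    """Return the backup directories (one per backed-up device) under backup_root."""
    return sorted(p for p in Path(backup_root).iterdir() if p.is_dir())

if __name__ == "__main__":
    for root in candidate_backup_roots():
        for backup in list_backups(root):
            print(backup)
```

If nothing prints, the backup is probably somewhere else on your machine, and you can pass that folder to list_backups() directly.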

Backup Structure
Each iPhone backup has a uniquely named root directory. The directory name is a string of 40 hex characters. For example, mine is b5021a7823781e9c137578761572da791f848a84. As the backup’s root, all files and directories reside under it. Except for a few named files, all the files in the backup use the same 40-hex-character naming format.

Backup Root Directory

11/12/2019  02:10 PM    <Dir>          .
11/12/2019  02:10 PM    <Dir>          ..
11/09/2019  03:14 PM    <dir>          00
11/09/2019  03:14 PM    <dir>          01
11/09/2019  03:14 PM    <dir>          02
.
.
.
11/09/2019  03:14 PM    <dir>          fd
11/09/2019  03:14 PM    <dir>          fe
11/09/2019  03:14 PM    <dir>          ff
11/09/2019  03:15 PM         1,431,802 Info.plist
11/09/2019  03:14 PM        52,424,704 Manifest.db
11/09/2019  03:14 PM           109,293 Manifest.plist
11/09/2019  03:14 PM               189 Status.plist
               6 File(s)     53,998,756 bytes
             258 Dir(s) 103,184,080,896 bytes free

Backup Sub-Directories
Under the backup root directory are 256 subdirectories (00 through ff) containing the backup files. Each directory name is two hex characters. The name of each file in a subdirectory starts with the same two hex characters as the hosting directory.

Directory of C:\...\b5021a7823781e9c137578761572da791f848a84\3d

11/12/2019  02:10 PM    <dir>          .
11/12/2019  02:10 PM    <dir>          ..
10/07/2019  01:54 PM            37,970 3d04077678ca37f3ee965ecfde7ccf3ccaa5e396
11/09/2019  03:14 PM               308 3d07a74ca0c9b5a007440a8a8ce86572eae972b5
10/07/2019  01:54 PM           278,388 3d08ec7b9abc63092cbf8a9c18b8b2313edb3914
10/07/2019  01:54 PM            46,683 3d08f88e0bd6415a614348531c7cc0379dde1ce7
10/07/2019  01:54 PM            25,691 3d0c38fc5f0258e83f7e095c9a1b9e8c8097fc65
10/07/2019  01:54 PM            45,852 3d0d2b5c6764dbb6bfcf1f537a0f8f771e0aacd7
11/09/2019  03:11 PM         9,351,168 3d0d7e5fb2ce288813306e4d4636395e047a3d28
10/07/2019  01:52 PM                40 3d0dc904f08dfe3f3b8ece2ea795bec3494295c5
11/09/2019  03:14 PM             1,981 3d10cc9e9c332b10b82f9570a88f0739886b4845
10/07/2019  01:54 PM            38,169 3d11497f0431f5354de91511e05d48f107ce1bf2

For example, the file named 3d0d7e5fb2ce288813306e4d4636395e047a3d28 is in the 3d/ subdirectory (the first two hex characters match the two-character directory name). This file is actually the SQLite database (SMS.db) containing most of the tables and data needed for this effort. The filename is a hash of the file’s original path on the iPhone, not of its contents; file hashing is covered in more detail later.
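
The directory rule is easy to express in code. This is a small sketch of my own (the function name is hypothetical) that turns a 40-character hashed name into its location under a backup root, following the two-character subdirectory convention described above.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")

def backup_file_path(backup_root, file_hash):
    """Return <backup_root>/<first two hex chars>/<full 40-char hash>."""
    file_hash = file_hash.lower()
    if len(file_hash) != 40 or not set(file_hash) <= HEX_DIGITS:
        raise ValueError("expected a name of 40 hex characters")
    return Path(backup_root) / file_hash[:2] / file_hash

if __name__ == "__main__":
    # "backup_root" is a placeholder; substitute your own backup directory
    print(backup_file_path("backup_root", "3d0d7e5fb2ce288813306e4d4636395e047a3d28"))
```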

Named Files
Under the backup root, there are four non-hashed, named files (Info.plist, Manifest.db, Manifest.plist, and Status.plist). None of these files are needed for this project, but you may want to use some of the tools of this project to explore them. They contain information on the files in the backup and information on the backups themselves.

SQLite Database Files
For many, the main challenge in understanding the iPhone backup is that the hashed file names provide no clue about what each file contains. The work of Rich Infante in reverse-engineering the backup got me moving in the right direction and saved countless hours of peeking under the hood of the many files. Another approach I used was to explore the Manifest.db database with SQLiteStudio. In Part 2, I’ll explore SQLite databases and walk through using SQLiteStudio.

After attaching the database, you can query the Files table and see both the hashed file name and the original filename with the path from the backed-up device (the iPhone). I’d suggest spending a little time exploring this table in SQLiteStudio.
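
The same lookup can be done from Python with the built-in sqlite3 module. Treat this as a sketch under assumptions: the column names fileID, domain, and relativePath are what I would expect to find in the Files table, so confirm them in SQLiteStudio first. The function name is my own.

```python
import sqlite3

def find_backup_files(manifest_db, path_fragment):
    """Return (fileID, domain, relativePath) rows whose original path
    contains path_fragment.

    Assumes the Files table has columns named fileID, domain and
    relativePath; adjust the query if your Manifest.db differs.
    """
    conn = sqlite3.connect(manifest_db)
    try:
        cur = conn.execute(
            "SELECT fileID, domain, relativePath FROM Files "
            "WHERE relativePath LIKE ? ORDER BY relativePath",
            ("%" + path_fragment + "%",),
        )
        return cur.fetchall()
    finally:
        conn.close()

# Example call (the path is a placeholder for your own backup root):
#   for row in find_backup_files(r"<backup root>\Manifest.db", "sms.db"):
#       print(row)
```

The first column of each row is the hashed file name, so the result tells you which file in the backup to open.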

As pointed out earlier, the file named 3d0d7e5fb2ce288813306e4d4636395e047a3d28 is the SMS.db SQLite database. The different tables used to extract messages and attachments are highlighted in the Database view from SQLiteStudio.

It may seem strange to see a database stored under this long hex filename. If you made a copy of the database file and renamed it to sms.db, it would seem more natural to open that file as a database. The bottom line is to ignore the hex filenames and focus on the function of the file.
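
To make the point concrete, SQLite does not care what the file is called. The sketch below (function name is my own) opens a hashed file directly, in read-only mode so the backup is not modified, and lists the tables it contains.

```python
import sqlite3
from pathlib import Path

def list_tables(db_path):
    """Open any SQLite file read-only, whatever its name, and list its tables."""
    # Build a file: URI so SQLite can be told to open the file read-only
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()

# Example call (the path is a placeholder for your own backup root):
#   print(list_tables(r"<backup root>\3d\3d0d7e5fb2ce288813306e4d4636395e047a3d28"))
```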

The only other database I needed to access was the contacts or address book database. The hex directory and filename is /31/31bb7ba8914766d4ba40d6dfb6113c8b614be442. I’ll be covering the extraction of needed data from these two databases later.

Text and BLOB Files
Think of all the different types of data you have in the SMS and iMessage conversations that go on between you and the people you communicate with. Obviously, there’s the text of the messages themselves, plus those cute little emoticons, GIFs, and icons you add to messages. You may have audio files or images that you send via SMS and maybe even a vCard of contact information.

All of that data gets stored in the backup and organized so that it can be restored into the message stream later on. Other than the actual message text, everything is converted to a file and stored in the backup. So that picture of a sunset you sent to your sister is stored as a file named 0c1baecb5088a0ebe341dc80b3a2b79e8b733ba9. And the audio file you sent to your mother saying ‘Happy birthday, Mom’ is named 6d07cd400f8ecba1c69fd526e001180530a1594e.

Everything is there; it just doesn’t stand out because of the 40-character hex file names. Later on, you’ll get to see how to identify and access the data you want and then how to automate the process.
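
In the meantime, one quick way to get a first clue about a hashed file is to peek at its first few bytes. This is only a rough sketch of my own, not the method used later in this series, and the signature list is a small illustrative selection of well-known file headers.

```python
# Well-known leading byte signatures for a few common file types
SIGNATURES = [
    (b"SQLite format 3\x00", "sqlite database"),
    (b"bplist00", "binary plist"),
    (b"<?xml", "xml (possibly a plist)"),
    (b"\xff\xd8\xff", "jpeg image"),
    (b"\x89PNG\r\n\x1a\n", "png image"),
    (b"GIF8", "gif image"),
]

def sniff_file_type(path):
    """Guess a file's type from its leading bytes; return 'unknown' if no match."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for signature, label in SIGNATURES:
        if head.startswith(signature):
            return label
    return "unknown"
```

Calling sniff_file_type() on the SMS.db file mentioned earlier should report a SQLite database, assuming the backup is unencrypted.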

PList and BPList Data
The last file types I want to cover are PList and BPList files. If you’re at all familiar with iOS or Swift programming, you’ve seen your share of these files. A PList is analogous to a Python dictionary object in a file format. BPLists are binary-PList files.

PLists are found in the various tables of the databases. In some cases, hashed files are actually PLists underneath. I haven’t come across any instances where I needed to access a PList, but I did some exploring of the data in PList format just to become familiar with the contents. Python’s plistlib module can load, parse, and print these files.
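
Here is a short sketch of that kind of exploration using plistlib from the standard library. plistlib.load() detects XML and binary PLists on its own, so the same code should work for either; the helper names are my own.

```python
import plistlib

def load_plist(path):
    """Load an XML or binary PList file into ordinary Python objects."""
    with open(path, "rb") as fh:
        return plistlib.load(fh)

def top_level_keys(plist):
    """Return the sorted top-level keys if the PList is a dictionary."""
    return sorted(plist) if isinstance(plist, dict) else []

# Example call (the path is a placeholder for your own backup root):
#   status = load_plist(r"<backup root>\Status.plist")
#   print(top_level_keys(status))
```

For a PList stored in a database column rather than in a file, plistlib.loads() accepts the raw bytes directly.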

Hash Names
We talked about how all the filenames are 40 hex characters. I can only guess why Apple chose the hex formatting, but I came up with several reasons. First, the hashing/hex naming of longer file path/filename combinations efficiently shrinks and standardizes file names. The hash eliminates the need to completely replicate the directory structure used in the iPhone. Next, it sure makes indexing a lot easier. It also has benefits on how to efficiently organize files and disperse them in a relatively balanced distribution. Last, maybe Apple doesn’t want users to be mucking around with files they may recognize.

The file names are the result of applying the SHA-1 hashing algorithm to the path and filename. The path I’m speaking of here is the path on the iPhone, not the path in the backup. For example, let’s take a sample file path and hash it.
Original file info: Media/PhotoData/Thumbnails/V2/PhotoStreamsData/1309676283/101APPLE/IMG_1983.JPG/5003.JPG

Here is a bare-bones function to hash the file name:

import hashlib

def CalcFileHash(target):     # file path passed as an argument
    sha = hashlib.sha1()      # create a SHA-1 hash object
    mde = target.encode()     # encode the string as bytes (UTF-8)
    sha.update(mde)           # feed the bytes to the hash object
    hsh = sha.hexdigest()     # get the hash as a 40-character hex string
    return hsh                # have the function return the hash

We can pass the sample file path to this function and print the result with:

filename = 'Media/PhotoData/Thumbnails/V2/PhotoStreamsData/1309676283/101APPLE/IMG_1983.JPG/5003.JPG'
print(CalcFileHash(filename))

The returned result is 692f2c4510c1b2f9b517e23c7ed8c355a50b54d9; you should be able to replicate this and get the same hash. The same input always produces the same hash, and for practical purposes no two different paths produce the same one. To see how a minor change alters the returned hash, just change the file extension from .JPG to .JPEG. Now the function will return a0aeee0181a41d0a12bee926153b26a9f70fb292.

We’ll use this function to input an attachment file path and return the unique hashed filename later on.
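
One caveat if you try to reproduce the hashed names of files in your own backup: the string being hashed may need a domain prefix. The sketch below assumes the widely reported convention that the hashed string is the domain, a hyphen, and the relative path (the two values shown in the Files table of Manifest.db). The function name and that convention are assumptions on my part, so check the result against your own backup.

```python
import hashlib

def backup_file_id(domain, relative_path):
    """Hash '<domain>-<relative path>' with SHA-1.

    Assumption: backup file names are derived from the domain and the
    relative path joined by a hyphen.
    """
    full_name = domain + "-" + relative_path
    return hashlib.sha1(full_name.encode("utf-8")).hexdigest()

if __name__ == "__main__":
    # Compare this with the SMS.db file name quoted earlier in this post
    print(backup_file_id("HomeDomain", "Library/SMS/sms.db"))
```

If the assumption holds for your backup, the printed value should match the 3d0d7e5f... file name of the SMS.db database.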

That should be enough information to get us going. In Part 2, I’ll cover SQLiteStudio and do a little exploring using the SQL Query tool.

About Tim Porter

Tim retired after over 30 years in various technology roles. He's worked in application development, infrastructure, database management and network engineering. In his spare time, Tim also dabbles in electronics and microcontroller programming.