Extracting iOS iMessages – Part 3

Loading Messages

This is Part 3 of the iOS Backup and SMS Messages series for building a tool to export iMessage/SMS data from an iTunes backup of your iPhone. In the Part 1 we exposed the structure and data organization of the iPhone backup. We also identified the needed tools for extracting and displaying iMessage/SMS data. In Part 2, we connected to the sms.db in SQLite Studio and built a query to view message data.

In this installment we’ll build the Python application framework and get started on automating the extraction and working storage of message data. I’m assuming that you have the required Python 3.x binaries installed and have a basic skill level using a text editor.

Before moving on, these are a few key concepts needed for this part of the series:

Chat – A chat is a conversation thread of one or more messages.

  • For example, if Joe and Frank exchange a SMS chat with one message, that is a chat.
  • If they exchange another message three weeks later, that is part of the same chat.
  • If 10 of your friends are involved in a SMS conversation over a week with dozens of messages, that is also a chat.

The sms.db database keeps track of participants in the chat, messages and other data. Each chat has a unique chatID.

Message – A message is a single sms or iMessage.

  • There is a 1:many relationship between chats and messages.
  • A message is associated with ONLY a single chat.

Validate Python Environment
When I work with Python I use a dedicated project directory and I keep a command shell open for testing as I move along. First things first – verify that Python is installed and functioning properly. Create your working directory – I’m using D:\work\ios. Then open a command prompt and make the work directory current (> cd D:\work\ios).

At the command prompt enter python -V and you should get the Python version of your installed binaries.

D:\Work\ios> python -V
Python 3.7.4

D:\Work\ios> 

If you don’t have Python 3 available, go back to Part 1 and install all the prerequisites.

Let’s Start Programming

With that, we can open the text editor and get started. As I mentioned in the Part 1, I use Notepad++ as my preferred editor, but you can use whatever editor you’re comfortable with.

Create and save a new file as iphoneb.py. Thisd will be a relative basic application with only a few functions, so everything will reside in this single file. We’re going to need a few libraries so at the top of the file add the following import statements:

import os
import os.path
import sqlite3
import hashlib
import datetime
import plistlib

Note – The last immport (plistlib) is optional. This library is needed if you want to explore any of the PList data in the tables.

I use a few global variables which gets some of my programming friends stirred up and ranting about using classes and planning an intervention. For this effort, it just makes it much simpler.

_curBackupRoot – the location of theiPhone backup on your pc

_outRoot – – the location we’re going to save the HTML output to

D_MSG, D_CTX, LS_ATCH, D_CMJ, and LS_CALLER – These five variables are Python dictionaries where we’ll be storing messages, contacts, attachment data, chat/message lookup data and chat participants.

_curBackupRoot = ‘C:/Users/[your profile]/Apple/MobileSync/Backup/[your backp root]/’
#Windows location – edit for your profile
#curBackupRoot = ‘~/Library/Application Support/MobileSync/Backup/[your backup dir]/’
# Mac OSX location – edit for your home

_outRoot = ‘d:/work’ # edit for your desired output location

D_MSG = {}
D_CTX = {}
LS_ATCH = {}
D_CMJ = {}
LS_CALLER = {}

def LoadMessages():
   print(‘starting LoadMessages()…’)

The last two lines are the first function. In LoadMessages() we’ll query the SMS database and store the individual message data in the D_MSG dictionary. Below that, at the bottom of the file is the entry point where we just call LoadMesssages() for now.

#=============================================================================
# Main program entry point
#-----------------------------------------------------------------------------
print('Starting...')
LoadMessages()

 

The first task is to query the database. So switch back over to SQLite Studio and we’ll build and test the query we need. Then we need to add that to the application as a string variable we can use to allow Python to generate the same query.

We’ve already tested the database connection and explored the database a bit. Open a new query window and paste the following:

SELECT
   m.rowid as RowID,
   h.id AS UniqueID,
   is_from_me AS Type,
   date AS Date,
   date AS Epoch,
   text AS Text,
   cache_roomnames AS Room,
   maj.attachment_id AS AttachmentID,
   expire_state as expState,
   is_audio_message AS isAudio,
   is_corrupt

FROM message m
   LEFT JOIN handle h ON h.rowid = m.handle_id
   LEFT JOIN message_attachment_join maj ON maj.message_id = m.rowid
ORDER BY RowID

 

Thsi query will give us everything we need from the message table, but we need to do some cleanup before we can use the data. Using the snipet from my data, let’s go through the returned data and how we’ll use it.

  • RowID is the unique MessageID.
  • UniqueID is the ID of the message sender.
  • Type (is_from_me) is 0 or 1 – If I sent the message: 1 and if I received the message: 0
  • Date – the datetime the message was sent
  • Epoch – the date the message was sent. This will help us group messages by date in the UI.
  • Text – the message text.
  • Room – an identifier of how multi-participant chats are grouped.
  • AttachmentID – a unique ID for an attachment to a message
  • expState – for audio message attachments – has it expires 0 or 1
  • isAudio – 0 or 1. An audio message is 1, text only is 0
  • is_corrupt – a 0 or 1 flag so we can ignore corrupt messages.

Let’s start with the message type because that 0 or 1 is not real helpful. We want to know if the message is a sent item or a received message, so change that section to:

CASE is_from_me
   WHEN 0 THEN 'received'
   WHEN 1 THEN 'sent'
   ELSE 'unknown'
END as Type,

Now the ‘Type’ will be ‘sent’ ‘received’ or ‘unknown’

Next we have to deal with those ugly dates! What the heck is 535834606445705984? Well, it’s the number of milliseconds since 1 Jan 2001 – not real helpful, is it?

For the ‘date’ we want the local date and time the message was sent. The approach is to extract the first 9 characters of the string, add a format string and convert that to a datetime data type. Last, adjust the output to local time. (Add reference here)

datetime(datetime(SUBSTR(date, 1, 9) +strftime('%s', '2001-01-01 00:00:00'), 'unixepoch', 'localtime'))

Now that ugly 535834606445705984  is displayed as 2017-12-24 15:56:46.

We do the same for the Epoch except we want the date without the time, so we change to first datetime to date.

The only other changes needed are to add some error checking for blank values. The final SQL sqring looks like this in SQLite Studio:

SELECT
   m.rowid as RowID,
   h.id AS UniqueID,
   CASE is_from_me
      WHEN 0 THEN 'received'
      WHEN 1 THEN 'sent'
      ELSE 'Unknown'
   END as Type,
   CASE WHEN date > 0
      THEN datetime(datetime(SUBSTR(date, 1, 9) +strftime('%s', '2001-01-01 00:00:00'), 'unixepoch', 'localtime'))
      ELSE NULL
   END as Date,
   CASE WHEN date > 0
      THEN date(datetime(SUBSTR(date, 1, 9) +strftime('%s', '2001-01-01 00:00:00'), 'unixepoch', 'localtime'))
      ELSE NULL
   END as Epoch,
   CASE WHEN text IS NULL
      THEN ''
      ELSE text
   END AS Text,
   CASE WHEN cache_roomnames IS NULL
      THEN 'S'
      ELSE cache_roomnames
   END AS Room,
   maj.attachment_id AS AttachmentID,
   expire_state as expState,
   is_audio_message AS isAudio,
   is_corrupt

FROM message m
   LEFT JOIN handle h ON h.rowid = m.handle_id
   LEFT JOIN message_attachment_join maj ON maj.message_id = m.rowid
ORDER BY RowID

Wrtten in the Python file the string variable becomes:

strSQL = "SELECT m.rowid as RowID, h.id AS UniqueID, CASE is_from_me WHEN 0 THEN 'received' WHEN 1 THEN 'sent' ELSE 'Unknown' END as Type, CASE WHEN date > 0 THEN datetime(datetime(SUBSTR(date, 1, 9) +strftime('%s', '2001-01-01 00:00:00'), 'unixepoch', 'localtime')) ELSE NULL END as Date, CASE WHEN date > 0 THEN date(datetime(SUBSTR(date, 1, 9) +strftime('%s', '2001-01-01 00:00:00'), 'unixepoch', 'localtime')) ELSE NULL END as Epoch, CASE WHEN text IS NULL THEN '' ELSE text END AS Text, CASE WHEN cache_roomnames IS NULL THEN 'S' ELSE cache_roomnames END AS Room, maj.attachment_id AS AttachmentID, expire_state as expState, is_audio_message AS isAudio, is_corrupt FROM message m LEFT JOIN handle h ON h.rowid = m.handle_id LEFT JOIN message_attachment_join maj ON maj.message_id = m.rowid ORDER BY RowID"

We’re also going to need to know where the database is. In SQLite Studio we added and connected to the database suing the built-in UI. In Python we use the sqlite3 module and create the connections in code. We already have a variable pointing to the backup root (curBackupRoot), so we create a new areiable and add the sms.db hash –

dbFilename = curBackupRoot + “/3d/3d0d7e5fb2ce288813306e4d4636395e047a3d28”

As long as you have your curBackupRoot correct, this will work for any configuration. Your backup root will be unique, but the sms database hash is always the same. Now we can start pulling data from the sms database.

SQLite in Python

For those new to programming with databases, I’ll cover SQLite programming at a high level. There are plenty of tutorials on the internet that you can loacte for more information.

We want to execute the SQL query we build and tested in SQLite Studio and then work with the rults, one row at a time. To do that requires onaly a fel lines of code in Python. For error checking, I always ensure I have the correct file path to the database first. Then create a connection and a cursor to the database and finally execute the query.

if (os.path.exists( dbFilename)): # check that the SMS database exists
   cn = sqlite3.connect(dbFilename) # create SQLite database connection
   cur = cn.cursor()
   cur.execute(strSQL) # execute the SQL query
   rows = cur.fetchall() # caputer all rows in a cursor
   for row in rows: # then iterate through each row

The variable ‘rows‘ now contains the message data that we can iterate through.

for row in rows:                    # then iterate through each row
   vMsgID = row[0]
   
   if (not row[1] is None):
      vUID = row[1]
   else:
      vUID = ""
   
   vMsgDir = row[2]
   
   if (not row[3] is None):
       vMsgDate = row[3]
   else:
      vMsgDate = ''
   
   if (not row[4] is None):
       vMsgEpoch = row[4]
   else:
      vMsgEpoch = ''
   
   if (not row[5] is None):
      vMsgTxt = row[5]
   else:
      vMsgTxt = ""

   if (not row[6] is None):
      vChatRm = row[6]
   else:
      vChatRm = ""

   if (not row[7] is None):
      vMsgAtt = row[7]
   else:
      vMsgAtt = 0

   vExpState = row[8]
   vIsAudio = row[9]
   vIsCorrupt = row[10]

The way I chose to store the message data for later use is to write the data (stored in the individual variables) to an array (t) and then store it in a dictionary (D_MSG), using the messageID as the key.

# capture data in a list
t = (vUID, vMsgDir, vMsgDate, vMsgEpoch, vMsgTxt, vChatRm, vMsgAtt, vExpState, vIsAudio, vIsCorrupt)
D_MSG[vMsgID] = t # write data to the dictionary D_MSG
# printMsgArr(vMsgID)

Finally, we close the cursor and then close the database connection.

cur.close()
cn.close()

You can uncomment the # printMsgAtt( vMsgID)   for debugging. After an individual messade data set is saved to the dictionary, this function searches for the messageID in the dictionary and returns the message data. That data is then printed to the console to help in debugging and validating that you’re getting the data you expect.

def printMsgArr(A):
   msgID = A
   data = D_MSG[msgID]
   fmt = '%d %-24s %-8s %10s %10s'
   print ( fmt % (msgID, data[0], data[1], data[2], data[3]))
   #RowID UniqueID Type Date Epoch Text Room AttachmentID

The returned data was discussed at a high level in Part 2. Between that and the variable names, you should have a pretty good understanding of what the data represents. Later, in Part 5 we’ll cover the function of the individual elements deeper as we build the HTML UI.

We went through the SQLite queries and then integrating the SQL in Python pretty quickly.  Get comfortable with the code – add some print statements to validate and understand the data you’re getting back.  When you’re ready, check out part 4.

In Part 4 we’ll dig into a different database – exploring user contacts and pulling out the data we need from there.  After that, we’ll build the chat processing that links messages and contacts together.

About Tim Porter

Tim retired after over 30 years in various technology roles. He's worked in application development, infrastructure, database management and network engineering. In his spare time, Tim also also dables in electronics and microcontroller programming.