SQL Datamodel For J-Pilot

· klm's blog


Original post is here: eklausmeier.goip.de

Currently J-Pilot stores its data in binary form which is compatible with the original Palm data format. See Palm File Format Specification. Reading these binary formats is not simple, see for example pdbrd.c. Portion of the internal structure looks something like this:

 1typedef struct {        // header of record list
 2        LocalID nextRecordListID;
 3        UInt16 numRecords;
 4        UInt16 firstEntry;
 5} RecordListType;
 6
 7typedef struct {        // single element of record list
 8        LocalID localChunkID;   // offset, start of actual data
 9        UInt8 attributes;       // 1st nibble: ?/private, 2nd: this is the category
10        UInt8 uniqueID[3];
11} RecordEntryType;
12
13typedef struct {
14        UInt16 renamedCategories;
15        char categoryLabels[16][16];
16        UInt8 categoryUniqIDs[16];
17        UInt16 lastUniqID_pad;  //UInt8 lastUniqID; UInt8 padding;
18} AppInfoType;

In 2013 I proposed to use SQLite as internal data format instead, see Possible Enhancements to J-Pilot.

Below datamodel closely follows the field declaration in J-Pilot and Pilot-Link, e.g., /usr/include/pi-datebook.h or utils.h.

[more_WP_Tag]

  1create table AddrLabel (
  2	Id            int primary key,
  3	Label         text
  4);
  5
  6-- Labels for address categories, like 'Business', 'Travel', etc.
  7create table AddrCategory (
  8	Id            int primary key,
  9	Label         text
 10);
 11
 12-- Labels for address phone entries, like 'Work', 'Mobile', etc.
 13create table PhoneLabel (
 14	Id            int primary key,
 15	Label         text
 16);
 17
 18-- Actual address information
 19create table Addr (
 20	Id            int primary key, -- unique_ID
 21	Category      int default(0),
 22	Private       int default(0),  -- boolean, zero or one
 23	showPhone     int default(1),  -- which of phone1...5 to show as default
 24	Lastname      text,
 25	Firstname     text,
 26	Title         text,
 27	Company       text,
 28...
 29	Note          text,
 30	foreign key (Category) references AddrCategory(Id),
 31	foreign key (PhoneLabel1) references PhoneLabel(Id),
 32	foreign key (PhoneLabel2) references PhoneLabel(Id),
 33	foreign key (PhoneLabel3) references PhoneLabel(Id),
 34	foreign key (PhoneLabel4) references PhoneLabel(Id),
 35	foreign key (PhoneLabel5) references PhoneLabel(Id)
 36);
 37
 38create table Datebook (
 39	Id            int primary key,
 40	Private       int default(0),  -- boolean, zero or one
 41	Timeless      int default(0),  -- boolean, zero or one
 42	Begin         text,            -- begin date in format YYYY-MM-DDTHH:MM
 43	End           text,            -- end date in format YYYY-MM-DDTHH:MM
 44	Alarm         int,             -- boolean, zero or one
 45...
 46	Description   text,
 47	Note          text
 48);
 49
 50-- Labels for ToDo categories, like 'Business', 'Personal', etc.
 51create table ToDoCategory (
 52	Id            int primary key,
 53	Label         text
 54);
 55
 56create table ToDo (
 57	Id            int primary key,
 58	Category      int default(0),
 59	Private       int default(0),  -- boolean, zero or one
 60	Indefinite    int default(0),  -- boolean, zero or one
 61	Due           text,            -- due date in format YYYY-MM-DD
 62	Priority      int default(1),
 63	Complete      int,             -- boolean, zero or one
 64	Description   text,
 65	Note          text,
 66	foreign key (Category) references ToDoCategory(Id)
 67);
 68
 69-- Labels for memo categories, like 'Business', 'Personal', etc.
 70create table MemoCategory (
 71	Id            int primary key,
 72	Label         text
 73);
 74
 75create table Memo (
 76	Id            int primary key,
 77	Category      int default(0),
 78	Private       int default(0),  -- boolean, zero or one
 79	Text          text,
 80	foreign key (Category) references MemoCategory(Id)
 81);
 82
 83-- Labels for expense categories, like 'Project A', 'Internal', etc.
 84create table ExpenseCategory (
 85	Id            int primary key,
 86	Label         text
 87);
 88
 89-- Labels for expense types, like 'airfaire', 'car rental', etc.
 90create table ExpenseType (
 91	Id            int primary key,
 92	Label         text
 93);
 94-- Taken from /usr/include/pi-expense.h
 95insert into ExpenseType (Id,Label) values (0,'Airfare');
 96insert into ExpenseType (Id,Label) values (1,'Breakfast');
 97...
 98insert into ExpenseType (Id,Label) values (27,'Train');
 99
100-- Labels for expense payments, like 'cash', 'Visa', etc.
101create table ExpensePayment (
102	Id            int primary key,
103	Label         text
104);
105-- Taken from /usr/include/pi-expense.h
106insert into ExpensePayment (Id,Label) values (1,'AmEx');
107insert into ExpensePayment (Id,Label) values (2,'Cash');
108...
109insert into ExpensePayment (Id,Label) values (7,'Visa');
110insert into ExpensePayment (Id,Label) values (8,'Unfiled');
111
112-- Labels for expense currency, like 'US', 'Germany', etc.
113create table ExpenseCurrency (
114	Id            int primary key,
115	Label         text
116);
117-- Taken from Expense/expense.c
118insert into ExpenseCurrency (Id,Label) values (0,'Australia');
119insert into ExpenseCurrency (Id,Label) values (1,'Austria');
120insert into ExpenseCurrency (Id,Label) values (2,'Belgium');
121...
122insert into ExpenseCurrency (Id,Label) values (23,'United States');
123
124create table Expense (
125	Id            int primary key,
126	Category      int default(0),
127	Date          text,            -- date in format YYYY-MM-DD
128	Type          int,             -- 0=airfare, 1=breakfast, etc.
129	Payment       int,             -- 0=AmEx, 1=Cash, etc.
130	Currency      int,
131	Amount        text,
132	Vendor        text,
133	City          text,
134	Attendees     text,
135	Note          text,
136	foreign key (Category) references ExpenseCategory(Id),
137	foreign key (Type) references ExpenseType(Id),
138	foreign key (Payment) references ExpensePayment(Id),
139	foreign key (Currency) references ExpenseCurrency(Id)
140);

It has three advantages A1, A2, and A3 to have the format in an SQL schema. Although, there is one disadvantage D1.

A1. It is very easy to have more than 16 categories. Currently the binary format limits the number of categories to 16 as this number is stored in one nibble. It is also very easy to add more fields, regardless whether they are shown on screen or not.

A2. Having the data format stored in SQLite makes it easy to search in ones own data. For example, searching for specific datebook entries:

1sqlite3 $HOME/.jpilot/jptables.db "select Begin, Description from Datebook where Description like '%$1%' order by Begin"

So we can use the command line to query our J-Pilot data. As illustration, using above query with string "jpsqlite" would result in:

12020-04-17T10:00|jpsqlite.c: Lege SQLite3 Tabellen mittels jptables.sql an
22020-04-17T14:00|jpsqlite.c: Schreibe in plugin_pre_sync_pre_connect() in die beiden Tabellen AddrCategory + Addr
32020-04-18T10:00|jpsqlite.c: Schreibe in Tabelle PhoneLabel, füge phoneLabel[0..7] in Addr hinzu

Similarly, searching in J-Pilot Memos from the command line:

1sqlite3 $HOME/.jpilot/jptables.db "select Text from Memo where Text like '%$1%'" | grep -iC3 $1

Using something similar with the original J-Pilot binary format would involve using strings and grep command.

We are not tied to the command line. Equally well we can use a graphical visualization of our database, for example, using DB Browser for SQLite. An Arch Linux package is here. A screenshot with above schema is above.

CSV export is also easy, as this is a standard export format in SQLite (.mode csv). Also cleansing data, where there is a common error in multiple data, is easy. This SQLite internal format can then be the base for syncing multiple devices, for example a PC, a smartphone and a tablet. This can be accomplished as described below.

D1. In contrast to the internal binary format, the SQLite format is expected to consume more storage. The binary format keeps data packed together. In our case we observed that SQLite needs roughly two times the storage: The internal format used roughly 3 MB, SQLite used roughly 6 MB.

A3. Once the internal data structure in J-Pilot uses SQLite, it becomes easier to implement a "multi-sync" feature, i.e., hot-syncing data with multiple devices. In particular, synchronisation between two devices can now be done in an SQL transaction:

1BEGIN TRANSACTION
2    ...    -- sync data
3END TRANSACTION

For those unfamiliar with the wording in Palm Pilot speech: HotSyncing is the process, where the data from your PC is synchronized with the data on your PalmPilot. It works both ways, i.e., changes on the PC are transfered to the Palm Pilot, and vice versa, changes on the Palm Pilot are copied back to the PC. There is also a form of collision handling, if both, PC and Palm Pilot, have changed the same record.

1. Assumptions. Assume we have three devices, i.e., PC P, smartphone S1, smartphone S2, e.g., S2 could be a tablet. Only P serves as HotSync server, i.e., S1 syncs to P, and S2 syncs to P. But S1 does not sync to S2.

2. Sync table. Data structure on each device, i.e., what table data needs to be stored on each device. For each change (insert, update, delete) in any entity (Datebook, Addr, etc.) we need to store information in another Sync table:

  1. t: Timestamp of change
  2. e: Entity, i.e., address, datebook, memo, etc.
  3. Id: Id of entity table
  4. C: Change-Type (insert, update, delete)
  5. B: Bitfield, where each participating device is crossed out, P=0x01, S1=0x02, S2=0x04
  6. D: concatenated entry of previous record. For example: for a deleted record the record to be deleted itself will be stored. For an updated record, the record prior to its update is stored.

3. Collision Cases. Discuss the various scenarios where data in the devices differ and what to do in each case.

  1. P has new data (C=insert), which S1 does not have, or vice versa: Copy from P to S1, or vice versa.
  2. P has deleted a record that S1 still has untouched, or vice versa: Delete on S1 as well, or delete on P.
  3. P has modified data (C=update), which S1 has also modified (C=update): Use timestamp t to decide which side overrules the other. If S1 is the winner, then store the change in Sync table on P as well, so further hot-syncs with S2 can pick up the changes.
  4. P has new data (C=insert) with Id=x, and S1 has also new data with identical Id=x: Keep x on S1, and update Id=x to Id=y on P, i.e., change Id in underlying entity table and Sync table. Usually y=x+1. Rest of processing proceeds as in first case.

If bitfield B contains all participating devices, i.e., B=0x07, then the record in Sync table can be deleted as it is no longer required.

4. Id-Management. The Id column in above tables is just a consecutive number which is taken from the underlying entity table, for example it is Addr.Id. This is similar to the current Id-Management in J-Pilot. For this whole synchronization to work, P, S1, and S2 must be initialized with data having common Id's. They do not necessarily have to be numeric, as in our case. But having them numeric is probably the simplest option.

The export formats for Datebook, Address, and To-Do's allow and already use UID. J-Pilot encapsulates the Id into this UID field. Memo and Expense do not have these IETF RFC approved formats. For example, address export as vcard contains the Id in hexadecimal form using J-Pilot export.

 1BEGIN:VCARD
 2VERSION:3.0
 3PRODID:-//Judd Montgomery//NONSGML J-Pilot 1.8.2//EN
 4UID:palm-addressbook-00b16e86-00000000-@chieftec
 5CATEGORIES:QuickList
 6FN:Palm III Accessories
 7N:Palm III Accessories;
 8ORG:3Com Corporation
 9TEL;TYPE=work,pref:801-431-1536
10TEL;TYPE=x-other:www.palm.com
11END:VCARD

5. Optimizations. If data in Sync table contains a newly created record (C=insert), which is then deleted (C=delete) and bitfield B in both cases is equal to the device at hand, then these two records can be deleted from the Sync table. For example, if on P the value of B=P, then these two records including all records in between referring to the same Id, can be deleted in Sync table.

As from above description it follows that D is not strictly required.

6. Network implementation. Syncing different devices can be done as follows: P spawns a simple HTTP server, capable of handling one connection. Then S1 connects using HTTP GET and PUT, thereby sending contents of its own HotSync table and receiving data from the remote HotSync table. P and S1 then each perform above operations to bring their data in sync. S2 does likewise after S1 has finished its hotsync.

7. ER diagram. The above tables can be visualized in below ER diagram.

[mermaid] erDiagram AddrLabel { int Id string Label } AddrCategory { int Id string Label } PhoneLabel { int Id string Label } AddrCategory ||--o{ Addr : "FK: Category" PhoneLabel ||--o{ Addr : "FK: PhoneLabel1-5" Addr { int Id int Category int Private int showPhone string Lastname string Firstname string Title string Company int PhoneLabel1-5 string Phone1-5 string Address string City string State string Zip string Country string Custom1-4 string Note } [/mermaid]

[mermaid] erDiagram Datebook { int Id int Private int Timeless string Begin string End int Alarm int Advance int AdvanceUnit int RepeatType int RepeatForever string RepeatEnd int RepeatFreq int RepeatDay int RepeatDaySu int RepeatDayMo int RepeatDayTu int RepeatDayWe int RepeatDayTh int RepeatDayFr int RepeatDaySa int Exceptions string Exception string Description string Note } ToDoCategory { int Id string Label } ToDoCategory ||--o{ ToDo : "FK: Category" ToDo { int Id int Category int Private int Indefinite string Due int Priority int Complete string Description string Note } MemoCategory { int Id string Label } MemoCategory ||--o{ Memo : "FK: Category" Memo { int Id int Category int Private string Text } [/mermaid]

[mermaid] erDiagram ExpenseCategory { int Id string Label } ExpenseType { int Id string Label } ExpensePayment { int Id string Label } ExpenseCurrency { int Id string Label } ExpenseCategory ||--o{ Expense : "FK: Category" ExpenseType ||--o{ Expense : "FK: Type" ExpensePayment ||--o{ Expense : "FK: Payment" ExpenseCurrency ||--o{ Expense : "FK: Currency" Expense { int Id int Category string Date int Type int Payment int Currency string Amount string Vendor string City string Attendees string Note } [/mermaid]

[mermaid] erDiagram Pref { int Id string Name string UserType string FileType int iValue string sValue } Alarms { string UpToDate } [/mermaid]

Some tables just contain constants.

PhoneLabel(*)
0Work
1Home
2Fax
3Other
4E-mail
5Main
6Pager
7Mobile
AddrLabel(*)
0Last name
1First name
2Company
3Work
4Home
5Fax
6Other
7E-mail
8Addr(W)
9City
10State
11Zip Code
12Country
13Title
14UserId
15Custom 2
16Birthday
17Custom 4
18Note
ExpenseCurrency
0Australia
1Austria
2Belgium
3Brazil
4Canada
5Denmark
133EU (Euro)
6Finland
7France
8Germany
9Hong Kong
......
20Sweden
21Switzerland
22United Kingdom
23United States
30Singapore
31Thailand
32Taiwan
ExpensePayment
1AmEx
2Cash
3Check
4Credit Card
5MasterCard
6Prepaid
7Visa
8Unfiled
ExpenseType
0Airfare
1Breakfast
2Bus
3Business Meals
4Car Rental
5Dinner
6Entertainment
7Fax
8Gas
9Gifts
10Hotel
11Incidentals
12Laundry
......
22Supplies
23Taxi
24Telephone
25Tips
26Tolls
27Trains

The two database tables marked with (*) are not initially loaded with INSERT statements as the other database tables in above table. Rather, their values are loaded from the header information.

Added 28-Nov-2022: Added InsertDate and UpdateDate to all relevant tables. Also added Pref and Alarms tables.