Monday 26 October 2015

JSON, Python & Trello

JSON is a simple data storage format intended to allow easy data interchange between programs and systems. Data is stored in a text format as objects of "name : value" pairs. And that's about it, other than objects can be nested (a value could be another whole object) and multiple objects can occur one after another in the data set (a list or array of such objects). This makes it slightly database like because you have records (objects) made up of named fields holding data values, with nested records inside other records for any kind of relationship or hierarchy.

The "name : value" storage structure is often termed a Dictionary where a given name is associated with its value - other terms such as Hash Map are used in other programming languages. This makes a JSON data set an almost perfect match for processing in Python which has almost directly corresponding data types of dict for the "name : value" pairs and list for multiple objects of the same type. And Python offers a standard "json" module (library) for importing and exporting such JSON data sets into and out of Python objects (the corresponding methods are called "load" and "dump" for consistency with the naming in the existing "pickle" module). Generally speaking when you load in a JSON data set it is converted to a corresponding Python dict object, that may in turn contain other embedded list and dict objects according to the structure of the JSON data being loaded in.

Trello

A real example of this in use is with Trello, where you can export your Do Lists to a local file as a JSON data set. You could then open this file and read in the JSON data set in Python, and process it one way or another. Note that you can also directly access the live data in a Trello List using a suitable URL - Google the Trello API for more on this.

Trello structures its data as follows in a hierarchy where each thing has a name and an internal, normally hidden identifier:
  • A Board is the top level structure - each Board is separate from each other
  • A Board contains multiple Lists
  • A List contains Cards - normally these are the "actions" you want to do
  • A Card can contain a few subsidiary other data items, such as a Checklist, or Labels (categories)
With the data from one Trello Board now read into a Python dict object it becomes very easy to navigate through this and pull out individual members of a list for further processing. An obvious example is to do a simple reformat of the lists for printing purposes - Trello only lets you print the whole board or an individual card, and a board can run to many, many pages when you have lots of lists and cards (items) in it given the default spacing between items.

Trello does not actually store the data in a strict hierarchy, but instead more in a relational format, where child items will contain the identifier ("id") of the parent item they are part of. Thus the top level JSON data for a Trello Board contains data of:
  • name - name of the board
  • id - identifier of this board, a long hexadecimal number
  • desc - description of the board
  • lists - all the Lists in this Board
  • cards - all the individual Cards in all the Lists
  • checklists - all the Checklists used in different Cards
i.e. most things are stored directly linked off a Board, and not stored nested within each other. This is clearly to allow easy moving of Cards between Lists - only the parent identifier within the Card needs to be updated, not the Lists involved or the Board itself.

The main data sets of lists, cards and checklists come into Python as a list (array) of multiple members, each member being a dict. Each data record for these child data sets contains an "id" field for the parent i.e. "idBoard" in a List, and "idList" in a Card (Checklists are slightly different as the Card does contain a list of the Checklist identifiers).

Example 1 - Printing the names of the lists in a board

We just need to iterate over the "lists" data set of the Board, printing out the "name" of each List. We can also check if each list is open i.e. not closed. Note that I am using Python 3, hence "print" is a true function now.
import json
import sys

NAME = "name"
ID = "id"
DESC = "desc"
CLOSED = "closed"
LISTS = "lists"

json_file = open (sys.argv [0])
trello_board = json.load (json_file)
print ("Board: " + trello_board [NAME])
print ("Lists:")
# trello_board [LISTS] is a Python list, each member being a Python dict for each List
# Loop through all the Lists i.e. one List at a time
for trello_list in trello_board [LISTS] :
    if (not trello_list [CLOSED]) :
        print (trello_list [NAME])
This assumes that the JSON file name is given as the first argument on the command line to the Python program.

Example 2 - Printing out the cards in one list

Assuming that the second command line argument is the first part of the name of a List in the Board, then we can use this to find that List and then print out just its Cards.
import json
import sys

NAME = "name"
ID = "id"
DESC = "desc"
CLOSED = "closed"
LISTS = "lists"
CARDS = "cards"
CHECKLISTS = "checklists"
IDBOARD = "idBoard"
IDLIST = "idList"
IDCHECKLISTS = "idChecklists"

json_file = open (sys.argv [0])
trello_board = json.load (json_file)
print ("Board: " + trello_board [NAME])

# Loop through all Lists in the Board, comparing its name against the input name
for trello_list in trello_board [LISTS] :
    # Only do a partial match on name on leading part of name
    if (trello_list [NAME] [:len(sys.argv [1])] == sys.argv [1] and not trello_list [CLOSED]) :
        print ("List: " + trello_list [NAME])
        # Loop through all Cards in all Lists, checking the parent ID of the Card
        for trello_card in trello_board [CARDS] :
            if (trello_card [IDLIST] == trello_list [ID] and not trello_card [CLOSED]) :
                print (trello_card [NAME])
                if (trello_card [DESC]) :
                    print (trello_card [DESC])

Note that in reality you would have extra code to check that the command line arguments were present, and better error handling for things like the file not existing.

Also I actually used "textwrap" in Python to word wrap long lines properly in the output, and indent wrapped lines for better readability - I've just used "print" directly in these examples to keep it simple.

Summary

That's it for JSON and Python using Trello as an example. You can load in a JSON data set from a file into a corresponding Python data structure using just one method call (json.method), and it is very easy to traverse that data structure finding and processing the data elements you want to using the "field name" as the index into the dictionary of "name : value" pairs in the Python data structure.

No comments: