Drafting a POC JSONField for SQLite

There’s this neat little python package called django-jsonfallback. It’s made by one of my mentors. To quote the description, this package allows you to

Use PostgreSQL’s and MySQL’s JSONField, but fall back to a TextField implementation on other databases.

Since my project is to bring a cross-DB JSONField, this project is very useful for me as a starting point. To get a better understanding of how that package works (and how Django model fields work in general), I did a good read on these wonderful Django docs about custom model fields and custom lookups. I also found this interesting Custom Database Backends presentation from Django Under the Hood 2016 event.

As explained in the docs and presentation, you can create as_vendor methods such as as_mysql, as_sqlite, etc. to create different implementations of lookups and transforms (it also applies for any class that extends BaseExpression, if I’m not mistaken). I began looking at the code in django-jsonfallback, and I found that it was made with some if conditionals to check for the database backend. I tried replacing the as_sql methods with as_mysql as you can see in this pull request, and it worked pretty well.

I also tried to replace the conditionals used in process_lhs and process_rhs methods, and it worked for the CaseInsensitiveMixin. However, when I tried to do the same for StringKeyTransformTextLookupMixin, the tests failed. I tried to print the compiled query by adding:

print(Book.objects.filter(data__title__iexact='harry potter').query)

and it shows:

SELECT `testapp_book`.`id`, `testapp_book`.`data`
FROM `testapp_book`
WHERE LOWER(JSON_EXTRACT(`testapp_book`.`data`, $.title))
LIKE LOWER(harry potter)

If I print the query before I replaced the process_rhs method, it shows:

SELECT `testapp_book`.`id`, `testapp_book`.`data`
FROM `testapp_book`
WHERE LOWER(JSON_EXTRACT(`testapp_book`.`data`, $.title))
LIKE LOWER("harry potter")

As you can see, the quotes around harry potter went missing. I don’t know why. I have spent hours trying to debug this but I still haven’t found a clue.

Anyway, I decided to stop refactoring and tried to do something new: implementing the lookups for SQLite.

The JSON1 extension provides some functions to handle JSON data in SQLite. These functions are quite simple and somewhat limited compared to the functions in other database backends such as PostgreSQL or MySQL. However, if you can come up with some neat tricks, you can create some useful lookups with these functions.

The existing PostgreSQL JSONField already has some lookups and transforms implemented. I made a simple summary of them in this Google Sheets document. Looking at the JSON1 functions, I found json_extract() or json_type() might be useful in implementing has_key, has_any_keys, and has_keys lookups.

How would that work?

The json_extract(X,P1,P2,...) function extracts and returns one or more values from the well-formed JSON string X that are selected by the paths P1, P2, …. If the selected path doesn’t exist, it will return NULL. That means, if we would like to do a has_key lookup such as:

Book.objects.filter(data__has_key='title')

We could imagine it being an SQL query like this:

SELECT * FROM myapp_book
WHERE json_extract(myapp_book.data, '$.title') IS NOT NULL

That would work, right?

Not quite. The json_extract function also returns NULL if the path exists and the value is actually JSON null. So if the JSON object has the specified key but has null as the value, our query would not return such rows. This isn’t what we want!

Let’s look for an alternative: json_type()

To quote the JSON1 documentation,

The json_type(X,P) function returns the “type” of the element in X that is selected by path P. The “type” returned by json_type() is one of the following an SQL text values: 'null', 'true', 'false', 'integer' 'real', 'text', 'array', or 'object'.

As you can see, it returns SQL text values of the JSON type of the value. Like json_extract(), it also returns NULL if the path doesn’t exist. However, for null values, it returns 'null'. See the difference?

Therefore, we can still use the same SQL formula, replacing json_extract with json_type, like so:

SELECT * FROM myapp_book
WHERE json_type(myapp_book.data, '$.title') IS NOT NULL

Pretty clever, don’t you think? We just used a function that checks the datatype of a JSON value to implement has_key lookup!

Now, you might already see it coming, but implementing has_any_keys and has_keys lookups are very easy now that we already know what function to use.

For has_any_keys, we just need to chain multiple json_type() IS NOT NULL expressions with OR for each key specified in the lookup. To illustrate, the following lookup:

Book.objects.filter(data__has_any_keys=['title', 'foo', 'bar'])

would translate into the following SQL query:

SELECT * FROM myapp_book
WHERE (
    json_type(myapp_book.data, '$.title') IS NOT NULL OR
    json_type(myapp_book.data, '$.foo') IS NOT NULL OR
    json_type(myapp_book.data, '$.bar') IS NOT NULL
)

And for has_keys, it’s pretty much the same. We just need to chain it with AND instead of OR.

I’ve made some commits implementing those lookups on my sqlite-lookups branch. I also deleted @xfail decorators for those lookups’ tests. So now, running the tests with SQLite should expect those lookups to work correctly.

Now, I haven’t figured out the best way to implement the remaining lookups (and transforms). Also, I’m not really sure if what I did is the best approach to implement those three lookups. If you have any thoughts about this, please let me know on the comments below!