LiteDB: Category Archive

Posts about LiteDB, an in-process .NET document database

Tuesday, December 7, 2021
  A Tour of myPrayerJournal v3: Conclusion

NOTE: This is the final post in a series; see the introduction for information on requirements and links to other posts in the series.

We've gone in depth on several different aspects of this application and the technologies it uses. Now, let's zoom out and look at some big-picture lessons learned.

What I Liked

Generally speaking, I liked everything. That does not make for a very informative post, though, so here are a few things that worked really well.

Simplification Via htmx

One of the key concepts in a Representational State Transfer (REST) API is that of Hypermedia as the Engine of Application State (HATEOAS). In short, this means that the state of an application is held within the hypermedia that is exchanged between client and server; and, in practice, the server is responsible for altering that state. This is completely different from the JSON API / JavaScript framework model, even if they use GET, POST, PUT, and PATCH properly.

(This is a near over-simplification; the paper that initially proposed these concepts – in much, much more detail – earned its author a doctoral degree.)

The simplicity of this model is great; and, when I say “simplicity,” I am speaking of a lack of complexity, not a naïveté of approach. I was able to remove a large amount of complexity and synchronization from the client/server interactions between myPrayerJournal v2 and v3. State management used to be the most complex part of the application. Now, the most complex part is the HTML rendering; since that is what controls the state, though, this makes sense. I have 25 years of experience writing HTML, and even at its most complex, it simply is not.

LiteDB

This was a very simple application - and, despite its being open for any user with a Google or Microsoft account, I have been the only regular user of the application. LiteDB's setup was easy, implementation was easy, and it performs really well. I suspect this would be the case with many concurrent users. If the application were to grow, and I find that my suspicion was not borne out by reality, I could create a database file per user, and back up the data directory instead of a specific file. As with htmx, the lack of complexity makes the application easily maintainable.

What I Learned

Throughout this entire series of posts, most of the content would fall under this heading. There are a few things that did not fit into those posts, though.

htmx Support in .NET

I developed Giraffe.Htmx as a part of this effort, and mentioned that I became aware of htmx on an episode of .NET Rocks!. The project I developed is very F#-centric, and uses features of the language that are not exposed in C# or VB.NET. However, there are two packages that work with the standard ASP.NET Core paradigm. Htmx provides server-side support for the htmx request and response headers, similar to Giraffe.Htmx, and Htmx.TagHelpers contains tag helpers for use in Razor, similar to what Giraffe.ViewEngine.Htmx does for Giraffe View Engine. Both are written by Khalid Abuhakmeh, a developer advocate at JetBrains (which generously licensed their tools to this project, and produces the best developer font ever).

While I did not use these projects, I did look at the source, and they look good. Open source libraries go from good to great by people using them, then providing constructive feedback (and pull requests, if you are able).

Write about Your Code

Yes, I'm cheating a bit with this one, as it was one of the takeaways from the v1 tour, but it's still true. Writing about your code has several benefits:

  • You understand your code more fully.
  • Others can see not just the code you wrote, but understand the thought process behind it.
  • Readers can provide you feedback. (This may not always seem helpful; regardless of its tone, though, thinking through whether the point of their critique is justified can help you learn.)

And, really, knowledge sharing is what makes the open-source ecosystem work. Closed / proprietary projects have their place, but if you do something interesting, write about it!

What Could Be Better

Dove-tailing from the previous section, writing can also help you think through your code; if you try to explain it, and and have trouble, that should serve as a warning that there are improvements to be had. These are the areas where this project has room to get better.

Deferred Features

There were 2 changes I had originally planned for myPrayerJournal v3 that did not get accomplished. One is a new “pray through the requests” view, with a distraction-free next-card-up presentation. The other is that updating requests sends them to the bottom of the list, even if they have not been marked as prayed; this will require calculating a separate “last prayed” date instead of using the “as of” date from the latest history entry.

The migration introduced a third deferred change. When v1/v2 ran in the browser, the dates and times were displayed in the user's local timezone. With the HTML being generated on the server, though, dates and times are now displayed in UTC. The purpose of the application is to focus the user's attention on their prayer requests, not to make them have to do timezone math in their head! htmx has an hx-headers attribute that specifies headers to pass along with the request; I plan to use a JavaScript call to set a header on the body tag when a full page loads (hx-headers is inherited), then use that timezone to adjust it back to the user's current timezone.

That LiteDB Mapping

I did a good bit of tap-dancing in the LiteDB data model and mapping descriptions, mildly defending the design decisions I had made there. The recurrence should be designed differently, and there should be individual type mappings rather than mapping the entire document. Yes, it worked for my purpose, and this project was more about Vue to htmx than ensuring a complete F#-to-LiteDB mapping of domain types. As I implement the features above, though, I believe I will end up fixing those issues as well.


Thank you for joining me on this tour; I hope it has been enjoyable, and maybe even educational.

Categorized under , , , , ,
Tagged , , , , , , , , , , , , , , , , , ,

Wednesday, December 1, 2021
  A Tour of myPrayerJournal v3: The Data Store

NOTE: This is the fourth post in a series; see the introduction for information on requirements and links to other posts in the series.

myPrayerJournal v1 used PostgreSQL with Entity Framework Core for its backing store (which had a stop on the v1 tour). v2 used RavenDB, and while I didn't write a tour of it, you can see the data access logic if you'd like. Let's take a look at the technology we used for v3.

About LiteDB

LiteDB is a single-file, in-process database, similar to SQLite. It uses a document model for its data store, storing Plain Old CLR Objects (POCOs) as Binary JSON (BSON) documents in its file. It supports cross-collection references, customizable mappings, different access modes, and transactions. It allows documents to be queried via LINQ syntax or via its own SQL-like language.

As I mentioned in the introduction, I picked it up for another project, and really enjoyed the experience. Its configuration could not be easier – the connection string is literally a path and file name – and it had good performance as well. The way it locks its database file, I can copy it while the application is up, which is great for backups. It was definitely a good choice for this project.

The Domain Model

When I converted from PostgreSQL to RavenDB, the data structure ended up with one document per request; the history log and notes were stored as F# lists (arrays in JSON) within that single document. RavenDB supports indexes which can hold calculated values, so I had made an index that had the latest request text, and the latest time an action was taken on a request. When v2 displayed any list of requests, I queried the index, and got the calculated fields for free.

The model for v3 is very similar.

/// Request is the identifying record for a prayer request
[<CLIMutable; NoComparison; NoEquality>]
type Request = {
  /// The ID of the request
  id           : RequestId
  /// The time this request was initially entered
  enteredOn    : Instant
  /// The ID of the user to whom this request belongs ("sub" from the JWT)
  userId       : UserId
  /// The time at which this request should reappear in the user's journal by manual user choice
  snoozedUntil : Instant
  /// The time at which this request should reappear in the user's journal by recurrence
  showAfter    : Instant
  /// The type of recurrence for this request
  recurType    : Recurrence
  /// How many of the recurrence intervals should occur between appearances in the journal
  recurCount   : int16
  /// The history entries for this request
  history      : History list
  /// The notes for this request
  notes        : Note list
  }

A few notes would probably be good here:

  • The CLIMutable attribute allows this non-nullable record type to be null, and generates a zero-argument constructor that reflection-based processes can use to create an instance. Both of these are needed to interface with a C#-oriented data layer.
  • By default, F# creates comparison and equality implementations for record types. This type, though, is a simple data transfer object, so the NoEquality and NoComparison attributes prevent these from being generated.
  • Though not shown here, History has an “as-of” date/time, an action that was taken, and an optional request text field; Note has the same thing, minus the action but requiring the text field.

Customizing the POCO Mapping

If you look at the fields in the Request type above, you'll spot exactly one primitive data type (int16). Instant comes from NodaTime, but the remainder are custom types. These are POCOs, but not your typical POCOs; by tweaking the mappings, we can get a much more efficient BSON representation.

Discriminated Unions

F# supports discriminated unions (DUs), which can be used in different ways to construct a domain model in such a way that an invalid state cannot be represented (TL;DR - “make invalid states unrepresentable”). One way of doing this is via the single-case DU:

/// The identifier of a user (the "sub" part of the JWT)
type UserId =
  | UserId of string

Requests are associated with the user, via the sub field in the JWT received from Auth0. That field is a string; but, in the handler that retrieves this from the Authorization header, it is returned as UserId [sub-value]. In this way, that string cannot be confused with any other string (such as a note, or a prayer request).

Another way DUs can be used is to generate enum-like types, where each item is its own type:

/// How frequently a request should reappear after it is marked "Prayed"
type Recurrence =
  | Immediate
  | Hours
  | Days
  | Weeks

Here, these four values will refer to a recurrence, and it will take no others. This barely scratches the surface on DUs, but it should give you enough familiarity with them so that the rest of this makes sense.

For the F#-fluent - you may be asking “Why didn't he define this with Hours of int16, Days of int16, etc. instead of putting the number in Request separate from the type?” The answer is a combination of evolution – this is the way it worked in v1 – and convenience. I very well could have done it that way, and probably should at some point.

Converting These Types in myPrayerJournal v2

F# does an excellent job of transparently representing DUs, Option types, and others to F# code, while their underlying implementation is a CLR type; however, when they are serialized using traditional reflection-based serializers, the normally-transparent properties appear in the output. RavenDB (and Giraffe, when v1 was developed) uses JSON.NET for its serialization, so it was easy to write a converter for the UserId type:

/// JSON converter for user IDs
type UserIdJsonConverter () =
  inherit JsonConverter<UserId> ()
  override __.WriteJson(writer : JsonWriter, value : UserId, _ : JsonSerializer) =
    (UserId.toString >> writer.WriteValue) value
  override __.ReadJson(reader: JsonReader, _ : Type, _ : UserId, _ : bool, _ : JsonSerializer) =
    (string >> UserId) reader.Value

Without this converter, a property “x”, with a user ID value of “abc”, would be serialized as:

{ "x": { "Case": "UserId", "Value": "abc" } }

With this converter, though, the same structure would be:

{ "x": "abc" }

For a database where you are querying on a value, or a JSON-consuming front end web framework, the latter is definitely what you want.

Converting These Types in myPrayerJournal v3

With all of the above being said – LiteDB does not use JSON.NET; it uses its own custom BsonMapper class. This means that the conversions for these types would need to change. LiteDB does support creating mappings for custom types, though, so this task looked to be a simple conversion task. As I got into it, though, I realized that nearly every field I was using needed some type of conversion. So, rather than create converters for each different type, I created one for the document as a whole.

It was surprisingly straightforward, once I figured out the types! Here are the functions to convert the request type to its BSON equivalent, and back:

/// Map a request to its BSON representation
let requestToBson req : BsonValue =
  let doc = BsonDocument ()
  doc["_id"]          <- RequestId.toString req.id
  doc["enteredOn"]    <- req.enteredOn.ToUnixTimeMilliseconds ()
  doc["userId"]       <- UserId.toString req.userId
  doc["snoozedUntil"] <- req.snoozedUntil.ToUnixTimeMilliseconds ()
  doc["showAfter"]    <- req.showAfter.ToUnixTimeMilliseconds ()
  doc["recurType"]    <- Recurrence.toString req.recurType
  doc["recurCount"]   <- BsonValue req.recurCount
  doc["history"]      <- BsonArray (req.history |> List.map historyToBson |> Seq.ofList)
  doc["notes"]        <- BsonArray (req.notes   |> List.map noteToBson    |> Seq.ofList)
  upcast doc
  
/// Map a BSON document to a request
let requestFromBson (doc : BsonValue) =
  { id           = RequestId.ofString doc["_id"].AsString
    enteredOn    = Instant.FromUnixTimeMilliseconds doc["enteredOn"].AsInt64
    userId       = UserId doc["userId"].AsString
    snoozedUntil = Instant.FromUnixTimeMilliseconds doc["snoozedUntil"].AsInt64
    showAfter    = Instant.FromUnixTimeMilliseconds doc["showAfter"].AsInt64
    recurType    = Recurrence.ofString doc["recurType"].AsString
    recurCount   = int16 doc["recurCount"].AsInt32
    history      = doc["history"].AsArray |> Seq.map historyFromBson |> List.ofSeq
    notes        = doc["notes"].AsArray   |> Seq.map noteFromBson    |> List.ofSeq
    }

Each of these round-trips as the same value; line 6 (doc["userId"]) stores the string representation of the user ID, while line 19 (userId =) creates a strongly-typed UserId from the string stored in database.

The downside to this technique is that LINQ won't work; passing a UserId would look for the default serialized version, not the simplified string version. This is not a show-stopper, though, especially for such a small application as this. If I had wanted to use LINQ for queries, I would have written several type-specific converters instead.

Querying the Data

In v2, there were two different types; Request was what was stored in the database, and JournalRequest was the type that included the calculated fields included in the index. This conversion came into the application; ofRequestFull is a function that performs the calculations, and returns an item which has full history and notes, while ofRequestLite does the same thing without the history and notes lists.

With that knowledge, here is the function that retrieves the user's current journal:

/// Retrieve the user's current journal
let journalByUserId userId (db : LiteDatabase) = backgroundTask {
  let! jrnl = db.requests.Find (Query.EQ ("userId", UserId.toString userId)) |> toListAsync
  return
    jrnl
    |> Seq.map JournalRequest.ofRequestLite
    |> Seq.filter (fun it -> it.lastStatus <> Answered)
    |> Seq.sortBy (fun it -> it.asOf)
    |> List.ofSeq
  }

Line 3 contains the LiteDB query; when it is done, jrnl has the type System.Collections.Generic.List<Request>. This “list” is different than an F# list; it is a concrete, doubly-linked list. F# lists are immutable, recursive item/tail pairs, so F# views the former as a form of sequence (as it extends IEnumerable<T>). Thus, the Seq module calls in the return statement are the appropriate ones to use. They execute lazily, so filters should appear as early as possible; this reduces the number of latter transformations that may need to occur.

Looking at this example, if we were to sort first, the entire sequence would need to be sorted. Then, when we filter out the requests that are answered, we would remove items from that sequence. With sorting last, we only have to address the full sequence once, and we are sorting a (theoretically) smaller number of items. Conversely, we do have to run the map on the original sequence, as lastStatus is one of the calculated fields in the object created by ofRequestLite. Sometimes you can filter early, sometimes you cannot.

(Is this micro-optimizing? Maybe; but, in my experience, taking a few minutes to think through collection pipeline ordering is a lot easier than trying to figure out why (or where) one starts to bog down. Following good design principles isn't premature optimization, IMO.)

Getting a Database Connection

The example in the previous section has a final parameter of (db: LiteDatabase). As Giraffe sits atop ASP.NET Core, myPrayerJournal uses the traditional dependency injection (DI) container. Here is how it is configured:

/// Configure dependency injection
let services (bldr : WebApplicationBuilder) =
  // ...
  let db = new LiteDatabase (bldr.Configuration.GetConnectionString "db")
  Data.Startup.ensureDb db
  bldr.Services
    // ...
    .AddSingleton<LiteDatabase> db
  |> ignore
  // ...

The connection string comes from appsettings.json. Data.Startup.ensureDb makes sure that requests are indexed by user ID, as that is the parameter by which request lists are queried; this also registers the converter functions discussed above. LiteDB has an option to open the file for shared access or exclusive access; this implementation opens it for exclusive access, so we can register that connection as a singleton. (LiteDB handles concurrent queries itself.)

Getting the database instance out of DI is, again, a standard Giraffe technique:

/// Get the LiteDB database
let db (ctx : HttpContext) = ctx.GetService<LiteDatabase> ()

This can be called in any request handler; here is the handler that displays the journal cards:

// GET /components/journal-items
let journalItems : HttpHandler =
  requiresAuthentication Error.notAuthorized
  >=> fun next ctx -> backgroundTask {
    let  now   = now ctx
    let! jrnl  = Data.journalByUserId (userId ctx) (db ctx)
    let  shown = jrnl |> List.filter (fun it -> now > it.snoozedUntil && now > it.showAfter)
    return! renderComponent [ Views.Journal.journalItems now shown ] next ctx
    }

Making LiteDB Async

I found it curious that LiteDB's data access methods do not have async equivalents (ones that would return Task<T> instead of just T). My supposition is that this is a case of YAGNI. LiteDB maintains a log file, and makes writes to that first; then, when it's not busy, it synchronizes the log to the file it uses for its database. However, I wanted to control when that occurs, and the rest of the request/function pipelines are async, so I set about making async wrappers for the applicable function calls.

Here are the data retrieval functions:

/// Convert a sequence to a list asynchronously (used for LiteDB IO)
let toListAsync<'T> (q : 'T seq) =
  (q.ToList >> Task.FromResult) ()

/// Convert a sequence to a list asynchronously (used for LiteDB IO)
let firstAsync<'T> (q : 'T seq) =
  q.FirstOrDefault () |> Task.FromResult

/// Async wrapper around a request update
let doUpdate (db : LiteDatabase) (req : Request) =
  db.requests.Update req |> ignore
  Task.CompletedTask

And, for the log synchronization, an extension method on LiteDatabase:

/// Extensions on the LiteDatabase class
type LiteDatabase with
  // ...
  /// Async version of the checkpoint command (flushes log)
  member this.saveChanges () =
    this.Checkpoint ()
    Task.CompletedTask

None of these actually make the underlying library use async I/O; however, they do let the application's main thread yield until the I/O is done. Also, despite the saveChanges name, this is not required to save data into LiteDB; it is there once the insert or update is done (or, optionally, when the transaction is committed).

Final Thoughts

As I draft this, this paragraph is on line 280 of this post's source; the entire Data.fs file is 209 lines, including blank lines and comments. The above is a moderately long-winded explanation of what is nicely terse code. If I had used traditional C#-style POCOs, the code would likely have been shorter still. The backup of the LiteDB file is right at half the size of the equivalent RavenDB backup, so the POCO-to-BSON mapping paid off there. I'm quite pleased with the outcome of using LiteDB for this project.

Our final stop on the tour will wrap up with overall lessons learned on the project.

Categorized under , , ,
Tagged , , , , , , , , , , ,

Friday, November 26, 2021
  A Tour of myPrayerJournal v3: Introduction

This is the first of 5 posts in this series.

Background

Around 3 years ago, I wrote an 8-part series called “A Tour of myPrayerJournal”, recounting the decisions and implementation of its initial release. Version 2 did not get its own tour, as it used a similar architecture. There were also some nagging library issues that were never resolved, leading to v2 being an overall unsatisfying step in the evolution of this application.

When Vue v3 was announced, this sounded like a great opportunity, with first-class TypeScript support and a new component syntax that promised better performance and a better developer experience. This past summer, I completed a project with the mature Vue v3 framework, and was generally pleased with the results. Just after I returned to my previously abandoned migration attempt on this project (with early Vue v3 support), I heard about htmx. With a few attributes, and a server that can handle a few HTTP headers, you can build an interactive site, with performance rivaling or exceeding that of the typical Single Page Application (SPA) - or, at this point, so they claimed.

I also picked up LiteDB on another project over the summer, and it worked well. I thought, why not give these technologies a try, and see if I would like the result?

(SPOILER ALERT: I did!)

The Requirements

Requirements for v3 were, for the most part, to update the application to Vue v3. Without rehashing the entire list (see the other intro post), the basic idea is that a prayer request is represented by a card, and this card keeps up with all changes made to it. Also, the system can present the cards that are active, arranged with the oldest action date first, and allow you to tick through the cards. (This is the flow to enable the user to "pray through their list.")

The goal is to remain a minimalist program; the focus should be on prayer, not using a website. To that end, I had envisioned a “one-at-a-time” scenario that would clear out distractions and present the cards in the same order. I had also planned to separate the “last prayed” date from the “last activity” date; currently, updating the text of a request moves it to the bottom of the stack. However, both of these improvements were deferred to v3.1; v3 restores the (adequate) functionality of v1, while being much lighter-weight.

The Tech Stack

This stack did not go through nearly as many iterations as v1.

Giraffe is a library that enables F# developers to create ASP.NET Core endpoints in a functional style. It's a mature library (v1 used Giraffe!), and continues to be improved. It also provides an optional “Giraffe View Engine,” which will get more attention in the user interface post; the views for v3 are produced via this view engine.

htmx is a JavaScript library that asks… well, several questions. Why should links and buttons be the only interactive elements? Why should you have to replace the whole page every time? What would HTML look like if it had been developed the way a typical programming language would be? It uses a small set of attributes to answer these questions differently, making interactive sites possible without writing any JavaScript. (The custom JavaScript file in v3 is 82 lines, including comments - and the majority of that is Bootstrap interaction.)

Since, in the htmx way, the web server returns rendered HTML, the requests can be a bit larger than the equivalent API calls that return JSON for a SPA framework to render. However, this is offset somewhat by the fact that the browser just has to swap that HTML fragment in; the processing is faster and much less complex.

What really swung me over the fence to giving it a shot, though, was a point Carson (the author of the library) made while talking with Carl and Richard on the .NET Rocks! podcast. Having a server render the HTML, and the browser merely displaying it, keeps your application logic on the server; the only JavaScript you need to write is what is required for the user interface. This eliminates a host of synchronization issues with SPAs and their associated APIs - duplicating shapes of data, ensuring calculations are in sync, etc. It also keeps your application logic from needing to be exposed to the public Internet; this doesn't entirely prevent exploits, but the prospective hacker doesn't start with a full copy of your code.

LiteDB could be described as SQLite for documents. Collections of Plain-Old CLR Objects (POCOs) can be stored, retrieved, searched, indexed, and deleted, all while running in the current process, and requiring no separate database server install. While it does not require any special configuration to do this, it does also provide the ability to transform these objects. This gives complete control as to how much or how little transformation you want to specify; and, as we'll see in part 3, this came in handy for this application.

Where We Go from Here

In the next post, we'll take a look at Giraffe, its View Engine, htmx, and how they all work together. The post after that will dive into the aforementioned 82 lines of JavaScript to see how we can control Bootstrap's client/browser behavior from the server. After that, we'll dig in on LiteDB, to include how we serialize some common F# constructs. Finally, we'll wrap up the series with overarching lessons learned, and other thoughts which may not fit nicely into one of the other posts.

Categorized under , , , ,
Tagged , , , , , , , , ,