Why I tend to recommend Postgres to BS ITE students

2023-11-23

BS ITE students often approach me with the question of what database to use for their startup. With a few exceptions, I recommend Postgres. (The exception is Firestore, and that's if they have already started building something with some front-end framework.) If you are from ITE, this is for you.

My preference for Postgres is less about Postgres and more about its data model. Postgres, and other relational database management systems, are robust technologies that offer formidable flexibility when it comes to your application's data.

I understand that, when you start learning how to program, it can be tempting to go for so-called "schema-less" databases like Mongo. The marketing fluff around these databases is written to convince you that they are more flexible in schema than relational databases. I must disagree -- no matter what, you are committing to a constraint on your data. Think about what you trade when you give up the need to write a schema. Yes, you may be free to stick whatever data you want in your database, but you will later be constrained by how you can get our data out.

I find this trade-off unacceptable, especially for an early project that doesn't understand the technological requirements of its problem space.

Modeling "schema-less" data

Let's look at an example of how you might work with a schema-less database.

ITE students tend to gravitate towards Mongo and Firestore. To oversimplify, these are both "document-oriented" databases. You can basically store data in them as if they were a big Python dictionary or JavaScript object. We can use dictionaries/objects as an analog for the document model, then.

If we want to model an e-commerce app, we might start with the User, Cart (and maybe CartItem), Store, and SKU entities. A relational database schema would probably make a table for each of these entities. In a document-oriented database, you would have to do it a little differently.

{
	"user:1": {
		"name": "Joe",
		"cart": {
			"products": [
				{"name": "Happi Chikie", "quantity": 1, "store": "Good Taste"}
			]
		}
	},
	"user:2": {
		"name": "Joben",
		"cart": {
			"products": [
				{"name": "Happi Chikie", "quantity": 2, "store": "Good Taste"},
				{"name": "Americano", "quantity": 1, "store": "Tomo Coffee"},
			]
		}
	}
}

A broad view of this example shows you that some entities are clearly subordinate to others. That in itself is not a problem. In fact, this looks fine. But let's look more closely at how we're supposed to actually use this database.

The keys of a document are your first problem. You must access dictionary values through dictionary keys, which means you must know the key in advance.

Consider this: will a key of this form, "user:1", work? I will assert that no, it won't. When someone logs in, they don't give you their autoincrement ID; they give you their username or their email.

No problem, then, right? We can just use the email:

{
	"user:jrilagan@somedomain.com": {
		"name": "Joe",
		"cart": {
			"products": [
				{"name": "Happi Chikie", "quantity": 1, "store": "Good Taste"}
			]
		}
	},
	"user:jbilagan@somedomain.com": {
		"name": "Joben",
		"cart": {
			"products": [
				{"name": "Happi Chikie", "quantity": 2, "store": "Good Taste"},
				{"name": "Americano", "quantity": 1, "store": "Tomo Coffee"},
			]
		}
	}
}

Look at what you've done now, though. You've tightly coupled a business value (the user's email) to their unique record. You will see this problem everywhere in document-oriented databases: suddenly, business values have much more meaning than they should.

You might not think that this is a big problem, but just you wait until requirements change. What you've really done is reduce what I like to call your optionality when it comes to your data. You can still make updates and changes, sure, but they are now much more expensive for you to do. If you consider updates painful, you will shy away from them. This will slow down your pace of development.

Honestly, the user:{email} problem is straightforward enough. Consider instead what will happen if a store wants to change their name. There is no single place for you to change that: you will have to go into every single cart item and update the store's name there. Now consider how painful this will be if you have more than ten layers of nesting.

Access paths

Your second problem, and the one I consider the more insidious problem, is that you will constrain your access paths to your data.

Let's say that your product people want to introduce a new feature to your app: you can click on a store and see their menu. Simple, right?

If you started with the document above, now you have two options: make a separate document for stores, then duplicate the product data there, or try to scan every cart item for products sold by a specific store when the button for that store is clicked. Again, you might think that this isn't so bad, but as new features keep coming, you will keep having to strain your data model. One day, it will break.

All this to replace what would've been a simple WHERE and JOIN in SQL. Really, had you just modeled your data upfront, you would've been able to support almost any practical query with SQL.

That's what I mean when I say "access path." With a proper table model and a knowledge of SQL, you can answer almost any question about your data. Despite its clunkiness, SQL is a high-level (read: "easy") query language that's proven itself in countless production systems. Maybe you don't believe me now, but you will later.

End

Document data models have their place, but it is mostly in the realm of scale. The independence of data within a document database allows it to be split across many machines, which makes "web-scale" problems solvable. I seriously doubt, though, that you (the ITE student, to be clear) have web-scale problems, but you do have frequently changing requirements.

Making a schema upfront is an acute pain that involves suffering much less, overall, than the chronic pain you will suffer trying to figure out how to hack your documents such that another access path becomes available. Sure, that's just my assertion, my belief -- but it's a belief shared by many.

Note as well that your upfront schema is not permanent. To change a relational schema, write a migration.