A brief primer on web apps for ITM students (part 3)

2024-02-06

This is the third part of my series on web apps for ITM students. You can find the first post here and the second post here.

We've already discussed routing and controllers. In this post, we'll discuss the two remaining "pillars" of web apps: models and views. I'll start with views, which I think is the easier of the two to understand.

Views

A "view" is just what gets sent to the user in an HTTP response. This is usually HTML, but sometimes other data formats like JSON are sent instead.

Remember that HTML is just text. If you so wanted, you could just return a string from your Flask controller like this:

@app.get("/")
def home():
    return "<h1>Home</h1>"

This will work. This very site returns strings constructed in this way.

Problems arise when you start including dynamic data in your views. Let's revisit the example from the previous post.

@app.get('/products')
def products():
    # First, read the data.
    with open('products.txt') as f:
        products_headers = next(f)
        products_data = f.read()
    # Then, construct some HTML based on the data.
    # We will do so manually for now. HTML is just text, after all.
    page = '''<table><tr>'''
    for header in products_headers.split(','):
        page += f'''<th>{header}</th>'''
    page += '''</tr>'''
    for product in products_data.split('\n'):
        page += '''<tr>'''
        for field in product.split(','):
            page += f'''<td>{field}</td>'''
        page += '''</tr>'''
    page += '''</table>'''
    return page

In concept, your view here is a function of some data. It changes depending on what input data you give it. It would be easier on the brain to treat the view in this way: hold the template static, allow the data to be dynamic, and keep the two separate. In pseudo-Flask:

products_data = {
    'headers': ['id', 'name', 'price'],
    'items': [
        [1, 'Americano', 100],
        [2, 'Espresso', 90],
    ]
}

@app.get('products')
def products():
    return render_template("products.html", data=products_data)

You can think of the render_template function here as an interface to the string-building we did in the previous version of this example. Instead of building the HTML string by appending tags and data to some output variable, render_template instead uses a "templating engine" to insert data into pre-defined locations in an existing string. This is similar in concept to using an f-string, but web frameworks tend to have templating engines built specifically for HTML.

Flask uses the Jinja templating engine. To demonstrate what the "products.html" template might look like:

<table>
    <tr>
        {% for header in data['headers'] %}
            <th>{{ header }}</th>
        {% endfor %}
    </tr>
    {% for item in data['items'] %}
        <tr>
            {% for x in item %}
                <td>{{ x }}</td>
            {% endfor %}
        </tr>
    {% endfor %}
</table>

Passing products_data to the render_template function will make it available for use in the template. Despite the syntax, it should be reasonably clear what the logic in the template aims to do.

I will note, for the curious, that Jinja is not bound to Flask, nor is it bound to rendering HTML. I have heard stories of some software teams using Jinja to template SQL. Thankfully, I have not had to do that myself, and I hope that I never have to.

Models

We come, at last, to one of the many reasons why web apps are difficult. I foresee that beginners might find this section difficult to read, so I'll try to summarize the takeaways in a digestible form near the end.

Persistent state

There are two great villains in software engineering: state, which is information that your program "remembers," and concurrency, which is when your program has to deal with many things happening at the same time. On the one hand, as beginners, we can deal with concurrency by simply not doing concurrency. Write all your programs to be sequential, and you will never* have concurrency problems. On the other hand, unfortunately, state is inescapable.

If you need to store your data persistently, you cannot store it in your program variables, so I will dismiss that idea for the purposes of this section. I mentioned before that this website stores its persistent state as Markdown files. This is certainly one way to do it, but like the previous string-building example, it will break very quickly. Most web apps instead choose to use a database, which is a program whose specific job it is to manage persistent data. Databases also store data on the disk, but to put it bluntly, they do it better than you.

There are many databases, and some people like to make the task of picking a database seem daunting. Don't listen to those people. I think you should just use PostgreSQL for reasons that I outline in this post. To summarize: the relational data model that Postgres uses, and is the current best-in-class open-source option for, is much more flexible than competing models like the dictionary** model and the key-value model. Most projects will benefit more from this flexibility than they will from the scalability of other data models. I can also recommend Postgres's smaller sibling SQLite if you don't want to set up a whole server for your database, which should be okay for smaller projects.

The impedance mismatch

There is one problem with Postgres/SQLite that I will acknowledge here. The relational data model stores data as rows tables that sometimes store references to rows in other tables. For example:

sqlite> SELECT * FROM store;
id  name
--  ----------
1   Good Taste
2   Sutra

sqlite> SELECT * FROM product;
id  name           price  store_id
--  -------------  -----  --------
1   Happy Chikie   140    1
2   Fries          30     1
3   Chicken Curry  130    2

sqlite> SELECT * FROM product INNER JOIN store ON product.store_id = store.id;
id  name           price  store_id  id  name
--  -------------  -----  --------  --  ----------
1   Happy Chikie   140    1         1   Good Taste
2   Fries          30     1         1   Good Taste
3   Chicken Curry  130    2         2   Sutra

The core idea of relational data is that you should treat rows as sets, not as individuals, even if you deal with a set with only one element. This philosophy unfortunately conflicts with object-orientation, which is the dominant paradigm of most popular programming languages, including Python. Object-orientation treats instances of a type as individuals that have their own identities and their own mutable state. Most object-oriented languages have libraries in their ecosystem called "object-relational mappers" (ORMs) that attempt to present relational data as objects to applications developers.

The problem that ORMs try to solve, this difference in paradigm, is called the "object-relational impedance mismatch." Over time, you will find that the subtle differences between how you treat data in an object-oriented program versus a relational database will prevent your program from ever coming together in a cohesive way, even if you use ORMs. You can get away with matching tables to classes as long as your data remains simple, but the real world has a way of complicating your requirements. If you want to read more about this problem, the original article about it can be found here.

Without getting too deep into this discussion, the general consensus around this impedance mismatch is to heavily favor one side over the other. My personal solution so far has been to favor the relational side: instead of asking my program to return objects, I ask it to return dictionaries or tuples, which are data structures that do not conflict with "sets of rows" approach of the relational paradigm. Python is thankfully a good language for manipulating these data structures, so we will not have too much trouble here.

I do acknowledge that this usually requires me to either write SQL or use a query builder, both of which I think are poorer experiences as a developer than being able to use objects in a language where objects are idiomatic. You will see later why I think this is an acceptable tradeoff.

We will discuss ORMs anyway

Sorry. Despite the treatment I gave to the impedance mismatch, we will feature an ORM anyway. These tools are too common to ignore, and to be fair to them, if your project is simple, they might actually work for a while.

The preeminent ORM in Python is SQLAlchemy. Its creators seem to be well aware of the impedance mismatch, so they take great care to describe themselves not only as an ORM but as a "database toolkit." I used SQLAlchemy myself when I did my Receipt Scanner project, and I can confirm that its non-ORM features work fine.

Back to Flask

Instead of storing our product data in a text file, we can instead store it in a SQLite file for simplicity.

sqlite> CREATE TABLE product (
            id INTEGER PRIMARY KEY ASC,
            name TEXT,
            price INT
        );
sqlite> INSERT INTO product (name, price) VALUES ('Americano', 100), ('Espresso', 90);

This should give us the following data in our SQLite table;

sqlite> SELECT * FROM product;
id  name       price
--  ---------  -----
1   Americano  100
2   Espresso   90

Now our problem is connecting to this database from within Python. Well, in truth, it's not that much of a problem since Python includes a sqlite3 library in its stdlib.

>>> import sqlite3
>>> cnx = sqlite3.connect('/Users/joeilagan/tmp/db.db')
>>> cursor = cnx.cursor()
>>> cursor.execute("SELECT * FROM product;")
<sqlite3.Cursor object at 0x10ebc6e30>
>>> cursor.fetchall()
[(1, 'Americano', 100), (2, 'Espresso', 90)]

The nature of relational data is now evident. We executed SQL directly against our database and got a list of tuples in return.

What about if we use SQLAlchemy? Let's try to select all our products using SQLAlchemy's ORM.

import sqlite3
import sqlalchemy as sa
from sqlalchemy.orm import (
    DeclarativeBase,
    Mapped,
    mapped_column,
    relationship,
    Session,
)

class Base(DeclarativeBase):
    pass

class Product(Base):
    __tablename__ = 'product'

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(sa.Text)
    price: Mapped[int] = mapped_column(sa.Integer)

    def __repr__(self) -> str:
        return f'''Product(id={self.id}, name={self.name}, price={self.price})'''

engine = sa.create_engine('sqlite:///db.db', echo=True)

with Session(engine) as session:
    stmt = sa.select(Product)
    for p in session.scalars(stmt):
        print(p)

# Output:
'''
...
Product(id=1, name=Americano, price=100)
Product(id=2, name=Espresso, price=90)
...
'''

Hahahahaha. This is what I got after following the new SQLAlchemy 2.0 tutorial for their ORM. In the middle of testing whether this worked, I forgot that I was ostensibly writing an article for ITM students.

A tangent on abstractions

Believe me when I say I didn't intend to produce this mess. I can understand this code, but only because I've worked with SQLAlchemy before. If you're a new dev, working solo on some hobby project, venturing into this ORM space might turn you off of software engineering completely.

ORMs suffer from what software engineers call the "leaky abstraction" problem. You can see that here. The only reason anything gets done nowadays is because some of the ways we hide implementation details, abstractions, are good enough to let us truly avoid thinking about what goes on underneath. It's like most people with their cars: if the car is good, its driver doesn't need to know how its engine works. An abstraction that doesn't do this detail-hiding is called "leaky", and depending on how badly it leaks implementation details, it can actually make your life worse than if you hadn't used the abstraction at all. Keeping the car analogy: a bad car whose engine breaks every two months will force its driver to learn how its engine works against their will, and its driver will hate it for it.

ORMs deal with a particularly hostile problem domain, so a lot of them are leaky. I do think that there's a healthy middle ground with Django, the "heavyweight" Python web framework. When I used it for the first time, I found that it very pleasantly abstracted away a lot of the details that I have to manage in this Flask/SQLAlchemy example. You might want to explore that instead.

Let's push on for now. In theory, after setting up your database like this, you can then use SQLAlchemy's query language along with the class you created as a model to work with your data like this:

@app.get('/products')
def products():
    with Session(engine) as session:
        stmt = sa.select(Product)
        products = [p for p in session.scalars(stmt)]
        return render_template('products.html', data=products) # Assuming the template renders product objects properly

This is what most people would call the "model" layer. I suppose this looks nicer, and I suppose that if you did the legwork setting this up, it might feel nicer, too. I still don't think it was worth it. SQLAlchemy helped me do my Receipt Scanner project, but I don't think it will help you understand the basics of web app development.

My recommendation for newbies is still to use SQL directly until you get a feel for what it means to work with your database. SQLAlchemy isn't a good enough abstraction for me to recommend if you're just starting out. You have to earn your right to use it. Django and its ORM are a different story that I will tell another time.

What now?

Here are some takeaways:

You're not crazy. It's hard to get started.
Remember that most of these things are just text. Remember that most of these tools are abstractions over text manipulation.
Your projects are likely going to be small. Keep it simple in the model. Prefer SQLite to Postgres if you can. Prefer direct SQL to ORMs if you can. I will grudgingly tolerate you exploring MongoDB (especially if we haven't updated your Digital Cafe assignment by the time you read this), but prefer SQL to NoSQL if you can.
Please explore Django, actually. There's a chance you still might find it too overwhelming, but if it clicks, it really clicks. It's worth at least looking at it.

That's all from me.

PS

* - Kidding. You can absolutely find a way into concurrency problems even if you write your programs sequentially. In the real world, things can go wrong in truly shocking ways.

** - I know it's called the document model. I'm trying to avoid using too many parens.

joeilagan.com