A brief primer on web apps for ITM students (part 2)

2024-02-02

This is the sequel to this other article. My goal in this one is to discuss routing and controllers.

Hypertext

Before we discuss routing, we should discuss hypertext, which is what gets sent over the network when you talk to a web server. I dislike starting with a definition, but in this case, it's important:

Hypertext is text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. - Wikipedia

It's just text. Within this text, we once again come across the concepts of content and links. We've seen this already.

You can find two concepts immediately adjacent to hypertext. First, Hypertext Markup Language (HTML). This is the format of the text itself. HTML uses tags to instruct browsers how to render the text contained within said tags. For example:

Headers like this

are rendered differently from paragraphs like this. If you open your browser's developer tools, you will be able to see the header enclosed in <h6></h6> tags and the paragraph enclosed in <p></p> tags. The HTML itself is just text. It's the browser that renders the text in different ways depending on what tags enclose the text.
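To make the "it's just text" point concrete, here is a minimal sketch that reads a snippet of HTML as an ordinary string and lists which tag encloses which text. It uses Python's standard-library html.parser module. The TagLister class, the list_tags function, and the snippet itself are made up for illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects (tag, text) pairs from a piece of HTML (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.current_tag = None
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Remember which tag we are currently inside.
        self.current_tag = tag

    def handle_endtag(self, tag):
        self.current_tag = None

    def handle_data(self, data):
        # Only record non-blank text that sits inside a tag.
        if self.current_tag is not None and data.strip():
            self.found.append((self.current_tag, data.strip()))


def list_tags(html_text):
    parser = TagLister()
    parser.feed(html_text)
    parser.close()
    return parser.found


snippet = "<h6>Headers like this</h6><p>are rendered differently from paragraphs like this.</p>"
for tag, text in list_tags(snippet):
    print(f"{tag}: {text}")
```

Nothing in the string itself is "big" or "bold". The tags are only labels, and it is up to whatever reads the text to decide what to do with them.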

Second, Hypertext Transfer Protocol (HTTP). This is the protocol used to send hypertext to and from web programs.

There is a program called curl that Mac and Linux computers can use to send HTTP requests to web servers. We'll use it to explore how HTTP works. As a simple example, let's ask Amazon what my current IP is.

I can issue an HTTP request with curl -v https://checkip.amazonaws.com. The -v option tells curl to print the details of the HTTP request and the HTTP response when it arrives.

This is the HTTP request that curl sends to Amazon.

GET / HTTP/1.1
Host: checkip.amazonaws.com
User-Agent: curl/7.79.1
Accept: */*

It's text. What makes this work is that the text follows certain rules. For example, the top line must contain the HTTP method (GET), the path (/), and the protocol version (HTTP/1.1). The lines that follow are called HTTP headers: key-value pairs that give additional information to the server. A server knows to expect this format from an HTTP client.
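Because the request is only text that follows rules, a few lines of string handling are enough to take it apart. The sketch below is for illustration only. The parse_request function is a hypothetical helper, not part of any library, and it handles just the simple shape shown above: lines separated by a carriage return and newline, as HTTP/1.1 requires, with no request body.

```python
def parse_request(raw):
    """Split raw HTTP request text into its request line parts and headers."""
    # Everything before the first blank line is the request line plus headers.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")

    # The request line has three space-separated parts.
    method, path, version = lines[0].split(" ", 2)

    # Each remaining line is a "Key: value" header.
    headers = {}
    for line in lines[1:]:
        key, value = line.split(":", 1)
        headers[key.strip()] = value.strip()
    return method, path, version, headers


raw_request = (
    "GET / HTTP/1.1\r\n"
    "Host: checkip.amazonaws.com\r\n"
    "User-Agent: curl/7.79.1\r\n"
    "Accept: */*\r\n"
    "\r\n"
)

method, path, version, headers = parse_request(raw_request)
print(method, path, version)
print(headers)
```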

This is the HTTP response that Amazon sends us.

HTTP/1.1 200 OK
Date: Fri, 02 Feb 2024 14:07:11 GMT
Server: Not Available
Content-Length: 16
Connection: keep-alive

X.X.X.X

It's text again. The top line here, the status line, is similar to (though not the same as) the request line. Instead of the method, path, and protocol version, we have the protocol version, status code, and reason phrase. Responses also include headers.

Responses will also typically include a "response body" separated from the headers by a blank line. In this example, the response body is my IP (which I have redacted). In most cases, the response body will be HTML.
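The blank line is what lets a client tell where the headers stop and the body starts. As a rough sketch (parse_response is a hypothetical helper, and the body below is the same redacted placeholder as above, so its length does not match the Content-Length header), the response can be split like this:

```python
def parse_response(raw):
    """Split raw HTTP response text into status line parts, headers, and body."""
    # The first blank line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # The status line: protocol version, status code, reason phrase.
    version, status_code, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values like dates contain colons too.
        key, value = line.split(":", 1)
        headers[key.strip()] = value.strip()
    return version, int(status_code), reason, headers, body


raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 02 Feb 2024 14:07:11 GMT\r\n"
    "Server: Not Available\r\n"
    "Content-Length: 16\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
    "X.X.X.X\n"  # redacted placeholder, not a real IP
)

version, status_code, reason, headers, body = parse_response(raw_response)
print(status_code, reason)
print(headers["Date"])
print(body.strip())
```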

Note that this is the barest example of HTTP. It can get far more complicated than this, especially when forms and files are involved.

Thankfully, in practice, you will never construct HTTP requests manually, and you will never parse HTTP responses manually. Even the simplest frameworks will give you more pleasant ways of working with requests and responses.

Routing

Let's pretend, once again, that you are building a website for a cafe at https://example.com, and let's pretend that you have three web pages:

Home, at https://example.com/.
About, at https://example.com/about.
Branches, at https://example.com/branches.

Take a closer look at the top line (the "request line") of an HTTP request.

GET / HTTP/1.1

The second element here, the path, will change depending on what specific page you want to visit. When a user visits https://example.com/about, the request line will change to this:

GET /about HTTP/1.1
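To see where that path comes from, here is a small sketch that pulls the path out of a URL with Python's standard-library urllib.parse and builds the matching request line. The request_line function is a made-up helper for illustration, and it ignores query strings to keep things simple.

```python
from urllib.parse import urlsplit


def request_line(url, method="GET"):
    """Build the request line a client would send for this URL."""
    # A URL with no path at all (e.g. "https://example.com") is requested as "/".
    path = urlsplit(url).path or "/"
    return f"{method} {path} HTTP/1.1"


for url in [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/branches",
]:
    print(request_line(url))
```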

You can use the different values of the path to tell your web program to do different things. In English:

If the user visits the path /, render my home page.
If the user visits the path /about, render my about page.
If the user visits the path /branches, render my branches page.

In pseudocode:

if request.path == '/':
    render('home.html')
else if request.path == '/about':
    render('about.html')
else if request.path == '/branches':
    render('branches.html')

This is a form of dynamic dispatch. Different code will run depending on the value of the path variable.
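The same dispatch is often written as a lookup table instead of a chain of conditions. The sketch below is plain Python with no framework involved. The ROUTES dictionary, the dispatch function, and the handlers that simply return a file name are all assumptions made for illustration.

```python
def home():
    return "home.html"


def about():
    return "about.html"


def branches():
    return "branches.html"


# Map each path to the function that should handle it.
ROUTES = {
    "/": home,
    "/about": about,
    "/branches": branches,
}


def dispatch(path):
    """Look up the handler for a path and call it."""
    handler = ROUTES.get(path)
    if handler is None:
        # No handler registered for this path.
        return "404 Not Found"
    return handler()


print(dispatch("/about"))
print(dispatch("/menu"))
```

Adding a page now means adding one entry to the table rather than another branch to the conditional.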

In practice, you will usually attach functions called "handlers" to different paths. In Flask, the Python web microframework, it looks like this:

from flask import Flask, render_template

app = Flask(__name__)

@app.get('/')
def home():
    return render_template('home.html')

@app.get('/about')
def about():
    return render_template('about.html')

@app.get('/branches')
def branches():
    return render_template('branches.html')

Flask uses Python function decorators to determine which function needs to handle a request. Decorators are an advanced Python feature, but they make intuitive enough sense in this case that I think they are a good abstraction. If you know even a little Python, it's reasonably clear what is going on in this code just by looking at it.
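If decorators still feel like magic, it may help to see a toy version of the idea. The sketch below is not how Flask is implemented; Flask's real routing also deals with URL patterns, HTTP methods, and much more. MiniApp and its methods are hypothetical names, and the only point is that a decorator like this can simply record the function in a dictionary keyed by path.

```python
class MiniApp:
    """A toy stand-in for a framework's app object (not Flask itself)."""

    def __init__(self):
        self.routes = {}

    def get(self, path):
        # Calling mini.get('/about') returns the actual decorator.
        def register(handler):
            # Remember which function handles this path...
            self.routes[path] = handler
            # ...and hand the function back unchanged.
            return handler
        return register

    def handle(self, path):
        handler = self.routes.get(path)
        if handler is None:
            return "404 Not Found"
        return handler()


mini = MiniApp()


@mini.get("/")
def home():
    return "home page"


@mini.get("/about")
def about():
    return "about page"


print(mini.handle("/about"))
print(sorted(mini.routes))
```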

Controllers

Using a web framework only to render HTML files is not that interesting. We could have achieved this with a much simpler Apache or NGINX server. One reason to reach for a web framework is that you need to run code when you receive a request.

Let's say that you want to add a new page, /products, to your cafe's website. Unlike the home, about, and branches pages, this products page is dynamic: the website administrators should be able to update the table of products on the /products page without changing the source code. To keep things simple, let's also say that they just have a text file that they edit if they want to change their product offerings.

The text file looks like this:

id,name,price
1,Americano,100
2,Espresso,90

It's not quite as simple as rendering an HTML file now; you will need to parse this data. Note that since you're using a web framework, you have full access to the programming language around it. You can thus easily do something like this:

@app.get('/products')
def products():
    # First, read the data.
    with open('products.txt') as f:
        # Strip the trailing newline so it doesn't end up in the last header.
        products_headers = next(f).strip()
        # Strip the trailing newline so we don't render an empty last row.
        products_data = f.read().strip()
    # Then, construct some HTML based on the data.
    # We will do so manually for now. HTML is just text, after all.
    page = '''<table><tr>'''
    for header in products_headers.split(','):
        page += f'''<th>{header}</th>'''
    page += '''</tr>'''
    for product in products_data.splitlines():
        page += '''<tr>'''
        for field in product.split(','):
            page += f'''<td>{field}</td>'''
        page += '''</tr>'''
    page += '''</table>'''
    return page

While ugly, this achieves our aims. This code reads from some source of truth, in this case a file. The code then constructs HTML based on that source of truth, in this case by splicing strings together.
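One thing string splicing glosses over is that the file's contents end up in the page verbatim, so a product name containing < or & would be read by the browser as markup. As a hedged sketch of a slightly more careful version, the function below uses the standard-library csv module to parse the rows and html.escape to make each field safe. The products_table name and the idea of passing the file's text in as a string are my own assumptions, chosen so that the example runs without Flask.

```python
import csv
import html
import io


def products_table(text):
    """Build an HTML table from comma-separated text (first row = headers)."""
    # csv.reader handles the splitting; blank lines come back as empty rows.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]

    parts = ["<table>"]
    for i, row in enumerate(rows):
        # The first row holds the column headers.
        cell = "th" if i == 0 else "td"
        cells = "".join(
            f"<{cell}>{html.escape(field)}</{cell}>" for field in row
        )
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "".join(parts)


sample = "id,name,price\n1,Americano,100\n2,Espresso,90\n"
print(products_table(sample))
```

Inside a Flask handler, you could read products.txt into a string and return the result of a function like this one.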

All this happens in a handler function. We normally call these handler functions "controllers," since they dictate the logic that executes for any given route.

Models and Views

To reiterate the two general tasks performed by our controller:

The controller reads data from some source of truth.
The controller uses that data to form some HTML.

These two tasks are so common that web frameworks usually offer abstractions for them as models and views respectively. We will discuss these in a future article.