How to make a custom Python templating engine

Question from Anonymous

I have now used a language I am barely familiar with (Python; I don't like the scopes) to parse a webpage I built from online tutorials, with its shitty embeds of Python, so I could generate a new page that has the current information from Google Calendar and presentation slides from Dropbox, neither of which are APIs I had experience with. I have the shittiest code ever written.

<div id="SlideshowContainer">
    ${"\n\t\t\t".join(getImageFiles(imageList))}
</div>

A small sample of the HTML part.

But I don't like Python; it's just what I'm required to use.

I have the code execution working; it's basically my own implementation of templates.

Python scopes trip me up for no reason, as do types. Holy wars over editors never change my opinion; I just use Visual Studio because I normally use C#, but I switched to IDLE for this one because I was hoping it would work better for Python (it doesn't).

What exactly do you mean by scopes, though? Like, declarations of things?

if thing:
    x = 2
else:
    x = 3
# x exists here

Declarations in particular, but I also always forget the global keyword.

Wait, that's a code smell. In 5 years of Python I've needed the global keyword only once, and even that was a mistake.

I've had to use nonlocal for one assignment, but this probably points to a larger problem in how you write code.
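A minimal sketch of the usual alternative to global: pass state in and return the result, and only reach for nonlocal when a nested function really must rebind a variable from its enclosing function. count_events and make_counter are hypothetical names, not from your script:

```python
def count_events(events):
    """Return how many events have a start time, without any module-level counter."""
    count = 0  # local to this function; no `global` needed
    for event in events:
        if event.get("start") is not None:
            count += 1
    return count  # hand the result back instead of mutating a global


def make_counter():
    """Closure example: `nonlocal` lets the inner function rebind `total`."""
    total = 0

    def add(amount):
        nonlocal total  # without this, `total += amount` raises UnboundLocalError
        total += amount
        return total

    return add


counter = make_counter()
counter(2)
print(counter(3))  # 5
print(count_events([{"start": "09:00"}, {"start": None}, {}]))  # 1
```

Note that the if/else example above still holds: blocks don't create scopes in Python, functions do.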

No, Python just feels like shitty scripting IMO, and I treat it like I treat a one-shot Bash/batch script.

(My main interest here is just to get to the root of why you feel this way.)

(Not to knock one-off hacky scripts; it's pretty good for that.)

But I only ever need Python for scripting or small programs.

It's only in use here because it's the one language I can remotely use that works on Linux.

What are the other languages you can use? That might give me some context on what angle you are coming from here.

C# mostly

Okay, I'll loop back around after I have run your code, but I have a hypothesis here.

Also, if you have any of your C# code that you can share, I am curious how you write code with the guardrails of static typing.

I have code that I think kinda doesn't suck, but in retrospect it does.

Good enough.

So anyway, I'm walking through your script. First issue:

data = []
for i in range(0, 10):
    data.append(("&nbsp", "&nbsp", "&nbsp"))

"data" tells me nothing about what this is. Why do you have a list of 10 tuples of 3 non-breaking space HTML escapes? They also aren't complete: the entity should end with a semicolon, as in &nbsp;

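If the page really does need a fixed number of rows, naming the pieces already helps. A sketch under that assumption; ScheduleEntry, NBSP, SCHEDULE_ROW_COUNT, and pad_schedule are hypothetical names, not from your script:

```python
from typing import List, NamedTuple

NBSP = "&nbsp;"          # complete entity, including the trailing semicolon
SCHEDULE_ROW_COUNT = 10  # hypothetical: how many rows the page layout expects


class ScheduleEntry(NamedTuple):
    name: str
    start: str
    location: str


EMPTY_ENTRY = ScheduleEntry(NBSP, NBSP, NBSP)


def pad_schedule(entries: List[ScheduleEntry]) -> List[ScheduleEntry]:
    """Fill the list up to SCHEDULE_ROW_COUNT rows with blank entries (never truncates)."""
    missing = max(0, SCHEDULE_ROW_COUNT - len(entries))
    return entries + [EMPTY_ENTRY] * missing


rows = pad_schedule([ScheduleEntry("Standup", "9:00 AM", "101")])
print(len(rows), rows[1].name)  # 10 &nbsp;
```

With the padding in one named place, everything else can just append real entries.
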
command = "value = " + match.groups(1)[0]
print(command)
exec(command, {}, enviormentVariablesInFile)

I think you see this coming, but it is worth pointing out anyways.

exec is never what you want.

Part of what is going wrong here, beyond basically using exec as a calculator, is that you are assigning to a variable inside the code you execute.

If anything, at least put the value assignment outside of the command you want to run.
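Keeping the assignment on the Python side means evaluating only an expression. A minimal sketch using the built-in eval, shown only to illustrate the narrower shape (it still runs arbitrary code from the template); evaluate_placeholder and template_env are hypothetical names:

```python
def evaluate_placeholder(expression, template_env):
    """Evaluate a single expression and return its text; no assignment inside the string."""
    # eval() takes an expression, not statements, so "value = ..." is no longer needed.
    # Caution: this still executes arbitrary code from the template.
    value = eval(expression, {}, template_env)
    return "" if value is None else str(value)


template_env = {"imageList": ["a.png", "b.png"]}
print(evaluate_placeholder("len(imageList)", template_env))  # 2
```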

But let's look a bit closer at what you are using it for:

    pattern = re.compile("\\${(.*?)}")

    enviormentVariablesInFile = {"data": data, "imageList": imageList, "value": None}

    for match in reversed(list(pattern.finditer(output))):
        span = match.span()
        command = "value = " + match.groups(1)[0]
        print(command)
        exec(command, {}, enviormentVariablesInFile)
        value = enviormentVariablesInFile["value"]
        if value is None:
            value = ""
        i += 1
        output = output[:span[0]] + value + output[span[1]:]

The first issue here is the name pattern. Regular expressions are not readable, so you need to pick a name for that thing that describes what it does.

I can't reverse engineer it, so let's pretend it finds unicorns.

    unicornPattern = re.compile("\\${(.*?)}")
    unicornsInOutput = list(unicornPattern.finditer(output))

    enviormentVariablesInFile = {"data": data, "imageList": imageList, "value": None}

    for unicornMatch in reversed(unicornsInOutput):
        span = unicornMatch.span()
        command = "value = " + unicornMatch.groups(1)[0]
        print(command)
        exec(command, {}, enviormentVariablesInFile)
        value = enviormentVariablesInFile["value"]
        if value is None:
            value = ""
        i += 1
        output = output[:span[0]] + value + output[span[1]:]

Now at least there is a name to the thing.

Also, I guess I get the idea. Your output starts as the HTML file; you exec the raw code you delimited with ${} and splice each result in as you go.

Because you rebuild the whole output string for every match, you end up with roughly n^2 behaviour, but that is fine for your project.
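One way to drop the repeated string rebuilding is to let re.sub make a single pass and call a function for each match. This sketch also swaps exec for a plain dictionary lookup, so ${name} can only pull in values you computed beforehand; render and values are hypothetical names:

```python
import re

# Matches ${...} and captures whatever is between the braces.
PLACEHOLDER_PATTERN = re.compile(r"\$\{(.*?)\}")


def render(template, values):
    """Replace every ${name} with values[name] in one pass over the template."""
    def replace(match):
        name = match.group(1).strip()
        return str(values.get(name, ""))  # unknown names become empty strings

    return PLACEHOLDER_PATTERN.sub(replace, template)


html = '<div id="SlideshowContainer">${images}</div>'
print(render(html, {"images": '<img src="a.png" />'}))
```

When the replacement is a function, re.sub inserts its return value as-is, so backslashes in your HTML aren't reinterpreted.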

But there has to be a better way (and there is, even without libraries). If you think of Python as only good for writing scripts, you should really be asking yourself: "How exactly would I do this differently in C#?"

In this case, the main thing C# is going to prevent you from doing is evaluating arbitrary code, since "eval"-ing C# is far harder.

You even use this eval behaviour to define helper functions within your template:

${None; getImageFiles = lambda l: list(map(lambda i: '<img class="SlideshowImage fade" src="' + str(i) + '" />', l))}

The key thing here that sucks is embedding code in templates, and Python requiring significant whitespace (for good reasons) makes it suck even more.

Tools like JSP (JavaServer Pages) allow basically arbitrary access to the context of the code around them.

This is why, even if you manage things well at the start, projects using tools that permissive tend to go off the rails.

Down a step are things like Jinja (the templating engine Flask uses). They let you embed logic, on the premise that it is sometimes required or helpful for formatting some HTML or similar, but they do not give you arbitrary access to the outside scope.

This is what I think you are trying to accomplish with your code, since you specify exactly the environment for exec and then want to write code that generates stuff using that environment.

This works, except for the facts that

  1. Python is probably too powerful a language to be embedded in a template, and
  2. Python is a bad fit syntactically for being embedded in HTML.
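To make point 1 concrete: passing {} as the globals to exec is not a sandbox. If that dictionary has no __builtins__ key, Python inserts the built-ins for you, so template code can still reach things like __import__. A small demonstration with a made-up template string:

```python
template_env = {"value": None}

# Looks harmless: an empty globals dict and a hand-picked locals dict...
exec("value = __import__('os').getcwd()", {}, template_env)

# ...but the template code imported a module and called into it anyway.
print(template_env["value"])
```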

The simplest kind of templating, and what I suggest you use for your project instead of what you are doing, is find and replace.

So instead of putting your logic in your template, you compute what you want to insert outside of that context and jam it in after the fact, without any logic.
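The standard library already ships a find-and-replace templater that happens to use your ${...} syntax: string.Template. A minimal sketch, assuming each fragment is a plain string computed beforehand; page_template, image_urls, and images are made-up names:

```python
from string import Template

page_template = Template("""
<div id="SlideshowContainer">
    ${images}
</div>
""")

# All the logic happens here, in ordinary Python, before touching the template.
image_urls = ["slide1.png", "slide2.png"]
images = "\n    ".join(
    '<img class="SlideshowImage fade" src="{}" />'.format(url) for url in image_urls
)

# substitute() raises KeyError for a missing placeholder; safe_substitute() leaves it in place.
print(page_template.substitute(images=images))
```

One catch: any literal $ in the HTML has to be written as $$ so Template doesn't treat it as a placeholder.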

So for the HTML you want to generate, first things first: let's sub out the variable bits.

<html>
<head>

    <link href="index.css" rel="stylesheet" type="text/css" />
    <script src="index.js"></script>

</head>
<body>
    <div id="Container">
        <div id="ScheduleContainer">
            <table id="ScheduleTable">
                <tr class="ScheduleRow">
                    <th id="ScheduleHeader" colspan="5">
                        <h1>Schedule</h1>
                    </th>
                </tr>
                <tr class="ScheduleRow">
                    <th colspan="3">Name</th>
                    <th>Time</th>
                    <th>Room</th>
                </tr>
                <!--
                for schedule in all_schedules:
                     make a table row for that schedule.
                -->
            </table>
        </div>

        <div id="SlideshowContainer">
            <!--
            for image in imageList:
                make an image on the page for that image
            -->
        </div>

        <div id="LogoBox">
            <img id="LogoImage" src="logo.png" />
        </div>

        <div id="Footer">
            <p style="display: inline" id="DateTimeTime" />
            <p style="display: inline" id="DateTimeDate" />
        </div>

    </div>
</body>
</html>

That's all you want to do. Now the question is: "How do I fill in the comment blocks without barfing eval-able Python code?"

Your first option is the Jinja approach, where your templating language supports basic looping constructs. You give it the data and it formats that into the HTML page.

But you are rolling your own, so we will go with the second approach: find and replace.

First, let's handle the rows:

# Each row is assumed to be a dict with "name", "time" and "room_number" keys.
def schedule_row_html(row):
    return """
        <tr class="ScheduleRow">
            <td colspan="3" class="ScheduleName">{name}</td>
            <td class="ScheduleTime">{time}</td>
            <td class="ScheduleRoomNumber">{room_number}</td>
        </tr>""".format(name=row["name"], time=row["time"], room_number=row["room_number"])

all_schedule_html = "".join([ schedule_row_html(row)
                              for row in schedules ])
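One thing plain string formatting won't do for you is HTML escaping: an event named Q&A <Room 5> would be pasted in raw. A sketch of the same row builder with html.escape applied to each field; like the code above, it assumes each row is a dict with name, time, and room_number keys (ROW_TEMPLATE is a hypothetical name):

```python
import html

ROW_TEMPLATE = """
        <tr class="ScheduleRow">
            <td colspan="3" class="ScheduleName">{name}</td>
            <td class="ScheduleTime">{time}</td>
            <td class="ScheduleRoomNumber">{room_number}</td>
        </tr>"""


def schedule_row_html(row):
    # Escape each value so characters like &, < and > can't break the markup.
    return ROW_TEMPLATE.format(
        name=html.escape(row["name"]),
        time=html.escape(row["time"]),
        room_number=html.escape(str(row["room_number"])),
    )


schedules = [{"name": "Q&A <Room 5>", "time": "2:30 PM", "room_number": 5}]
all_schedule_html = "".join(schedule_row_html(row) for row in schedules)
print(all_schedule_html)
```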

Now we can fill in the rows:

pageHtml = """
<html>
<head>

    <link href="index.css" rel="stylesheet" type="text/css" />
    <script src="index.js"></script>

</head>
<body>
    <div id="Container">
        <div id="ScheduleContainer">
            <table id="ScheduleTable">
                <tr class="ScheduleRow">
                    <th id="ScheduleHeader" colspan="5">
                        <h1>Schedule</h1>
                    </th>
                </tr>
                <tr class="ScheduleRow">
                    <th colspan="3">Name</th>
                    <th>Time</th>
                    <th>Room</th>
                </tr>
                {schedules}
            </table>
        </div>

        <div id="SlideshowContainer">
            {images}
        </div>

        <div id="LogoBox">
            <img id="LogoImage" src="logo.png" />
        </div>

        <div id="Footer">
            <p style="display: inline" id="DateTimeTime" />
            <p style="display: inline" id="DateTimeDate" />
        </div>

    </div>
</body>
</html>
"""

def images_html(image_urls):
    return "".join('<img class="SlideshowImage fade" src="{src}" />'.format(src=image_url)
                   for image_url in image_urls)

page = pageHtml.format(schedules=all_schedule_html, images=images_html(imageList))

(This is all a rough sketch, so the finer points are up to you.)

Now you may be asking yourself: "But what if my template becomes more complicated? Just doing string formatting can't scale!"

And to that I say: yeah, no duh.

That's why people spent time writing, improving, and bug-fixing the existing templating libraries.

But if your requirements are as simple as you say (a single page regenerated every day or whatever), just do it inline with strings. Who cares?

Also, one tiny thing:

ChangeImage

Capitalizing the first letter of every word in method names (PascalCase) is mostly a C#/.NET convention. Most Java-ish people use camelCase. Python supports that too, but the generally preferred style (PEP 8) is snake_case for functions and variables.

Doesn't matter for this, but keep it in the back of your head so that when you finally have to code with other programmers you don't get bogged down in pointless holy wars.

Finally moving on from the exec thing:

    i = 0

    for event in events:
        start = event['start'].get('dateTime')
        if(start is None):
            continue
        start = start[11:16]
        hour = int(start[0:2])
        suffix = " AM"
        if hour > 12:
            suffix = " PM"
            hour %= 12
        start = str(hour) + start[2:5] + suffix
        name = event['summary']
        location = event['location']
        print(name, start, location, sep=", ")
        data[i] = (name, start, location)
        i += 1

What is this i?

It seems like you are just counting in step with the data because the way you coded it requires a fixed number of schedules on the page. Hopefully you know how to fix that now, and you can just append to data (or whatever name you give it that actually represents what it is).

The larger problem with using i like this is that it increases the area you need to read to understand a given chunk of code, since you need to track reassignments, changes, and uses of i everywhere from its first declaration to its last.

"i", while customary for simple index-based for loops in C-ish languages (and sometimes when using range(...)), really isn't a good enough name here.
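A sketch of the loop without the counter, appending one entry per timed event. It assumes each event is a dict shaped like the ones in your loop (start.dateTime, summary, location); schedule_entries_from_events is a hypothetical name, and the time formatting is passed in as a function so it can live on its own. It also uses .get for location so an event without one doesn't raise KeyError:

```python
def schedule_entries_from_events(events, format_time):
    """Build (name, start, location) tuples; no index bookkeeping needed."""
    entries = []
    for event in events:
        start = event["start"].get("dateTime")
        if start is None:  # e.g. all-day events, which carry a 'date' instead
            continue
        entries.append((event["summary"], format_time(start), event.get("location", "")))
    return entries


events = [
    {"start": {"dateTime": "2024-05-01T14:30:00-04:00"}, "summary": "Standup", "location": "101"},
    {"start": {"date": "2024-05-02"}, "summary": "Holiday"},
]
print(schedule_entries_from_events(events, lambda s: s[11:16]))  # [('Standup', '14:30', '101')]
```

If you ever do need a position, enumerate(events) gives you one without a separate counter.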

        start = event['start'].get('dateTime')
        if(start is None):
            continue
        start = start[11:16]
        hour = int(start[0:2])
        suffix = " AM"
        if hour > 12:
            suffix = " PM"
            hour %= 12

Also, date handling logic is always going to be messy. I get that. But try to make it depend on fewer magic numbers.

Maybe isolate it in its own function (depending on whether that helps or hurts readability in context).

What is start[11:16]? Maybe you know now, but god only knows a year from now.
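A sketch of pulling the time formatting into one function built on datetime instead of string offsets. It assumes dateTime is an RFC 3339 timestamp like 2024-05-01T14:30:00-04:00 (which is what the [11:16] slice implies). It also fixes two edge cases in the hand-rolled version: its hour > 12 check labels 12:xx as AM, and it prints midnight as 0:xx. format_start_time is a hypothetical name:

```python
from datetime import datetime


def format_start_time(rfc3339_timestamp):
    """Turn '2024-05-01T14:30:00-04:00' into '2:30 PM' (the event's own local time)."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on, so normalize it.
    start = datetime.fromisoformat(rfc3339_timestamp.replace("Z", "+00:00"))
    hour_12 = start.hour % 12 or 12  # 0 -> 12 (midnight), 13 -> 1, 12 stays 12
    suffix = "AM" if start.hour < 12 else "PM"
    return "{}:{:02d} {}".format(hour_12, start.minute, suffix)


print(format_start_time("2024-05-01T14:30:00-04:00"))  # 2:30 PM
print(format_start_time("2024-05-01T12:05:00-04:00"))  # 12:05 PM
```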

You copy-pasted getService. No problem, but maybe put a link back to where you copy-pasted it from in case you need to change it later.

I'd loop back around to tackling your misgivings about Python, but at this point I'm tired.

Okay, sorry, I had to step away halfway through this, and just got back.

Thanks for all the help!

I'll try to fix some of the worse problems in this

You're right, I'm not respecting the language correctly.

I wouldn't necessarily use the verb "respecting". You just need to learn how to write code to be read. I think being in Python lowers some guardrails, so you are just bumping into stuff more.

