Security meets Design-by-Subtraction[1]

While a significant part of an engineering education (or indeed any education) is the gathering of tools with which to attack a problem, something beautiful happens when we intentionally constrain their use. In events such as hackathons and game jams, we intentionally restrict some aspect of creation, whether it be time, tools, theme, some combination of these, or other restrictions altogether. At the end of such events, instead of subpar outcomes, what can emerge (admittedly with some very hard work and sleep deprivation) are inventions and experiences with much more creativity than the resources that went into them would suggest possible. Similarly, when we constrain what can be done with a programming language (or paradigm in this case), we gain non-obvious benefits to authorization code architecture and workflow. Let’s explore how this plays out in the adoption of Open Policy Agent and Rego.

The Problem: Authorization

Authorization is one of those problems that seems really simple until reality starts to get in the way. Let’s look at a simple example and then see how quickly it can get out of hand. Suppose you are an engineer building an HR/Payroll platform (an experience I happen to be intimately familiar with), and you have to write code that determines who can view a person’s salary. In the real world, this is colloquially called a "policy," and is usually determined by laws and decisions that are written in some language (let’s say English for now), and might look something like this: "People can read their own compensation." As engineers, we might look at this and, in some theoretical C/JavaScript-hybrid pseudo code, write something like this:
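A minimal sketch of that check, written here in Python for concreteness. The `User` type and its `id` field are hypothetical stand-ins for whatever the platform actually stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str  # hypothetical unique identifier for a person on the platform


def can_read_compensation(principal: User, entity: User) -> bool:
    """Policy: people can read their own compensation."""
    return principal.id == entity.id
```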

We could then call this function somewhere like:
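One possible call site, again as a Python sketch. The `handle_request` name, the `SALARIES` table, and the status-code response shape are all assumptions made for illustration.

```python
SALARIES = {"alice": 100_000, "bob": 120_000}  # hypothetical data store


def can_read_compensation(principal_id: str, entity_id: str) -> bool:
    # The one-line policy: people can read their own compensation.
    return principal_id == entity_id


def handle_request(principal_id: str, action: str, entity_id: str) -> dict:
    """Serve the salary only when the policy check passes."""
    if action == "read_compensation" and can_read_compensation(principal_id, entity_id):
        return {"status": 200, "salary": SALARIES[entity_id]}
    return {"status": 403}
```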

To properly authorize a "read_compensation" request. However, in the real world, policies often look a lot more like this:

Users can read the compensation of themselves, the people they are a manager of, and the people they are a manager of a manager of, or anyone if they are a member of the "human_resources" team and have the "read_compensation" permission present on their profile, and no one can read the compensation of executives or directors unless they also are an executive or a member of the board of directors, respectively.

Yikes... Well, okay, let’s go back to our program and see what we can do:
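A sketch of how that policy might land in a single return statement, in Python. The lookup tables and helper names are hypothetical, and I am reading "respectively" as: executives are readable only by executives, and directors only by board members.

```python
# Hypothetical in-memory data; a real platform would query a database instead.
MANAGER_OF = {"charlie": "bob", "danny": "charlie", "ellen": "charlie"}  # report -> manager
TEAMS = {"alice": {"human_resources"}}
PERMISSIONS = {"alice": {"read_compensation"}}
EXECUTIVES = set()
DIRECTORS = {"bob"}
BOARD = set()


def is_manager_of(p, e):
    return MANAGER_OF.get(e) == p


def is_manager_of_manager_of(p, e):
    # Look up e's manager, then that manager's manager.
    return MANAGER_OF.get(MANAGER_OF.get(e)) == p


def can_read_compensation(p, e):
    # The whole policy in one expression: the nesting is the point.
    return ((p == e or is_manager_of(p, e) or is_manager_of_manager_of(p, e)
             or ("human_resources" in TEAMS.get(p, set())
                 and "read_compensation" in PERMISSIONS.get(p, set())))
            and ((e not in EXECUTIVES) or (p in EXECUTIVES))
            and ((e not in DIRECTORS) or (p in BOARD)))
```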

So, hopefully I don’t have to explain why this is problematic code, but for starters:

  • Parentheses! Parentheses everywhere! Where do they start, and where do they end? Can you tell if I have put the correct number of parentheses at each level of hierarchical logic? How long do you have to stare at it before coming up with an answer? What if the policy changes and you have to change the code to match it? Note also that there is no amount of syntax highlighting or tabbing that is going to fix this problem.
  • We have tried to offload some of the complexity of this policy to some functions, but these are imperative functions, which can do pretty much anything. They can throw errors, loop for a long time/indefinitely, or they can have side effects like changing some global variable we can’t immediately see. We can’t know what these functions are doing exactly, unless we go to some "manager" module that would exist in a theoretically well-designed and modular code base, but even then, should we really have to? We’re just trying to write some policy code, probably as a part of a much larger PR.
  • This is just one policy! For reading compensation, let alone editing it. What about timesheets? Taxes? Garnishments? Vacation? This if statement would become a huge chain of if-else if-else statements.
  • Similarly, on modern distributed system architectures, there may also be several languages involved, multiplying the policy code confusion further!

Now, before continuing, I can hear someone out there thinking, "Hey, Nick, that’s not fair, you smushed all the conditions together in that return statement! Good coding standards would dictate that you expand them all out into intermediate variables!" Fine, here you go:
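A sketch of that refactor in Python, with each condition pulled into a named intermediate variable. The data tables are the same kind of hypothetical stand-ins a real code base would replace with database lookups.

```python
# Hypothetical in-memory data.
MANAGER_OF = {"charlie": "bob", "danny": "charlie", "ellen": "charlie"}  # report -> manager
TEAMS = {"alice": {"human_resources"}}
PERMISSIONS = {"alice": {"read_compensation"}}
EXECUTIVES = set()
DIRECTORS = {"bob"}
BOARD = set()


def can_read_compensation(p, e):
    # Who has some relationship to the entity that grants access?
    is_self = p == e
    is_manager = MANAGER_OF.get(e) == p
    is_managers_manager = MANAGER_OF.get(MANAGER_OF.get(e)) == p
    is_hr_reader = ("human_resources" in TEAMS.get(p, set())
                    and "read_compensation" in PERMISSIONS.get(p, set()))
    has_access_path = is_self or is_manager or is_managers_manager or is_hr_reader

    # Privilege checks that can veto any access path.
    executive_ok = e not in EXECUTIVES or p in EXECUTIVES
    director_ok = e not in DIRECTORS or p in BOARD

    return has_access_path and executive_ok and director_ok
```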

That’s certainly better, but still not great. The last three problems still exist, and if someone had to tell me to refactor, they will likely also have to tell our theoretical programmer who is just trying to get a policy written for their theoretical new feature to put in their theoretical PR.

So, what has been the modern answer to this so far? The next step is typically to go data-driven as opposed to code-driven, with something like Role-Based Access Control (RBAC) or Attribute-Based Access Control (ABAC). In these schemes, we have some sort of common labels or tags that we attach to principals, like "Manager," "Executive," "Director," and/or "HR" roles, and something like "HR Viewable," "Confidential," and/or "Top Secret" to entities, and then define 3-tuples that relate them with an action, like ["Executive", "edit", "Top Secret"] or ["HR", "read", "Confidential"].
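To make the 3-tuple idea concrete, here is a small Python sketch. The principals, entities, and their role/tag assignments are invented for illustration; only the two grants come from the examples above.

```python
# Hypothetical role and tag assignments.
ROLES = {"alice": {"HR"}, "erin": {"Executive"}}
TAGS = {"bob_salary": {"Confidential"}, "merger_plan": {"Top Secret"}}

# Each grant is a (role, action, tag) 3-tuple.
GRANTS = {("Executive", "edit", "Top Secret"), ("HR", "read", "Confidential")}


def is_allowed(principal, action, entity):
    """Allow when any of the principal's roles is granted the action on any of the entity's tags."""
    return any(
        (role, action, tag) in GRANTS
        for role in ROLES.get(principal, set())
        for tag in TAGS.get(entity, set())
    )
```

Note that the authorization code here never changes; only the data does.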

The problem here is that we have traded complex code for a limiting data representation. What if we want a certain principal to have access to a certain entity? For example, let’s say we want "Jack" to be able to view "Jill’s" compensation (and vice-versa) because they are married. We would then have to make a specific role or tag just for their situation and apply it to their authorization representations, or we would have to make a special case directly in the authorization code, which really should only be evaluating the data in this case. Not to mention that representing structures like managerial hierarchies would be extremely difficult, if not impossible, with a purely data-driven approach.

First Pass: Prolog

So, enough of what doesn’t work, let’s look at what logic programming (LP) can offer, shall we? For the uninitiated, LP takes a declarative approach to evaluating a set of facts with a set of rules. There isn’t actually any "computation" being instructed in the traditional "imperative" sense, but instead LP is an evaluation engine that uses logical constraints to determine which output (if any) will satisfy a query. By definition, no reassignment of variables or mutation of state can occur, and LP gives us purely analytical tools that we can use to evaluate authorization queries in safe and straightforward ways. Before we get into the actual code, please note that a detailed tutorial on the LP languages featured here unfortunately falls outside of the scope of this post. However, where appropriate, I have tried to add descriptive comments to the LP code below to help readability. That being said, let’s start with Prolog, the darling LP language of academia. To represent the policy from earlier, we might have something like this:

Feel free to play with this code here, but for now, just looking at this briefly, there are several improvements:

  • Hierarchical logic is offloaded onto identifiers by default with no need to refactor or remember to write it that way to start.
  • Since the input to the evaluation is known to the program, there is no need to call functions with unknown consequences from another module.
    • NOTE: This is not free, as it requires serializing the input query before sending it to the LP evaluator, but this is generally trivial to implement, and has the added benefit of creating an easily auditable log of the information the decision was based on.
  • We have the benefit of flexible logical expression evaluation without the rigidity of a purely data-driven approach, so we could just as easily add a rule like:
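The same idea carries over to any rule-based layout. As a rough Python sketch (not Prolog), a policy can be a list of pure rule functions, and a hypothetical spouse rule for Jack and Jill is just one more entry in that list:

```python
SPOUSES = {frozenset({"jack", "jill"})}  # hypothetical fact: Jack and Jill are married


def self_rule(principal, action, entity):
    return action == "read_compensation" and principal == entity


def spouse_rule(principal, action, entity):
    # Ad-hoc rule: spouses may read each other's compensation.
    return action == "read_compensation" and frozenset({principal, entity}) in SPOUSES


RULES = [self_rule, spouse_rule]  # every rule lives in the same place


def authorized(principal, action, entity):
    """A request is allowed when any rule allows it."""
    return any(rule(principal, action, entity) for rule in RULES)
```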

Which would exist right in the same place all the other policies are found, without needing a special role/tag, and without needing to put it in as an exception in the actual authorization code.

But wait, there’s more!

Just as we can make simple queries to the Prolog program above like:
authorized(alice, read_compensation, charlie)
which will evaluate to true, and
authorized(alice, read_compensation, bob)
which will evaluate to false due to director privilege, we can also use advanced LP features like variable querying to perform queries like:
authorized(bob, read_compensation, Entity)
which asks, "Whose compensation can Bob view?" The result is Entity = charlie, Entity = danny, Entity = ellen. This information would be very useful for debugging and verification, both for the programmer and for the user. If the user is able to affect some of the policies through their own configuration, it would be nice to see exactly how that configuration affects the access of actual principals and entities.
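The three queries above can be mimicked outside Prolog. In this Python sketch, the facts are hypothetical, chosen only so that the results agree with the ones quoted above. Prolog answers the variable query through unification and backtracking, whereas this sketch simply tries every known person.

```python
# Hypothetical facts, chosen to be consistent with the query results above.
PEOPLE = ["alice", "bob", "charlie", "danny", "ellen"]
MANAGES = {("bob", "charlie"), ("charlie", "danny"), ("charlie", "ellen")}
HR_READERS = {"alice"}  # on the human_resources team, with read_compensation
EXECUTIVES = set()
DIRECTORS = {"bob"}
BOARD = set()


def manages(manager, report):
    return (manager, report) in MANAGES


def manages_indirectly(manager, report):
    return any(manages(manager, mid) and manages(mid, report) for mid in PEOPLE)


def protected(principal, entity):
    """True when executive or director privilege blocks the read."""
    return ((entity in EXECUTIVES and principal not in EXECUTIVES)
            or (entity in DIRECTORS and principal not in BOARD))


def authorized(principal, action, entity):
    if action != "read_compensation" or protected(principal, entity):
        return False
    return (principal == entity or manages(principal, entity)
            or manages_indirectly(principal, entity) or principal in HR_READERS)


def readable_by(principal):
    """Emulates leaving Entity unbound: test every known person."""
    return [e for e in PEOPLE if authorized(principal, "read_compensation", e)]
```

Under these facts, `readable_by("bob")` leaves out Bob himself, because he is a director and not on the board.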

Before moving on to OPA, let’s take a look at what we did here: instead of seeing a problem and just throwing a general-purpose imperative language like C++ or Java at it, we chose a language that kept us from using potentially dangerous features, but was still able to accommodate all of our needs in a way seemingly purpose-built for our requirements, and had extra benefits to top it off.

Final Draft: Open Policy Agent

While Prolog could theoretically be used for authorization directly with some modification, it lacked support for hierarchical data structures like JSON. The team at Styra saw an opportunity to create a purpose-built language that would make policy-based access control more developer-friendly. Specifically, remember when I said we had to serialize input queries before evaluating them in an LP language, and that it’s generally trivial to do so? Well, that wouldn’t necessarily be the case if we had to serialize query inputs to a set of facts in the form that Prolog expects. After all, what kind of modern data representation actually looks like this?

More than likely, we will have some kind of SQL DB that we interface with via some RESTful JSON HTTP API, or a NoSQL DB, which is probably already using JSON, or even some GraphQL data, again in JSON, and so a theoretical trivially-serializable application object might instead look something like this:
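As a rough illustration, here is such an object built as a Python dictionary and serialized with the standard `json` module. The field names (`id`, `roles`, `manages`) are assumptions, not a schema from any real system.

```python
import json

# Hypothetical field names; the real shape depends on the application's API.
principal = {
    "id": "bob",
    "roles": ["director"],
    "manages": [
        {"id": "charlie", "manages": [{"id": "danny"}, {"id": "ellen"}]},
    ],
}

# Nested dictionaries and lists map directly onto JSON objects and arrays.
serialized = json.dumps(principal, indent=2)
```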

Where we have a principal with an id, which also has managerial relationships to other entities with ids, and we can then bundle a couple of these together with an action to create a complete authorization query, like this:
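A sketch of that bundling step in Python. The key names `principal`, `action`, and `entity` are hypothetical, as is the `build_query` helper.

```python
import json


def build_query(principal: dict, action: str, entity: dict) -> str:
    """Bundle a principal, an action, and an entity into one JSON document."""
    return json.dumps({"principal": principal, "action": action, "entity": entity})


query = build_query(
    {"id": "bob", "manages": [{"id": "charlie"}]},
    "read_compensation",
    {"id": "charlie"},
)
```

Logging `query` alongside the decision gives the kind of audit trail mentioned earlier: the exact input the decision was based on.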

And here we have Styra’s motivation for creating Open Policy Agent (OPA) and its Datalog-derived logic programming language, Rego. (Datalog is a fully declarative LP language derived from Prolog.) OPA’s advantage is that it can evaluate JSON authorization queries against Prolog-like policies (minus the arbitrary/unsafe LP variable queries mentioned earlier) blazingly fast and with comprehensive auditing features. OPA also has an extensive built-in library of functions with support for common tools like JWT, GraphQL, and base64, as well as the usual operations for sets, strings, time, etc. With this in mind, an equivalent policy in Rego might look something like this:

Feel free to play around with this policy and the accompanying input in this playground. In the meantime, we finally have something that can be used in practice, and also has the flexibility to be able to accommodate ad-hoc policies if necessary. For example, if we wanted to add a rule that allows for one specific employee to read another’s compensation, we could easily add a rule like this:

So, we now have all the code organization and safety benefits of Prolog, if not quite all the features, but with the addition of JSON support, fast execution, and superior auditing through services like Styra DAS. Let’s review the benefits:

  • Decoupling of authorization code from application code
  • Stateless and side-effectless evaluation of queries...
  • ...Without sacrificing complex hierarchical logic conditions...
  • ...Or modularly-organized rules/helper functions
  • Retained ability to express typical RBAC/ABAC policies
  • Auditing and debugging authorization query results via services like DAS, and features like code coverage, to see exactly which policies resulted in a given outcome

Just as constraints can squeeze creativity out of a game jam or hackathon, they can also create a beautifully organized and structured authorization code/policy base. With OPA, we get all the constraints and safety that we would want from a data-driven authorization approach without the loss of flexibility that it would impose, giving us the freedom to implement both simple and complex policies, as well as generic and more specific policies, all in a self-contained, auditable package. This also allows describing policy not just for traditional authorization, but things like infrastructure, Kubernetes admission control, or CI/CD checks.

Appendix: Recursion

Insightful readers will notice that my examples above are a bit limited in that they only work for a hard-coded depth of managers. I did this to simplify the example for illustrative purposes, but never fear, there is indeed support for arbitrary-depth hierarchical structures in both Prolog and Rego. Prolog has the ability to use a more traditional language-based recursion approach, while Rego has a more data-oriented recursion approach with a helper function.
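The two styles can be contrasted in a short Python sketch. This is neither Prolog nor Rego, just an illustration of the shapes involved, and the org chart is hypothetical. The first function recurses the way a recursive rule would; the second walks the data iteratively, in the spirit of a graph-reachability helper.

```python
# Hypothetical org chart: manager -> direct reports.
REPORTS = {"bob": ["charlie"], "charlie": ["danny", "ellen"], "danny": ["frank"]}


def manages_recursive(manager, person):
    """Rule-style recursion: a direct report, or a report of someone managed."""
    return any(
        report == person or manages_recursive(report, person)
        for report in REPORTS.get(manager, [])
    )


def reachable(graph, start):
    """Data-style traversal: every node reachable from start, found iteratively."""
    seen = set()
    stack = list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen
```

One design difference worth noting: the recursive version would loop forever on a cyclic org chart, while the iterative version tracks visited nodes and terminates.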

  1. In game design, there is a school of thought known as "design by subtraction," popularized by the designers at Team Ico with their titles "Ico" and "Shadow of the Colossus." In an analogy similar to the game jams and hackathons in the first paragraph of this post, "design by subtraction" seeks to eliminate any complexities that are not completely necessary to achieve the desired goal. ↩︎