Part 3 - Authentication & Authorization

Jan 18, 2023

FeatureGuards has two separate kinds of sessions, as depicted in Part 1 - Building A Scalable Web Service: one on the control path and one on the data path. We will cover how each is authenticated and authorized.

1. Control Path

Here, a human creates an account (or logs in) with FeatureGuards at app.featureguards.com. Once logged in, they can create a Project, invite others to the project (or be invited), and create feature flags, dynamic settings, API keys, etc. This is the admin side of FeatureGuards, where we’re authenticating humans rather than machines.

Authentication

Instead of writing all the user management routes myself (e.g., password recovery, email verification, welcome emails), I used Ory/Kratos and ran it as part of the FeatureGuards infrastructure. It’s a service plus a database schema that manages flows (signup, login, 2FA, forgotten password, etc.) and supports rendering the UI for those flows without being opinionated about how the UI should look. So, you can pick your favorite framework and render the UI however you want. It provides both ‘browser’ APIs and ‘admin’ APIs, with clear documentation on how to use them safely. You can follow the code for login on the frontend.

We host the Ory container alongside the dashboard and delegate all /identity/* routes to it via Envoy config. The frontend knows to use the same prefix for all of its Ory APIs.

Once a session is obtained via Ory, a secure cookie called app.sid is created, which is passed to the Dashboard service as gRPC metadata extracted from HTTP headers. Authentication is done in middleware so that every request is authenticated. For every gRPC method that requires authentication, the middleware checks for the cookie and converts it into a session, which is set on the context only if the user’s email has been validated or the method doesn’t require email validation.
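
To make the middleware check concrete, here is a minimal sketch in Python. It is not the actual FeatureGuards code: the `Session` type, the `lookup_session` callback (standing in for a session lookup against Ory), and the `requires_verified_email` flag are all hypothetical names. Only the app.sid cookie name comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

COOKIE_NAME = "app.sid"  # cookie name mentioned in the post


@dataclass(frozen=True)
class Session:
    # Hypothetical session shape; the real one likely carries more fields.
    user_id: str
    email_verified: bool


class Unauthenticated(Exception):
    """Raised when a request cannot be tied to a valid session."""


def parse_cookie(header: str, name: str) -> Optional[str]:
    """Return the value of cookie `name` from a raw Cookie header, if present."""
    for part in header.split(";"):
        key, _, value = part.strip().partition("=")
        if key == name:
            return value
    return None


def authenticate(
    metadata: Mapping[str, str],
    lookup_session: Callable[[str], Optional[Session]],
    requires_verified_email: bool = True,
) -> Session:
    """Turn request metadata into a session, or raise Unauthenticated."""
    cookie = parse_cookie(metadata.get("cookie", ""), COOKIE_NAME)
    if cookie is None:
        raise Unauthenticated("missing session cookie")
    session = lookup_session(cookie)
    if session is None:
        raise Unauthenticated("invalid or expired session")
    # Only hand the session to the handler if the email was validated,
    # unless this particular method opts out of that requirement.
    if requires_verified_email and not session.email_verified:
        raise Unauthenticated("email not verified")
    return session
```

In a real gRPC interceptor the returned session would be attached to the request context before calling the handler; methods such as "resend verification email" would be the ones passing `requires_verified_email=False`.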

NOTE: Email validation is critical. You must validate every email to avoid a class of security issues, such as account hijacking.

Authorization

All authorization on the control path is done at the project level. FeatureGuards implements a role-based access control (RBAC) model with two roles: admin and member. If you’re a member of the project, you can create, update, and delete feature flags, etc. But you can’t delete a project, invite new members, remove existing members, or even see who the members are.

Authorization is done at the gRPC layer. Every exposed method checks the authenticated user against the project, feature flag, or environment that is passed in. Lower layers (e.g., the data model) don’t perform authorization. This helps with testing and allows super admins, async tasks, etc. to perform operations on behalf of users.
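
A project-level RBAC check of this kind can be sketched roughly as follows. This is an illustration under assumptions, not the FeatureGuards implementation: the action names and the in-memory `memberships` mapping are hypothetical, and only the two roles and what a member can or can’t do come from the text above.

```python
from enum import Enum
from typing import Dict, Tuple


class Role(Enum):
    ADMIN = "admin"
    MEMBER = "member"


# Hypothetical action names. Any project member may perform these.
MEMBER_ACTIONS = frozenset({"flag.create", "flag.update", "flag.delete"})
# These are the operations the post says a plain member cannot perform.
ADMIN_ONLY_ACTIONS = frozenset(
    {"project.delete", "member.invite", "member.remove", "member.list"}
)


class PermissionDenied(Exception):
    """Raised when the user may not perform the action on the project."""


def authorize(
    memberships: Dict[Tuple[str, str], Role],
    user_id: str,
    project_id: str,
    action: str,
) -> None:
    """Raise PermissionDenied unless user_id may perform action on project_id."""
    role = memberships.get((user_id, project_id))
    if role is None:
        raise PermissionDenied("not a member of this project")
    if action in MEMBER_ACTIONS:
        return
    if action in ADMIN_ONLY_ACTIONS and role is Role.ADMIN:
        return
    # Unknown actions are denied by default.
    raise PermissionDenied(f"{role.value} may not perform {action}")
```

Each gRPC handler would call something like `authorize(...)` first and then invoke the lower layers, which stay free of authorization logic so that tests and background tasks can call them directly.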

2. Data Path

FeatureGuards exposes APIs and SDKs in Python, Go, and TypeScript that clients use to consume the feature flags created in the control path. This is the most critical flow and needs to be highly available and performant. Here, we will discuss how clients are authenticated and authorized.

API Keys

Any member of the project can create an API key/secret, optionally with an expiration date. An API key is associated with a project, a platform (browser or server), and an environment (e.g., prod or QA).

Authentication

Instead of passing the API key on every request, FeatureGuards follows a different flow that leverages JSON Web Tokens (JWTs). The Auth service exposes Authenticate, which expects x-api-key to be passed in gRPC metadata. Once it authenticates the request based on the passed-in API key/secret pair, it issues both an access token and a refresh token.

The access token is a short-lived signed blob that carries additional claims, such as the environment or platform associated with the API key. It is used to authenticate every API request to the Toggles service, which serves gRPC requests for feature flags. The main advantage of using access tokens is that we can potentially authenticate requests without doing any DB operations, just by checking the claims and validating the signature on the access token. This is a big performance boost.

Unlike access tokens, which are short-lived (15 min), refresh tokens are much longer-lived (7 days), and the client uses them to obtain new access tokens periodically. The server can issue access tokens from a valid refresh token by validating its signature, but what happens if an API key is revoked? We somehow need to revoke the refresh tokens associated with it. Also, what happens if someone steals a refresh token? They can keep issuing access tokens indefinitely.

To solve this problem, there is a clever solution pointed out by Auth0. The high-level idea is that each refresh token may be used only once, so when we detect a re-use we can identify a potential compromise and revoke all refresh and access tokens that have been issued, or will be issued, by the descendants of the first refresh token obtained from Authenticate. This requires us to maintain additional metadata in the token that points to the root refresh token, and to check Redis for any compromised root refresh tokens.
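
The rotation-with-reuse-detection idea can be sketched as follows. This is a simplified, in-memory illustration under assumptions: the `RefreshTokenStore` class and its fields are hypothetical, tokens are opaque random IDs rather than signed JWTs, and plain Python sets stand in for Redis. In the scheme described above, the pointer to the root would travel inside the signed token itself.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


class TokenRevoked(Exception):
    """Raised when a refresh token can no longer be exchanged."""


@dataclass
class RefreshTokenStore:
    # token id -> id of the root token its family descends from
    root_of: Dict[str, str] = field(default_factory=dict)
    # tokens that have already been exchanged once
    used: Set[str] = field(default_factory=set)
    # roots whose whole family is considered compromised
    revoked_roots: Set[str] = field(default_factory=set)

    def issue_root(self) -> str:
        """Called from Authenticate: start a new token family."""
        token_id = secrets.token_urlsafe(16)
        self.root_of[token_id] = token_id
        return token_id

    def rotate(self, token_id: str) -> str:
        """Exchange a refresh token for its single-use successor."""
        root = self.root_of.get(token_id)
        if root is None:
            raise TokenRevoked("unknown refresh token")
        if root in self.revoked_roots:
            raise TokenRevoked("token family revoked")
        if token_id in self.used:
            # Second use of a single-use token: assume theft and
            # revoke every descendant of the same root.
            self.revoked_roots.add(root)
            raise TokenRevoked("refresh token reuse detected")
        self.used.add(token_id)
        child = secrets.token_urlsafe(16)
        self.root_of[child] = root
        return child
```

Whoever replays an old token, attacker or legitimate client, triggers the revocation, so both end up having to call Authenticate again with the API key. That is the intended trade-off: a stolen refresh token stops being useful as soon as both parties have tried to use it.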

Notes:

For long-running requests, such as Listen, which returns a blocking server-side stream, we MUST authenticate the client periodically. FeatureGuards solves this by gracefully terminating the Listen request when the JWT access token associated with the request expires. A better solution would be to use a bi-directional stream and have the client send periodic auth requests, but this wasn’t done because gRPC-web doesn’t support bi-directional streams.
If an API key is revoked via deletion, then other requests will fail because the API key object won’t exist and we won’t be able to retrieve the project ID associated with the request.
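
The first note, ending a stream when the access token expires, can be sketched like this. It is a simplified illustration, not the real Listen handler: a `queue.Queue` stands in for the source of flag updates, and the function name and parameters are hypothetical.

```python
import queue
import time
from typing import Callable, Iterator


def listen(
    updates: "queue.Queue[str]",
    token_exp: float,
    clock: Callable[[], float] = time.time,
) -> Iterator[str]:
    """Yield updates until the access token's expiry time, then stop."""
    while True:
        remaining = token_exp - clock()
        if remaining <= 0:
            return  # token expired: end the stream gracefully
        try:
            # Never block past the token's expiry.
            yield updates.get(timeout=remaining)
        except queue.Empty:
            return  # no update arrived before the token expired
```

When the stream ends, the client is expected to obtain a fresh access token and call Listen again, which is what provides the periodic re-authentication.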

Authorization

Authorization here is pretty simple. Each gRPC request parses the JWT access token, retrieves the API key, and does a look-up to obtain the project and environment associated with it. The project ID and environment are NEVER passed in separately and are always retrieved from the key. Hence, authorization is enforced based on the API key.
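
A rough sketch of that rule follows. The `ApiKey` fields, the `api_key_id` claim name, and the in-memory mappings are assumptions for illustration. The point is only that the project and environment come from the stored key, never from client-supplied values.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple


class PermissionDenied(Exception):
    """Raised when the request cannot be tied to a live API key."""


@dataclass(frozen=True)
class ApiKey:
    # Hypothetical record shape for a stored API key.
    key_id: str
    project_id: str
    environment: str


def resolve_scope(
    claims: Mapping[str, str], api_keys: Mapping[str, ApiKey]
) -> Tuple[str, str]:
    """Derive (project_id, environment) from the key named in the token."""
    api_key = api_keys.get(claims.get("api_key_id", ""))
    if api_key is None:
        # A deleted (revoked) key has no record, so the request fails here.
        raise PermissionDenied("API key is unknown or has been deleted")
    return api_key.project_id, api_key.environment


def list_flags(
    claims: Mapping[str, str],
    api_keys: Mapping[str, ApiKey],
    flags_by_scope: Mapping[Tuple[str, str], List[str]],
) -> List[str]:
    """Return flags for the key's own scope; other claim values are ignored."""
    scope = resolve_scope(claims, api_keys)
    return list(flags_by_scope.get(scope, []))
```

Because the scope is looked up from the key, a client cannot read another project’s flags by sending a different project ID, and deleting the key immediately cuts off every request that depends on it.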

There are various performance optimizations, which we will discuss next, that avoid making database queries for the API key on every request.

Infrastructure tales
