Conventions

Conventions are patterns or practices that we follow at TAS when defining APIs and developing apps.

Replication

Replication is a pattern that allows one or more replication secondary apps to maintain a real-time copy of the master data held on a single replication primary app.

For example:

the replication primary app might be an ATS that produces an API like GET /jobs
the replication secondary app might be a job board that wants to maintain its own local database of jobs, kept in sync with the jobs held on the ATS

You shouldn't use replication unless you need to. In the example above, the job board might be able to simply call GET /jobs each time a candidate visits the site. However, sometimes replication is required.

The TAS core itself is unaware of the concept of replication. Replication is simply tenant API calls between apps as far as TAS is concerned. Your app can approach replication in any way it wants - there is no need for it to follow this pattern. However if it does, your app will be more likely to interoperate with other apps.

Features

The replication pattern described here:

Is real-time (non-polling)
Supports partial replication - the secondary can choose to maintain a subset of the instances of the master records (e.g., only jobs that are currently open) or a subset of the properties on the instances (e.g. only the job's title, and not its description or attached documents).
Relies on a bulk load phase, where immediately after install, the secondary gradually loads up all of the master data that already exists at the primary. The pattern is best suited to a single-threaded implementation.

Overview

Following the standard replication pattern, the ATS and job board apps in the example above would work together as below:

A tenant has already installed the ATS app (the primary)
The tenant installs the job board app (the secondary)
The job board app starts its bulk loading phase
The job board repeatedly calls GET /jobs/{} until it has loaded all of the existing jobs from the ATS
Since the bulk load phase might take hours, or days, the job board keeps track of the most recently loaded job in a persistent store (such as the repstate app), so that it can pick up and continue the bulk load phase if the tenant is disrupted, or the app itself is restarted
Eventually the job board's bulk load phase is complete
The job board now listens for incoming alerts about changes to the master set of jobs
At some point, a new job is created inside the ATS
The ATS sends a "ping" to the job board to alert it of the new data
Unless the ping is for a delete, the job board calls GET /jobs/{} to fetch the new/updated data, and updates its local database

Example API flow

Below is a detailed message sequence diagram showing the API flows between replication primary and secondary and TAS.

In this example, the replication primary is an ATS holding the master set of candidates, and the replication secondary is a new candidate search app called "ferret".

This example also shows the use of the repstate app, which acts as a persistent store for the secondary's state during the bulk load phase.

participant ats
participant ferret
participant repstate
participant TAS
note left of TAS: tenant acme installs the ferret app
TAS->ferret: POST /tenants/acme
note left of ferret: ferret knows it is a replication\nsecondary for candidates,\ninitializes the replication state store
ferret->repstate: POST /repstates/ferret/tas/%2Fcandidates\n{"loading":true,"lastLoaded":null}
note left of ferret: ferret asks for id of first candidate
ferret->ats: GET /candidates?$orderby=id&$select=id&$top=1
note right of ats: ats says 10234
note left of ferret: ferret calls its own onPing() method\nwhich fetches full details from the primary
ferret->ats: GET /candidates/10234?$select=resume
note right of ats: ats passes back resume
ferret->repstate: POST /repstates/ferret/tas/%2Fcandidates\n{"loading":true,"lastLoaded":10234}
note left of ferret: ferret asks for id of next candidate
ferret->ats: GET /candidates?$orderby=id&$select=id&$top=1&$filter=id gt 10234
note right of ats: ats says 10235
ferret->ats: GET /candidates/10235?$select=resume
note right of ats: ats passes back resume
ferret->repstate: POST /repstates/ferret/tas/%2Fcandidates\n{"loading":true,"lastLoaded":10235}
note right of ats: ..time passes..
note right of ats: a database trigger fires in the ats,\nreflecting that a new candidate has been created
note right of ats: this is the first time that ats\nhas tried to broadcast\nto this API for tenant acme,\nso asks TAS who produces that API
ats->TAS: GET /tenants/acme/routes/ats/tas/%2Fm%2Fcandidates%2F%7BcandidateID%7D%2FdeltaPings
note left of TAS: TAS says:\n"items": [{"producer": "ferret",\n"location": "https://acme.ferret.com/"},..]
note right of ats: ats now sends message to say\nthat a new candidate has been created
ats->ferret: POST /m/candidates/29046/deltaPings\n{"operation": "insert"}
note right of ferret: to decide whether to ignore the ping,\nferret needs to know the current\nstate of the replication secondary.\nIt could have this in memory, or\nit may be easier for the API handler\nto fetch it from the replication\nstatus store.
ferret->repstate: GET /repstates/ferret/tas/%2Fcandidates\n{"loading":true,"lastLoaded":10235}
note left of ferret: ferret onPing()\nif (!loading || id <= lastLoaded) absorbPing();\nignores the ping
note left of ferret: ferret asks for id of next candidate
ferret->ats: GET /candidates?$orderby=id&$select=id&$top=1&$filter=id gt 10235
note right of ats: ats says 404 error
note left of ferret: replication bulk load is complete
ferret->repstate: POST /repstates/ferret/tas/%2Fcandidates\n{"loading":false,"lastLoaded":null}
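
The secondary's side of this flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: FakePrimary, Secondary, bulk_load_step and on_ping are hypothetical names, the primary is an in-memory stand-in for the real HTTP calls, and the state dict stands in for the record kept in the repstate app.

```python
from typing import Dict, Optional


class FakePrimary:
    """In-memory stand-in for the primary's APIs (hypothetical, no HTTP)."""

    def __init__(self, records: Dict[int, str]) -> None:
        self.records = records

    def next_id(self, after: Optional[int]) -> Optional[int]:
        # Mimics GET /candidates?$orderby=id&$select=id&$top=1[&$filter=id gt N]
        ids = sorted(i for i in self.records if after is None or i > after)
        return ids[0] if ids else None  # None plays the role of the 404

    def get(self, record_id: int) -> str:
        # Mimics GET /candidates/{id}
        return self.records[record_id]


class Secondary:
    def __init__(self, primary: FakePrimary) -> None:
        self.primary = primary
        self.local: Dict[int, str] = {}
        # Stand-in for the state held in the repstate app.
        self.state = {"loading": True, "lastLoaded": None}

    def absorb(self, record_id: int) -> None:
        # Fetch full details from the primary and update the local copy.
        self.local[record_id] = self.primary.get(record_id)

    def bulk_load_step(self) -> bool:
        """Load one record; return False once the bulk load is complete."""
        nxt = self.primary.next_id(self.state["lastLoaded"])
        if nxt is None:
            self.state = {"loading": False, "lastLoaded": None}
            return False
        self.absorb(nxt)
        self.state["lastLoaded"] = nxt  # persist progress after each record
        return True

    def on_ping(self, record_id: int, operation: str) -> bool:
        """Handle a deltaPing; return True if absorbed, False if ignored."""
        last = self.state["lastLoaded"]
        if self.state["loading"] and (last is None or record_id > last):
            return False  # the bulk load will reach this record later
        if operation == "delete":
            self.local.pop(record_id, None)
        else:
            self.absorb(record_id)
        return True
```

Because progress is written after every record, a restarted secondary could resume by calling bulk_load_step() again with the stored lastLoaded value.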

APIs required for replication

As seen above, for a given type of master data (e.g. jobs), replication requires the following APIs:

Produced by the primary and consumed by the secondary
- Get a master record by its "primary key", e.g. GET /jobs/{jobID}
- Get the master record with the first/lowest primary key value, e.g. GET /jobs?$orderby=id&$top=1
- Get the master record with the next primary key value, e.g.: GET /jobs?$filter=id gt 100&$orderby=id&$top=1
Produced by the secondary and consumed by the primary
- Alert secondaries of a change to a master record, e.g. POST /jobs/{jobID}/deltaPings.
  (The primary must queue pings in the event of any secondary being unavailable, until it becomes available again - it might use something like a broadcast service to achieve this).
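
One way the primary might queue pings for an unavailable secondary is sketched below. PingQueue and its send callback are hypothetical names used only for illustration; a real primary would more likely persist the queue or delegate to a broadcast service.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Ping = Tuple[str, str]  # (record id, operation), e.g. ("29046", "insert")


class PingQueue:
    """Per-secondary FIFO of undelivered pings (hypothetical helper)."""

    def __init__(self, send: Callable[[str, Ping], bool]) -> None:
        # send(secondary, ping) returns True when the secondary accepted the ping.
        self.send = send
        self.pending: Dict[str, Deque[Ping]] = {}

    def publish(self, secondary: str, ping: Ping) -> None:
        # Queue first, then try to deliver, so ordering is preserved.
        self.pending.setdefault(secondary, deque()).append(ping)
        self.flush(secondary)

    def flush(self, secondary: str) -> int:
        """Deliver queued pings in order; stop at the first failure."""
        queue = self.pending.get(secondary, deque())
        delivered = 0
        while queue and self.send(secondary, queue[0]):
            queue.popleft()
            delivered += 1
        return delivered
```

Stopping at the first failed delivery keeps pings for a given secondary in their original order; flush() can be retried later, for example on a timer.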

Primary key properties

The primary key used by the primary must:

be immutable
be of either integer or alphanumeric type
use consistent sorting [specify], so that the secondary can do key comparisons client-side

Error handling in tenant APIs

http status codes

Where possible, when indicating errors, API producers should document and use existing, meaningful http error codes.

For example, 409 Conflict could be returned if a consumer tried to create object "foo" but such object already exists.

If there is no suitable code, APIs should just use 400 or 500 as appropriate - we don't currently define new http status codes.

The RAML for the API should document the superset of http status codes that an API may produce - no app producing the API should return any other statuses.

application/problem+json responses

In addition to the status code, APIs responding with errors (other than self-explanatory ones such as 404) should also return a body of type application/problem+json as per Problem Details for HTTP APIs.

The application/problem+json format uses the "type" field (a URI) to identify the type of error.

For predictable error cases (e.g. creating a job application fails because the job is closed), the API documentation should specify actual values for type. At TAS, we start our types with http://constants.talentappstore.com/httpProblems/. For example:


/applications:
  post:
    responses:
      200:
      500:
        body:
          application/problem+json:
            schema: !include ../schemas/applicationProblem.json
            description: |
              The app producing the API should return one of the following values in the type field where appropriate:
              - http://constants.talentappstore.com/httpProblems/jobClosed - the job is closed 
              - http://constants.talentappstore.com/httpProblems/tooLate - it's after midnight

Apps may also throw errors with undocumented values for type (obviously the consumer won't be able to take any specific action in this case).
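
A minimal Python sketch of both sides of this convention follows. The function names and the user-facing messages are hypothetical; the "type", "status" and "detail" members and the "about:blank" default come from Problem Details for HTTP APIs.

```python
import json
from typing import Optional, Tuple

PROBLEM_BASE = "http://constants.talentappstore.com/httpProblems/"


def problem_response(status: int, problem: str, detail: Optional[str] = None) -> Tuple[int, dict, str]:
    """Producer side: build (status, headers, body) for a problem+json error."""
    body = {"type": PROBLEM_BASE + problem, "status": status}
    if detail is not None:
        body["detail"] = detail
    return status, {"Content-Type": "application/problem+json"}, json.dumps(body)


def describe_problem(body: str) -> str:
    """Consumer side: branch on documented types, fall back for unknown ones."""
    problem_type = json.loads(body).get("type", "about:blank")
    if problem_type == PROBLEM_BASE + "jobClosed":
        return "This job is no longer accepting applications."
    if problem_type == PROBLEM_BASE + "tooLate":
        return "It is after midnight."
    # Undocumented type: the consumer cannot take any specific action.
    return "Something went wrong."
```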

Constant resource representation in tenant APIs

By convention in TAS's tenant APIs, resource representations are consistent (for a given media type).

That is, an API call like GET /positionOpenings (for example) returns the same data to all callers, regardless of the consuming app (or of the principal in the case of OAuth APIs).

This does not imply that all callers can always access the resource - in OAuth in particular, some principals might receive 403 errors depending on business rules. However if a caller can access the resource, they will see the same data as any other caller.

This convention helps with scaling by allowing caching *behind* the authorisation layers in a layered architecture, for example:


API consumer calls GET /positionOpenings/10334
   -> (internet)
      -> SSL offload
         -> auth
            -> business rules checking (e.g. can this principal access this resource?)
               -> cache/ reverse proxy (resource representations are constant, hence can often be cached)
                  -> origin API server (if not in cache)

Exception

The exception that proves the rule is when an API ends in "/me". The responses of such API calls should not be cached.

Tenant-specific web resources

It's useful for apps to keep tenant-specific (i.e. not base domain) resources under "/t". That way the app can apply blanket handling of the tazzy-tenant header and refuse any requests without it.

URLs for complex resources

We try to follow something like this pattern for URLs of complex resources.

Class - broad family for the resource, e.g. buttons
Who - principal type who can view the resource + actual viewer, e.g., me, anonymous, byID/{id}
Where - significant location, e.g. general, /jobs/{job} - possibly implied by who  
What - specific resource type, e.g. possibles, meta, omitted where obvious
Which - further filtering (over location), e.g. byName/{name}, byApp

e.g.: /items/toCandidate/me/jobs/{job}/itemMetas/byName/{item}

Class - /items
Who - /toCandidate/me
Where - /jobs/{job}
What - /itemMetas
Which - /byName/{item}

SSO conventions

This section has some conventions that most apps that use SSO should follow.

Add the isSignedIn parameter to outgoing links

Some apps have pages that work both when the user is signed in, and when they are not.

For example a career site may have a job landing page like:


https://acme.careersiteapp.com/jobs/100

When the user is not signed in, the page displays an apply button.

When the user is signed in, the apply button changes to "You've already applied". The page also detects that they are an employee, and a "refer a friend" button appears.

A problem occurs when the user follows a link from one app, where they are signed in, to another app, where they are not signed in (but could be instantly thanks to SSO). For example:

The candidate surfs to the careers site, then into a specific job, then clicks apply. They are redirected to the apply app.
They authenticate and apply.
Finally they click back to the careers site app. The career site app does not know that they are signed in, so it cannot display its contents intelligently (e.g., displaying "you've already applied" instead of an apply button).

One solution to the above would be for the career site to always ask everyone to log in before viewing the page, but that would be a barrier for non-signed in users, and cause SEO problems.

So to handle this we follow this convention:

Whenever an app emits a link which it believes is to another app, and there is a signed in user, it appends the "isSignedIn" hint to it.

The hint indicates to the destination app that the user is most likely already signed in, so it is probably OK to ask them to authenticate since we have good reason to believe the process will be instant and invisible (thanks to SSO).

In the example above, when the apply app links the user back to the job on the career site via acme.careersiteapp.com/jobs/100, it appends the isSignedIn hint as follows:


https://acme.careersiteapp.com/jobs/100?isSignedIn=candidate
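
Appending the hint to an outgoing link could look like the Python sketch below. add_is_signed_in is a hypothetical helper name; it replaces any existing isSignedIn value rather than adding a second one, which is an assumption and not something the convention specifies.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_is_signed_in(url: str, principal_type: str) -> str:
    """Return url with isSignedIn=<principal_type> set in the query string."""
    parts = urlsplit(url)
    # Keep all other query parameters, dropping any stale isSignedIn hint.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "isSignedIn"]
    query.append(("isSignedIn", principal_type))
    return urlunsplit(parts._replace(query=urlencode(query)))
```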

Add a filter to handle the isSignedIn parameter on incoming traffic

Apps that use SSO should incorporate an isSignedIn filter, which behaves as follows:


for all requests with the isSignedIn query parameter
   if the parameter's value == the principal type for this app
      // the user is likely signed in
      force the user to authenticate before visiting the page (with the isSignedIn parameter stripped off)
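
The filter logic above might be expressed in Python roughly as follows. is_signed_in_filter is a hypothetical name, and returning an action tuple instead of issuing a real redirect is a simplification for illustration; how the app actually forces authentication depends on its SSO setup.

```python
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def is_signed_in_filter(url: str, app_principal_type: str) -> Tuple[str, str]:
    """Return ("authenticate", stripped_url) or ("continue", url)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    hints = [v for k, v in pairs if k == "isSignedIn"]
    if app_principal_type not in hints:
        return "continue", url  # no hint, or a hint for another principal type
    # The user is likely signed in: authenticate, then land on the clean URL.
    remaining = urlencode([(k, v) for k, v in pairs if k != "isSignedIn"])
    return "authenticate", urlunsplit(parts._replace(query=remaining))
```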

Tracker session behaviour

Secure redirect

/t/secure/redirect?redirectPage={uri}

NOT KNOWN THAT THIS IS NEEDED OR DESIRABLE. A redirect (HTTP 302) hidden behind SSO. Users arriving here must log in, and are then immediately redirected to redirectPage. The redirect URI must start with "/", i.e. it must not be absolute. This prevents an attack where someone puts their own dodgy "enter your credit card details" page behind your app's login to make it look legitimate. Apps that allow user-generated content (e.g. allow someone to host their dodgy pages on the app) should probably not have this resource.
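
A check for the "must start with /" rule might look like the Python sketch below. is_safe_redirect_page is a hypothetical name. Rejecting values that begin with "//" or "/\" goes beyond what is stated above: it is an added precaution, on the assumption that browsers may treat such values as pointing at another host.

```python
def is_safe_redirect_page(redirect_page: str) -> bool:
    """Accept only app-relative paths such as "/jobs/100"."""
    if not redirect_page.startswith("/"):
        return False  # absolute URLs (and empty values) are rejected
    if redirect_page.startswith("//") or redirect_page.startswith("/\\"):
        return False  # //host/path is protocol-relative, i.e. another site
    return True
```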

Patterns

SSO-enabled apps might want to use some of the following patterns for their pages:

Login required?	Implementation details
Never	White-listed in the proxy's SSO settings
Always	Not white-listed in the proxy's SSO settings
loggedIn aware	All pages are typically subject to the isSignedIn filter described above
Conditional	e.g. job details page requires login if the job is only visible to internals, so we can see if candidate is internal: `(inside rendering code) if [something indicates that] login required if tazzy-saml request header is not present redirect to /t/secure/redirect?redirectPage= {current page URL, urlencoded} .. else .. render normally`
Conditional, but fail if not logged in	Resources that may require login but don't redirect to SSO, e.g. small buttons intended for rendering inside iframes that don't provide enough room for the IdP disco panel and just display as gray if candidate not logged in. `(inside rendering code) if [something indicates that] login required if tazzy-saml request header is not present render as gray .. else .. render normally`
Embedded login	Resources that render with or without login, but embed the login panel (IdP discovery service) if not logged in. e.g. a job application form that displays but with fields grayed out if candidate is not logged in. `(inside rendering code) if tazzy-saml request header is not present embed an iframe with src=/t/secure/redirect?redirectPage= {current page} when the iframe is logged in it redirects the top level window else don't embed the iframe`

Correlation ID in tenant APIs

To quote from :

Correlation ids are essentially an id that is generated and associated with a single (typically user-driven) request into the application that is passed down through the stack and onto dependent services. In SOA or microservice platforms this type of id is very useful, as requests into the application typically are ‘fanned out’ or handled by multiple downstream services, and a correlation id allows all of the downstream requests (from the initial point of request) to be correlated or grouped based on the id. So called ‘distributed tracing’ can then be performed using the correlation ids by combining all the downstream service logs and matching the required id to see the trace of the request throughout your entire application stack (which is very easy if you are using a centralised logging framework such as logstash).

Use the same "X-Request-ID" request header used by Heroku, RoR and others.
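
A minimal Python sketch of propagating the header is shown below. The function names are hypothetical, and generating a UUID when no id arrives is an assumption about how an app might start a new trace.

```python
import uuid
from typing import Dict, Mapping

HEADER = "X-Request-ID"


def correlation_id(incoming_headers: Mapping[str, str]) -> str:
    """Reuse the caller's id if present, otherwise start a new one."""
    for name, value in incoming_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == HEADER.lower() and value:
            return value
    return str(uuid.uuid4())


def outgoing_headers(incoming_headers: Mapping[str, str]) -> Dict[str, str]:
    """Headers to attach to downstream API calls made while serving a request."""
    return {HEADER: correlation_id(incoming_headers)}
```

The same id would also be written to each log line so that logs from different apps can be matched up.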

/{ct}/{ca} (consumer) and /{pt}/{pa} (producer) segments

This needs work! Ignore for now.

Using /{ct}/{ca} (consumer) and /{pt}/{pa} (producer) segments within a tenant API's URI template is a signal to any app producing or consuming the API that to be valid, API calls should have the (consumer) == the actual consumer, and (producer) == the actual producer.

This means a tenant can install an app like a job board that consumes (say) the API /jobs/byConsumer/{ct}/{ca}, confident that (as long as either (a) the consumer app obeys the convention, and/or (b) the producer enforces it) the job board will only ever access its own postings, and not the postings for some other job board.

Examples

Given that the app "zambo", from the developer "zamsoft":


- consumes /jobs/{ct}/{ca}
- produces /jobs/{pt}/{pa}/deltaPings

...the convention means that, when installed by the tenant acme, zambo (and only zambo):


- can consume		GET /jobs/acme/zamsoft/zambo
- can produce		POST /jobs/acme/zamsoft/zambo/deltaPings

.. and that zambo:


- can not consume	GET /jobs/acme/zamsoft/fruitbat
- will never produce	POST /jobs/acme/zamsoft/fruitbat/deltaPings

In another example, the repstate app provides a centralised store where apps that act as replication secondaries can store the position of where they are up to in the bulk load phase of replication. Other apps can look in and summarise the readiness of all of the replication secondaries the tenant has installed.

Assuming the replicator cursor app:


- produces /replicationStatuses/{ct}/{cad}/{ca}/{apiDev}/{apiURI}

...then, when installed by the tenant acme, zambo (and only zambo):


- can consume		POST /replicationStatuses/acme/zamsoft/zambo/tas/requisitions
- can consume		GET /replicationStatuses/acme/zamsoft/zambo/tas/requisitions

.. and zambo:


- can not consume	POST /replicationStatuses/acme/zamsoft/fruitbat/tas/requisitions
- can not consume	GET /replicationStatuses/acme/zamsoft/fruitbat/tas/requisitions

NOTE: this approach causes tighter coupling since the producer is changing its behaviour depending on who the consumer is. That implies the tenant can't just rip and replace the consumer (just as they can't once the consumer contains its own state).

Categories

Tree semantics - jobs, locations
--------------------------------
The TAS tenant APIs have the concept of categories - a hierarchical system for categorizing jobs and candidates, which allows:

- searching for jobs that match candidates
  - actively (candidate searches via UI)
  - passively (candidate gets email when new jobs are posted that match their own profile)
- searching for candidates
  - that match jobs
  - by the tenant, when searching candidates
  - by the tenant to create a search agent for candidates, possibly immediately after searching

For some categories, jobs are restricted to only having a single leaf node (category value).

General principles
------------------
Each category is an ordered list of one or more trees.

Within the trees, any node that has no children is a leaf. A node with children is a folder. A leaf becomes a folder as soon as a leaf is added to it.
A folder becomes a leaf when the last leaf is removed from it - i.e. a folder can never be empty.

There can be any number of trees, each starting from a single root node. The smallest possible category is a single leaf (i.e. a single tree with a root node having no child nodes).

A selection is a combined set of explicit folder selections and leaf selections. If a selection contains an explicit folder selection, that implicitly
selects all of that folder's leaves and sub-folders.

Examples
---------

Job
---
Store jobs
Support office jobs
IT
Marketing

Location
--------
Store locations
Asia
..
EMEA
Support office locations
Australia
..

Work type
---------
Store work types
Part-time
Support office work types

Remuneration
------------
Store roles
Support office roles
Hourly rate
Salary
20K - 30K
..

Example tree
------------
The following tree is used in the examples below:

/a
  /b
    /d
    /e
  /c
    /f
/g
  /h
  /i

Normalizing selections
----------------------
A set of category values should be "normalized", which specifically means that:
- any folder or leaf must be selected (explicitly or implicitly) only once - i.e., it is invalid to select both a node and one of its ancestor nodes.
- the minimum possible number of folder and leaf selections must be used - i.e., it is invalid to explicitly select all of the leaves
beneath a folder without any differing details - the folder itself should be selected instead.

The following are correctly normalized:

/a
/b,/f
/a,/h

The following are not:

/b,/c (since /a would be more minimal)
/a,/b (since /a implies /b)
/d,/e (since /b would be more minimal)
/f (since /c would be more minimal)

A selection that was valid can become invalid due to changes in the master data. For example:

/d becomes invalid if the master data changes to:

/a
  /b
    /d
  /c
    /f

(Since /b would be more minimal).

Unprofiled subjects and searching
---------------------------------
When searching, it is desirable, for any given category, to match subjects (i.e. candidates and jobs) that have no values for that category.

This allows the tenant to add new factors over time (e.g. work type), without instantly filtering out all existing candidates and jobs from searches (at least until they have profiled
themselves against the new category), and it also allows candidates to have "no selection" for factors they don't care about (e.g. work type).

To achieve this, searching follows a brute force rule that if a subject has no values for a category, then at the point of being filtered, they are given temporary selections of
the root node of every tree in the category.

For example, if Fred has no values for the example category, then instead Fred is treated as if he has:
/a,/g

There are shortcomings with this, e.g. a candidate who is profiled as Job == Gardening Center and Location == East Brunswick Support Office could be said
to have no selection for location, since there are no gardening jobs at the support offices. But there are no simple solutions so this is the best we can do.

Merging a profile onto an existing multi-valued subject (candidate, recruiter saved search)
--------------------------------------------
Where the candidate has no profile:
The profile is applied, even though it effectively shrinks the candidate's search presence

Aggregating specificity across a number of root nodes
-----------------------------------------------------
The individual specificities are all calculated, then the maximum is used.

* and then across numerous factors *

General searching/matching
--------------------------
Factors are AND-ed.

Selections within individual factors are OR-ed.
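
The AND/OR rule can be sketched in Python as below. This is only an interpretation: matches is a hypothetical name, selections are assumed to have already been expanded to sets of leaves, and a subject with no values for a factor is treated as matching, in line with the rule for unprofiled subjects.

```python
from typing import Dict, Set


def matches(search: Dict[str, Set[str]], subject: Dict[str, Set[str]]) -> bool:
    """AND across factors; OR across the selected leaves within each factor."""
    for factor, wanted in search.items():
        have = subject.get(factor)
        if not have:
            continue  # unprofiled for this factor: treated as matching
        if not (wanted & have):
            return False  # this factor has no leaf in common, so the AND fails
    return True
```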

Searching with an entire factor omitted
---------------------------------------
Generally, search subjects that have an empty selection are treated as if they have just the root node selected,
which means that they will match any search.

In more complex organizations, trees cannot be treated this simply, since things like location behave as a distinct set of subtrees. For example, only
certain locations are relevant for the Position Type "Assistant
Gardening Team Supervisor". In this case, the empty selection refers just to part of the tree being empty.

ISSUE - should an artificial "ALL LOCATIONS" node be injected into the distilled location tree in a search UI when searching for "Assistant Gardening Team Supervisor" candidates?
If so, how is selection of that artificial node represented:
- on the wire, in parms to the "search candidates" API?
- on the wire, in parms to the "profile candidate" API?

If the answer to the above is "as an empty location selection", then are we moving towards allowing multiple root nodes? Since there is no need/use in having a single root node just to act as a "select all" actor?

Specificity
-----------
Specificity is a measure of how specific a selection is. It is simply the count of explicitly or implicitly selected leaves,
so a lower number means the selection is more specific.

For example, a candidate who has selected a single job has a specificity of 1, whereas a candidate who has selected the root node, thus implicitly
selecting all 213 leaf nodes, has a specificity of 213.

Specificity is useful in ranking search results - for example a candidate who has selected only "butchery roles" is
a more likely fit than, and should appear in search results before, someone who has "all megacorp roles".
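
Counting selected leaves could be sketched in Python as follows. The function names are hypothetical, and TREE encodes an assumed shape for the example tree, with /a and /g as roots.

```python
from typing import Dict, Iterable, List, Set

# Assumed shape of the example tree: folder -> children (leaves have no entry).
TREE: Dict[str, List[str]] = {
    "/a": ["/b", "/c"],
    "/b": ["/d", "/e"],
    "/c": ["/f"],
    "/g": ["/h", "/i"],
}


def leaves_under(node: str, tree: Dict[str, List[str]]) -> Set[str]:
    """All leaves selected by picking node (the node itself if it is a leaf)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves_under(child, tree)
    return found


def specificity(selection: Iterable[str], tree: Dict[str, List[str]]) -> int:
    """Count of explicitly or implicitly selected leaves; lower is more specific."""
    selected: Set[str] = set()
    for node in selection:
        selected |= leaves_under(node, tree)
    return len(selected)
```

Sorting candidates by ascending specificity would then put the most narrowly profiled candidates first.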

Hiding nodes
------------
Sometimes, a hiding selection can be applied to a tree. One example is hiding old, disused business groups that need to remain
in the tree but should not be selected for new positions.

A hiding selection must be normalized as per the same rules as a selection - e.g., hiding both a node and one of its ancestors is invalid.

Applying a hiding selection to a tree gives the result set. By "applying", we mean:
- for each hidden node
  - remove all direct and indirect descendant nodes
  - remove the hidden node itself
  - moving up through the ancestors, remove any ancestor that no longer has any child nodes

The last step above ensures that all nodes in the result tree are of the same type (leaf or folder) as they were before hiding. Without this step,
hiding the only leaf in a folder would result in the folder transforming to a leaf.
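
The steps for applying a hiding selection might be sketched in Python as below. apply_hiding is a hypothetical name, the tree is assumed to be a dict mapping every node to its list of children (leaves map to an empty list), EXAMPLE_TREE encodes an assumed shape for the example tree, and the hiding selection is assumed to be normalized already.

```python
from typing import Dict, Iterable, List

# Assumed shape of the example tree; every node is a key, leaves have no children.
EXAMPLE_TREE: Dict[str, List[str]] = {
    "/a": ["/b", "/c"], "/b": ["/d", "/e"], "/c": ["/f"],
    "/d": [], "/e": [], "/f": [],
    "/g": ["/h", "/i"], "/h": [], "/i": [],
}


def apply_hiding(tree: Dict[str, List[str]], hidden: Iterable[str]) -> Dict[str, List[str]]:
    """Return a copy of tree with the (normalized) hidden nodes removed."""
    result = {node: list(children) for node, children in tree.items()}
    parent = {c: p for p, children in tree.items() for c in children}

    def remove_subtree(node: str) -> None:
        # Removes the node itself and all direct and indirect descendants.
        for child in result.pop(node, []):
            remove_subtree(child)

    for node in hidden:
        remove_subtree(node)
        current = node
        while current in parent:
            up = parent[current]
            result[up].remove(current)
            if result[up]:
                break  # ancestor still has children, so it stays a folder
            del result[up]  # ancestor no longer has any children: remove it too
            current = up
    return result
```

With this tree, hiding /f also removes /c, since /c would otherwise turn from a folder into a leaf.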

ISSUE: WHAT ABOUT stripping out ancestor folders that have only one child after hiding?