Conventions are patterns or practices that we follow at TAS when defining APIs and developing apps.
Replication is a pattern that allows one or more replication secondary apps to maintain a real-time copy of the master data held on a single replication primary app.
For example:
You shouldn't use replication unless you need to. In the example above, the job board might be able to simply call GET /jobs each time a candidate visits the site. However, sometimes replication is required.
The TAS core itself is unaware of the concept of replication; as far as TAS is concerned, replication is simply tenant API calls between apps. Your app can approach replication in any way it wants - there is no need for it to follow this pattern. However, if it does, your app will be more likely to interoperate with other apps.
The replication pattern described here:
Following the standard replication pattern, the ATS and job board apps in the example above would work together as below:
Below is a detailed message sequence diagram showing the API flows between the replication primary, the replication secondary and TAS.
In this example, the replication primary is an ATS holding the master set of candidates, and the replication secondary is a new candidate search app called "ferret".
This example also shows the use of the repstate app, which acts as a persistent store for the secondary's state during the bulk load phase.
As seen above, for a given type of master data (e.g. jobs), replication requires the following APIs:
Where possible, when indicating errors, API producers should document and use existing, meaningful HTTP status codes.
For example, 409 Conflict could be returned if a consumer tried to create object "foo" but such an object already exists.
If there is no suitable code, APIs should just use 400 or 500 as appropriate - we don't currently define new HTTP status codes.
The RAML for the API should document the superset of HTTP status codes that an API may produce - no app producing the API should return any other status.
In addition to the status code, APIs responding with errors (other than self-explanatory ones such as 404) should also return a body of type application/problem+json as per Problem Details for HTTP APIs (RFC 7807).
The application/problem+json format uses the "type" field (a URI) to identify the type of error.
For predictable error cases (e.g. create a job application fails because the job is closed), the API documentation should specify actual values for type. At TAS, we start our types with http://constants.talentappstore.com/httpProblems/. For example:
/applications:
  post:
    responses:
      200:
      500:
        body:
          application/problem+json:
            schema: !include ../schemas/applicationProblem.json
        description: |
          The app producing the API should return one of the following values in the type field where appropriate:
          - http://constants.talentappstore.com/httpProblems/jobClosed - the job is closed
          - http://constants.talentappstore.com/httpProblems/tooLate - it's after midnight
Apps may also throw errors with undocumented values for type (obviously the consumer won't be able to take any specific action in this case).
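To make the producer/consumer contract concrete, here is a minimal Python sketch of both sides, assuming a plain JSON body carrying the RFC 7807 members type, title, status and detail. The helper names problem_body and describe_failure, and the strings it returns, are hypothetical; the type URIs are the ones documented in the RAML example above.

```python
from __future__ import annotations

import json

# Type URIs documented for POST /applications in the RAML example.
JOB_CLOSED = "http://constants.talentappstore.com/httpProblems/jobClosed"
TOO_LATE = "http://constants.talentappstore.com/httpProblems/tooLate"


def problem_body(type_uri: str, title: str, status: int, detail: str | None = None) -> str:
    """Producer side: serialize an application/problem+json body (RFC 7807 members)."""
    body = {"type": type_uri, "title": title, "status": status}
    if detail is not None:
        body["detail"] = detail
    return json.dumps(body)


def describe_failure(status: int, content_type: str, raw_body: str) -> str:
    """Consumer side: act on documented type values, otherwise fall back on the status."""
    if content_type.startswith("application/problem+json"):
        problem = json.loads(raw_body)
        # RFC 7807 treats a missing type as "about:blank".
        problem_type = problem.get("type", "about:blank")
        if problem_type == JOB_CLOSED:
            return "job closed: tell the candidate"
        if problem_type == TOO_LATE:
            return "too late: ask the candidate to try again tomorrow"
    # Undocumented (or absent) type: no specific action is possible.
    return f"unexpected error (HTTP {status})"
```

The fallback branch is the important design point: because apps may return undocumented type values, a consumer should always be able to degrade to generic handling based on the status code alone.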
By convention in TAS's tenant APIs, resource representations are consistent (for a given media type).
That is, an API call like GET /positionOpenings (for example) returns the same data to all callers, regardless of the consuming app (or of the principal in the case of OAuth APIs).
This does not imply that all callers can always access the resource - in OAuth in particular, some principals might receive 403 errors depending on business rules. However if a caller can access the resource, they will see the same data as any other caller.
This convention helps with scaling by allowing caching *behind* the authorisation layers in a layered architecture, for example:
API consumer calls GET /positionOpenings/10334
-> (internet)
-> SSL offload
-> auth
-> business rules checking (e.g. can this principal access this resource?)
-> cache/ reverse proxy (resource representations are constant, hence can often be cached)
-> origin API server (if not in cache)
The exception that proves the rule is when an API ends in "/me". The responses of such API calls depend on the principal making the call, so they should not be cached.
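A cache sitting behind the auth and business-rule layers could derive its key roughly as in the Python sketch below. The function name cache_key is hypothetical, and we assume the cache knows the tenant (e.g. from the tazzy-tenant header) and the negotiated media type, since representations are only consistent per tenant and per media type.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def cache_key(method: str, url: str, tenant: str, media_type: str) -> str | None:
    """Return a shared cache key for a response, or None if it must not be cached.

    This runs *behind* auth and business-rule checks, so the key deliberately
    leaves out the principal and the consuming app: by convention, every caller
    who is allowed to see the resource sees the same representation.
    """
    if method.upper() != "GET":
        return None
    path = urlsplit(url).path.rstrip("/")
    if path.endswith("/me"):
        return None  # "/me" resources depend on who is asking
    # Representations are per tenant and per media type, so both stay in the key.
    return f"{tenant} {media_type} GET {url}"
```

Two different principals requesting GET /positionOpenings/10334 for the same tenant therefore share one cache entry, while a call ending in /me always reaches the origin server.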
It's useful for apps to keep tenant-specific (i.e. not base domain) resources under "/t". That way the app can apply blanket handling of the tazzy-tenant header and refuse any requests without it.
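As an illustration of that blanket handling, the following is a minimal WSGI middleware sketch in Python. The middleware name require_tenant is hypothetical, and returning 400 Bad Request is our assumption; the convention only says such requests should be refused.

```python
def require_tenant(app):
    """WSGI middleware that refuses any request under /t without a tazzy-tenant header.

    WSGI exposes the tazzy-tenant request header as environ["HTTP_TAZZY_TENANT"].
    """
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        tenant_scoped = path == "/t" or path.startswith("/t/")
        if tenant_scoped and not environ.get("HTTP_TAZZY_TENANT"):
            # 400 is an assumption here; the convention only says "refuse".
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"missing tazzy-tenant header"]
        return app(environ, start_response)
    return wrapper
```

Usage would be along the lines of `application = require_tenant(application)`, so that no handler under /t ever has to check for the header itself. Base-domain paths such as /tos are unaffected.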
We try to follow something like this pattern for the URLs of complex resources:
Class - broad family for the resource, e.g. buttons
Who - principal type who can view the resource + actual viewer, e.g. me, anonymous, byID/{id}
Where - significant location, e.g. general, /jobs/{job} - possibly implied by who
What - specific resource type, e.g. possibles, meta - omitted where obvious
Which - further filtering (over location), e.g. byName/{name}, byApp
For example, /items/toCandidate/me/jobs/{job}/itemMetas/byName/{item} breaks down as:
Class - /items
Who - /toCandidate/me
Where - /jobs/{job}
What - /itemMetas
Which - /byName/{item}
This section has some conventions that most apps that use SSO should follow.
Some apps have pages that work both when the user is signed in, and when they are not.
For example a career site may have a job landing page like:
https://acme.careersiteapp.com/jobs/100
When the user is not signed in, the page displays an apply button.
When the user is signed in, the apply button changes to "You've already applied". The page also detects that they are an employee, and a "refer a friend" button appears.
A problem occurs when the user follows a link from one app, where they are signed in, to another app, where they are not signed in (but could be instantly thanks to SSO). For example:
One solution to the above would be for the career site to always ask everyone to log in before viewing the page, but that would be a barrier for users who are not signed in, and would cause SEO problems.
So to handle this we follow this convention:
Whenever an app emits a link which it believes is to another app, and there is a signed-in user, it appends the "isSignedIn" hint to it. The hint indicates to the destination app that the user is most likely already signed in, so it is probably OK to ask them to authenticate, since we have good reason to believe the process will be instant and invisible (thanks to SSO).
In the example above, when the apply app links the user back to the job on the career site via acme.careersiteapp.com/jobs/100, it appends the isSignedIn hint as follows:
https://acme.careersiteapp.com/jobs/100?isSignedIn=candidate
Apps that use SSO should incorporate an isSignedIn filter, which behaves as follows:
for all requests with the isSignedIn query parameter:
    if the parameter's value == the principal type for this app:
        // the user is likely signed in
        force the user to authenticate before visiting the page (with the isSignedIn parameter stripped off)
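The filter logic can be sketched in Python as below. This is a sketch under several assumptions: the function name is_signed_in_filter and its ("login" or "serve", url) return value are hypothetical, how the app detects an existing session is out of scope (it is passed in as authenticated), a user already authenticated in this app simply gets the stripped URL, and a hint naming a different principal type is ignored.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url: str, name: str) -> str:
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(query)))


def is_signed_in_filter(url: str, app_principal_type: str, authenticated: bool) -> tuple[str, str]:
    """Decide how to handle a request that may carry the isSignedIn hint.

    Returns ("login", target) when the app should send the user through SSO and
    then on to target, or ("serve", url) when the page can be served directly.
    """
    hints = [v for k, v in parse_qsl(urlsplit(url).query) if k == "isSignedIn"]
    if not hints or hints[0] != app_principal_type:
        return ("serve", url)  # no hint, or a hint for another principal type
    target = strip_param(url, "isSignedIn")
    # The user is likely signed in elsewhere, so SSO should be instant and invisible.
    return ("serve", target) if authenticated else ("login", target)
```

For the career site example, a request for https://acme.careersiteapp.com/jobs/100?isSignedIn=candidate from a visitor without a session yields ("login", "https://acme.careersiteapp.com/jobs/100").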
/t/secure/redirect?redirectPage={uri}
Not known whether this is needed or desirable. A redirect (HTTP 302) hidden behind SSO: users arriving here must log in, and are then immediately redirected to redirectPage. The redirect URI must start with "/", i.e. it must not be absolute - this prevents an attack where someone puts their own dodgy "enter your credit card details" page behind your app's login to make it look legit. Apps that allow user-generated content (e.g. that let someone host their own dodgy pages on the app) should probably not have this resource.
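A minimal Python sketch of the redirectPage check follows. The function name safe_redirect_target is hypothetical. Beyond the stated "must start with /" rule, it also rejects values starting with "//" or "/\", an extra precaution we are assuming here because browsers treat such values as pointing at another host.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def safe_redirect_target(redirect_page: str) -> str | None:
    """Return redirect_page if it is a same-origin relative path, else None."""
    if not redirect_page.startswith("/"):
        return None  # absolute URLs (and bare relative paths) are refused
    if redirect_page.startswith("//") or redirect_page.startswith("/\\"):
        return None  # protocol-relative forms that a browser resolves to another host
    parts = urlsplit(redirect_page)
    if parts.scheme or parts.netloc:
        return None
    return redirect_page
```

A request to /t/secure/redirect?redirectPage=https://evil.example/pay would then be refused rather than bounced through the app's login.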
SSO-enabled apps might want to use some of the following patterns for their pages:
Login required? | Implementation details |
---|---|
Never | White-listed in the proxy's SSO settings |
Always | Not white-listed in the proxy's SSO settings |
loggedIn aware | All pages are typically subject to the isSignedIn filter described above |
Conditional | e.g. a job details page requires login if the job is only visible to internal candidates, so that we can check whether the candidate is internal |
Conditional, but fail if not logged in | Resources that may require login but don't redirect to SSO, e.g. small buttons intended for rendering inside iframes that don't provide enough room for the IdP discovery panel, and that just display as gray if the candidate is not logged in |
Embedded login | Resources that render with or without login, but embed the login panel (IdP discovery service) if not logged in, e.g. a job application form that displays with its fields grayed out if the candidate is not logged in |
A correlation ID is essentially an ID that is generated and associated with a single (typically user-driven) request into the application, and that is passed down through the stack and on to dependent services. In SOA or microservice platforms this type of ID is very useful, as requests into the application are typically ‘fanned out’ to, or handled by, multiple downstream services, and a correlation ID allows all of the downstream requests (from the initial point of request) to be correlated or grouped by that ID. So-called ‘distributed tracing’ can then be performed using the correlation IDs, by combining all the downstream service logs and matching the required ID to see the trace of the request throughout your entire application stack (which is very easy if you are using a centralised logging framework such as Logstash).
Use the same "X-Request-ID" request header used by Heroku, Ruby on Rails and others.
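The propagation pattern can be sketched in Python as follows: reuse the inbound X-Request-ID if present, otherwise mint one, copy it onto every outbound call, and attach it to log records. The helper names ensure_request_id, downstream_headers and request_logger are hypothetical, and using a UUID4 for new IDs is our choice rather than part of the convention.

```python
from __future__ import annotations

import logging
import uuid

REQUEST_ID_HEADER = "X-Request-ID"


def ensure_request_id(incoming_headers: dict) -> str:
    """Reuse the inbound X-Request-ID if the caller sent one, else mint a new one."""
    for name, value in incoming_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == REQUEST_ID_HEADER.lower() and value:
            return value
    return str(uuid.uuid4())


def downstream_headers(request_id: str, extra: dict | None = None) -> dict:
    """Headers for an outbound call made while handling this request."""
    headers = dict(extra or {})
    headers[REQUEST_ID_HEADER] = request_id
    return headers


def request_logger(request_id: str) -> logging.LoggerAdapter:
    """Logger whose records carry request_id, usable in a '%(request_id)s' format."""
    return logging.LoggerAdapter(logging.getLogger("app"), {"request_id": request_id})
```

Because every service forwards the same value, grepping the combined logs for one ID reconstructs the whole fan-out of a single user request.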
This needs work! Ignore for now.
Using /{ct}/{ca} (consumer) and /{pt}/{pa} (producer) segments within a tenant API's URI template signals to any app producing or consuming the API that, for an API call to be valid, the consumer segments must identify the actual consumer, and the producer segments must identify the actual producer.
This means a tenant can install an app like a job board that consumes (say) the API /jobs/byConsumer/{ct}/{ca}, confident that (as long as either (a) the consumer app obeys the convention, and/or (b) the producer enforces it) the job board will only ever access its own postings, and not the postings of some other job board.
Given that the app "zambo", from the developer "zamsoft":
- consumes /jobs/{ct}/{ca}
- produces /jobs/{pt}/{pa}/deltaPings
...the convention means that, when installed by the tenant acme, zambo (and only zambo):
- can consume GET /jobs/acme/zamsoft/zambo
- can produce POST /jobs/acme/zamsoft/zambo/deltaPings
...and that zambo:
- can not consume GET /jobs/acme/zamsoft/fruitbat
- will never produce POST /jobs/acme/zamsoft/fruitbat/deltaPings
In another example, the repstate app provides a centralised store where apps acting as replication secondaries can record how far they have progressed through the bulk load phase of replication. Other apps can then look in and summarise the readiness of all of the replication secondaries the tenant has installed.
Assuming the repstate app:
- produces /replicationStatuses/{ct}/{cad}/{ca}/{apiDev}/{apiURI}
...then, when installed by the tenant acme, zambo (and only zambo):
- can consume POST /replicationStatuses/acme/zamsoft/zambo/tas/requisitions
- can consume GET /replicationStatuses/acme/zamsoft/zambo/tas/requisitions
...and zambo:
- can not consume POST /replicationStatuses/acme/zamsoft/fruitbat/tas/requisitions
- can not consume GET /replicationStatuses/acme/zamsoft/fruitbat/tas/requisitions
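Option (b) above, the producer enforcing the convention, might look like the Python sketch below, using the repstate template since it spells out separate {ct}, {cad} and {ca} segments. The function names bind_template and consumer_allowed are hypothetical, each placeholder is assumed to match exactly one path segment, and how the producer learns the calling tenant, developer and app is assumed to be known and is not shown.

```python
from __future__ import annotations


def bind_template(template: str, path: str) -> dict[str, str] | None:
    """Match a path against a URI template, one segment per placeholder."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return None
    values: dict[str, str] = {}
    for t, p in zip(t_parts, p_parts):
        if t.startswith("{") and t.endswith("}"):
            values[t[1:-1]] = p  # placeholder: capture the segment
        elif t != p:
            return None  # literal segment mismatch
    return values


def consumer_allowed(template: str, path: str, tenant: str, developer: str, app: str) -> bool:
    """Producer-side check: any {ct}/{cad}/{ca} segments must name the actual caller."""
    values = bind_template(template, path)
    if values is None:
        return False
    expected = {"ct": tenant, "cad": developer, "ca": app}
    return all(values[k] == v for k, v in expected.items() if k in values)
```

With this check in place, zambo's call to /replicationStatuses/acme/zamsoft/zambo/tas/requisitions passes, while the same call against .../zamsoft/fruitbat/... is rejected regardless of how the consumer behaves.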
NOTE: this approach causes tighter coupling, since the producer changes its behaviour depending on who the consumer is. That implies the tenant can't just rip out and replace the consumer (just as they can't once the consumer contains its own state).

Tree semantics - jobs, locations
--------------------------------
The TAS tenant APIs have the concept of categories - a hierarchical system for categorizing jobs and candidates, which allows:
- searching for jobs that match candidates
  - actively (candidate searches via UI)
  - passively (candidate gets email when new jobs are posted that match their own profile)
- searching for candidates
  - that match jobs
  - by the tenant, when searching candidates
  - by the tenant to create a search agent for candidates, possibly immediately after searching
For some categories, jobs are restricted to only having a single leaf node (category value).

General principles
------------------
Each category is an ordered list of one or more trees. Within the trees, any node that has no children is a leaf. A node with children is a folder. A leaf becomes a folder as soon as a leaf is added to it. A folder becomes a leaf when the last leaf is removed from it - i.e. a folder can never be empty.
There can be any number of trees, each starting from a single root node. The smallest possible category is a single leaf (i.e. a single tree with a root node having no child nodes).
A selection is a combined set of explicit folder selections and leaf selections. If a selection contains an explicit folder selection, that implicitly selects all of that folder's leaves and sub-folders.

Examples
--------
Job
---
Store jobs
Support office jobs
  IT
  Marketing
Location
--------
Store locations
  Asia
  ..
  EMEA
Support office locations
  Australia
  ..
Work type
---------
Store work types
  Part-time
Support office work types
Remuneration
------------
Store roles
Support office roles
  Hourly rate
  Salary
    20K - 30K
    ..

Example tree
------------
The following tree is used in the examples below:
/a
  /b
    /d
    /e
  /c
    /f
/g
  /h
  /i

Normalizing selections
----------------------
A set of category values should be "normalized", which specifically means that:
- any folder or leaf must be selected (explicitly or implicitly) only once - i.e., it is invalid to select both a node and one of its ancestor nodes.
- the minimum possible number of folder and leaf selections must be used - i.e., it is invalid to explicitly select all of the nodes directly beneath a folder without any differing details - the folder itself should be selected instead.
The following are correctly normalized:
/a
/b,/f
/a,/h
The following are not:
/b,/c (since /a would be more minimal)
/a,/b (since /a implies /b)
/d,/e (since /b would be more minimal)
/f (since /c would be more minimal)
A selection that was valid can become invalid due to changes in the master data. For example, /d becomes invalid if the master data changes to:
/a
  /b
    /d
  /c
    /f
(since /b would be more minimal).

Unprofiled subjects and searching
---------------------------------
When searching, it is desirable, for any given category, to match subjects (i.e. candidates and jobs) that have no values for that category. This allows the tenant to add new factors over time (e.g. work type) without instantly filtering all existing candidates and jobs out of searches (at least until they have profiled themselves against the new category), and it also allows candidates to have "no selection" for factors they don't care about (e.g. work type).
To achieve this, searching follows a brute-force rule: if a subject has no values for a category, then at the point of being filtered, they are given temporary selections of the root node of every tree in the category. For example, if Fred has no values for the example category, then Fred is instead treated as if he has:
/a,/g
There are shortcomings with this, e.g. a candidate who is profiled as Job == Gardening Center and Location == East Brunswick Support Office could be said to have no selection for location, since there are no gardening jobs at the support offices. But there are no simple solutions, so this is the best we can do.

Merging a profile onto an existing multi-valued subject (candidate, recruiter saved search)
-------------------------------------------------------------------------------------------
Where the candidate has no profile: the profile is applied, even though it effectively shrinks the candidate's search presence.

Aggregating specificity across a number of root nodes
-----------------------------------------------------
The individual specificities are all calculated, then the maximum is used.
*and then across numerous factors*

General searching/matching
--------------------------
Factors are AND-ed. Selections within individual factors are OR-ed.

Searching with an entire factor omitted
---------------------------------------
Generally, search subjects that have an empty selection are treated as if they have just the root node selected, which means that they will match any search.
In more complex organizations, trees cannot be treated this simply, since things like location behave as a distinct set of subtrees. For example, only certain locations are relevant for the Position Type "Assistant Gardening Team Supervisor". In this case, the empty selection refers to just part of the tree being empty.
ISSUE - should an artificial "ALL LOCATIONS" node be injected into the distilled location tree in a search UI when searching for "Assistant Gardening Team Supervisor" candidates? If so, how is selection of that artificial node represented:
- on the wire, in parameters to the "search candidates" API?
- on the wire, in parameters to the "profile candidate" API?
If the answer to the above is "as an empty location selection", then are we moving towards allowing multiple root nodes?
Since there is then no need for a single root node that just acts as a "select all" node?

Specificity
-----------
Specificity, a measure of how specific a selection is (a lower number means the selection is more specific), is simply the count of explicitly or implicitly selected leaves. For example, a candidate who has selected a single job has a specificity of 1, whereas a candidate who has selected the root node, thus implicitly selecting all 213 leaf nodes, has a specificity of 213.
Specificity is useful in ranking search results - for example, a candidate who has selected only "butchery roles" is a more likely fit than, and should appear in search results before, someone who has selected "all megacorp roles".

Hiding nodes
------------
Sometimes, a hiding selection can be applied to a tree. One example is hiding old, disused business groups that need to remain in the tree but should not be selected for new positions.
A hiding selection must be normalized as per the same rules as a selection - e.g., hiding both a node and one of its ancestors is invalid.
Applying a hiding selection to a tree gives the result tree. By "applying", we mean:
- for each hidden node:
  - remove all direct and indirect descendant nodes
  - remove the hidden node itself
  - moving up through the ancestors, remove any ancestor that no longer has any child nodes
The last step above ensures that all nodes in the result tree are of the same type (leaf or folder) as they were before hiding. Without this step, hiding the only leaf in a folder would result in the folder turning into a leaf.
ISSUE: what about stripping out ancestor folders that have only one child after hiding?
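As a rough sketch of several of the tree rules above, the Python below models a category as a list of trees and implements implicit selection, specificity, AND/OR matching with the unprofiled-subject default, and hiding. Several things here are assumptions rather than part of the convention: the tree literal is our reading of the example tree (root /a with /b{/d,/e} and /c{/f}; root /g with /h and /i), the names Node, t, selected_leaves, matches and apply_hiding are hypothetical, and we read "OR-ed within a factor" as "the subject's selected leaves overlap the search's selected leaves".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)  # empty list => leaf


def t(name: str, *children: Node) -> Node:
    """Shorthand for building a tree node."""
    return Node(name, list(children))


# Our reading of the example category: two trees, rooted at /a and /g.
CATEGORY = [t("/a", t("/b", t("/d"), t("/e")), t("/c", t("/f"))),
            t("/g", t("/h"), t("/i"))]


def find(roots: list[Node], name: str) -> Node:
    """Depth-first lookup of a node by name (raises KeyError if absent)."""
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node.name == name:
            return node
        stack.extend(node.children)
    raise KeyError(name)


def leaves(node: Node) -> set[str]:
    """All leaves at or below node."""
    if not node.children:
        return {node.name}
    return set().union(*(leaves(c) for c in node.children))


def selected_leaves(roots: list[Node], selection: list[str]) -> set[str]:
    """Leaves selected explicitly, or implicitly via a selected folder."""
    return set().union(*(leaves(find(roots, n)) for n in selection))


def specificity(roots: list[Node], selection: list[str]) -> int:
    """Count of explicitly or implicitly selected leaves (lower = more specific)."""
    return len(selected_leaves(roots, selection))


def matches(trees: dict[str, list[Node]], subject: dict[str, list[str]],
            search: dict[str, list[str]]) -> bool:
    """Factors are AND-ed; selections within a factor are OR-ed (leaf overlap).

    A subject with no values for a factor is treated as having every root
    node of that factor's trees selected.
    """
    for factor, search_sel in search.items():
        roots = trees[factor]
        subject_sel = subject.get(factor) or [r.name for r in roots]
        if not selected_leaves(roots, subject_sel) & selected_leaves(roots, search_sel):
            return False
    return True


def apply_hiding(roots: list[Node], hidden: set[str]) -> list[Node]:
    """Copy of the trees without hidden nodes, their descendants, or emptied ancestors."""
    def prune(node: Node) -> Node | None:
        if node.name in hidden:
            return None  # drops the hidden node and all of its descendants
        if not node.children:
            return Node(node.name)  # untouched leaf
        pruned = (prune(child) for child in node.children)
        kept = [p for p in pruned if p is not None]
        return Node(node.name, kept) if kept else None  # emptied folder goes too
    return [r for r in (prune(r) for r in roots) if r is not None]
```

For example, selecting /a gives a specificity of 3 (/d, /e and /f), an unprofiled subject matches a search for /b because it is treated as /a,/g, and hiding /f also removes /c because /c is left with no children.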