Inputs

Input sanitation

Data enrichment systems ingest user-provided data from sources like:

CRMs
ATSs
Web forms

User-provided data often contains mistakes or invalid values. Pipe0's sanitation layer cleans malformed values and regenerates invalid ones where it can.

Cleanup

The following request payload contains common errors but will be processed successfully.

Request with common errors

{
  "pipes": [
    {
      "pipe_id": "company:identity@3"
    }
  ],
  "input": [
    {
      "id": 1,
      "name": "Susi Jui",
      "company_name": "Pipe0",
      "company_domain": "https://www.pipe0.com/about", // protocol, www and path get stripped
      "email": "mailto:susi@pipe0.com", // "mailto:" prefix gets stripped
      "personal_website_url": "wwwww.susi.com" // typo: "wwwww" instead of "www"
    },
    {
      "id": 2,
      "name": "Tom Schmidt",
      "company_name": "Pipe0",
      "company_domain": "not today" // invalid: expected a domain
    }
  ]
}

Here's how we clean this request:

Parse URLs into a consistent format and clean common mistakes
Parse email addresses into a consistent format and clean common mistakes
Fix obvious typos and remove invalid characters
Convert data types on demand (int to float, float to int, int to string)
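The cleanup steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not pipe0's actual implementation: the function names clean_domain, clean_email and fix_www_typo are hypothetical, and the exact output pipe0 produces for each field may differ.

```python
import re
from urllib.parse import urlsplit


def clean_domain(raw: str) -> str:
    """Reduce a URL-like string to a bare domain (illustrative only)."""
    value = raw.strip().lower()
    # urlsplit only fills in the host when a "//" authority marker is present.
    if "://" not in value:
        value = "//" + value
    host = urlsplit(value).hostname or ""
    # Drop a leading run of three or more w's, which also covers typos like "wwwww.".
    return re.sub(r"^w{3,}\.", "", host)


def clean_email(raw: str) -> str:
    """Strip surrounding whitespace and a leading "mailto:" prefix."""
    value = raw.strip()
    if value.lower().startswith("mailto:"):
        value = value[len("mailto:"):]
    return value


def fix_www_typo(raw: str) -> str:
    """Collapse a leading run of four or more w's into "www." (assumed behavior)."""
    return re.sub(r"^w{4,}\.", "www.", raw.strip().lower())
```

Applied to the first input object, clean_domain reduces the company_domain value to a bare domain and clean_email removes the "mailto:" prefix.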

Regeneration

In our example, "company_domain": "not today" is not a valid domain.

Because company_domain is an output field of company:identity@3, we can find the correct value.

During processing, company:identity@3 detects that company_domain has an invalid format and replaces it with the correct value.

The result may look like this:

Healed record

{
    "id": 2,
    "name": "Tom Schmidt",
    "company_name": "Pipe0",
    "company_domain": "pipe0.com" // healed
}

Valid input values are not regenerated. Instead, they are copied from the input to the record.
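The copy-or-regenerate decision can be sketched as follows. This is a simplified illustration under stated assumptions: the domain pattern is a rough approximation, and resolve_field, the regenerate callback and the "input" / "regenerated" labels are hypothetical names, not part of the pipe0 API.

```python
import re

# Rough domain check: dot-separated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def is_valid_domain(value) -> bool:
    return isinstance(value, str) and DOMAIN_RE.match(value) is not None


def resolve_field(input_value, regenerate):
    """Copy a valid input value; otherwise ask `regenerate` for a replacement.

    `regenerate` stands in for the pipe's own lookup (hypothetical callback).
    Returns a (value, source) tuple.
    """
    if is_valid_domain(input_value):
        return input_value, "input"
    return regenerate(), "regenerated"
```

With this sketch, resolve_field("not today", ...) falls through to the callback, while a valid value such as "pipe0.com" is returned unchanged without calling it.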

Incomplete data

It is common for input data to be incomplete.

Failing the entire task because one input object cannot be processed is impractical and annoying.

Partially missing input fields

If we find at least one input object that can be processed, pipeline validation will pass.

Take the following request payload:

One record can be processed

{
  "pipes": [
    {
      "pipe_id": "company:identity@3"
    }
  ],
  "input": [
    {
      "id": 1,
      "name": "Susi Jui",
      "company_name": "Pipe0"
    },
    { // CANNOT be processed by "company:identity@3"
      "id": 2
      // required `company_name` missing
    }
  ]
}

The pipe company:identity@3 requires the input field company_name, which is missing from record id=2. In this case:

Pipeline validation passes.
Record id=1 is processed in full.
Record id=2 has failed fields.

No input object has the required input fields

Another example:

No record can be processed

{
  "pipes": [
    {
      "pipe_id": "company:identity@3"
    }
  ],
  "input": [
    { // CANNOT be processed by "company:identity@3"
      "id": 1,
      "name": "Susi Jui"
    },
    { // CANNOT be processed by "company:identity@3"
      "id": 2,
      "name": "Tom Schmidt"
    }
  ]
}

No input object has the required field company_name, so the request fails during validation and the entire task fails before processing starts.

In practice, handling failed tasks adds overhead. If you would rather avoid them, there is an escape hatch: define the expected input fields and set them to null. Pipeline validation then passes and the task does not fail. Instead, only individual fields fail.
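The three cases above can be summarized in a small sketch. It assumes that validation only checks whether a required key is defined on an input object, even when its value is null; that is one reading of the escape hatch, not a confirmed description of pipe0's validator. REQUIRED_FIELDS and validate_request are hypothetical names.

```python
# Assumed required input fields of company:identity@3.
REQUIRED_FIELDS = {"company_name"}


def has_required_fields(record: dict) -> bool:
    """True if every required key is defined, even when its value is None."""
    return REQUIRED_FIELDS.issubset(record)


def validate_request(records: list[dict]) -> bool:
    """Validation passes if at least one input object can be processed."""
    return any(has_required_fields(record) for record in records)


partially_missing = [
    {"id": 1, "name": "Susi Jui", "company_name": "Pipe0"},
    {"id": 2},
]
all_missing = [
    {"id": 1, "name": "Susi Jui"},
    {"id": 2, "name": "Tom Schmidt"},
]
# Escape hatch: define the expected field and set it to null (None in Python).
with_null_fields = [
    {"id": 1, "name": "Susi Jui", "company_name": None},
    {"id": 2, "name": "Tom Schmidt", "company_name": None},
]
```

Under these assumptions, validate_request passes for partially_missing and with_null_fields and fails for all_missing.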

Input expansion

Input expansion is an advanced concept that you can safely ignore if you don't plan to use pipe0 for complex UIs.

When you enrich data with pipe0, you transform your "input objects" into output records. An input object may look like this:

{
    "id": 2,
    "name": "Tom Schmidt"
}

Some interactions require you to reprocess previously processed fields. For this, it is common to transform your output records back to input objects. By doing so, previous processing information is lost. This includes metadata like the result of a waterfall, UI widgets, etc.

If you pass a plain value to the API, it will always be marked as resolved_by:input.

Instead of passing your input as a plain value, there is another way: Input expansion.

You can pass your inputs fully or partially expanded, using the same shape as a field value in the response object.

Expanded input field

{
  "id": 2,
  "name": {
      "value": "Tom Schmidt",
      "status": "completed",
      "type": "string",
      "reason": null,
      "meta": null,
      "ui": {
        "severity": "none"
      }
  }
}

Expanding inputs gives you more control but makes you responsible for providing valid input states.
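To make the difference between plain and expanded values concrete, here is a minimal sketch. The helpers expand_field, is_expanded and collapse are hypothetical and not part of any pipe0 SDK; the default field values are copied from the expanded example above.

```python
def expand_field(value, type_name: str = "string") -> dict:
    """Wrap a plain value in the expanded field shape shown above."""
    return {
        "value": value,
        "status": "completed",
        "type": type_name,
        "reason": None,
        "meta": None,
        "ui": {"severity": "none"},
    }


def is_expanded(field) -> bool:
    """Heuristic: an expanded field is a dict with "value" and "status" keys."""
    return isinstance(field, dict) and {"value", "status"}.issubset(field)


def collapse(field):
    """Reduce an expanded field to its plain value, dropping all metadata."""
    return field["value"] if is_expanded(field) else field


# Partially expanded input object: "id" stays plain, "name" is expanded.
input_object = {"id": 2, "name": expand_field("Tom Schmidt")}

# Collapsing every field yields the plain input object and loses the metadata.
plain_object = {key: collapse(field) for key, field in input_object.items()}
```

Sending plain_object corresponds to the plain-value case described earlier, while sending input_object keeps status, meta and ui under your control.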