Validator Types

Astrea currently provides a number of validator types. These validators let you create rules that check the format and content of text extracted from files.

The currently identified validator types are:

  • Fixed
  • Regular Expression
  • Field (cross-reference another field)
  • Filename
  • Excel/CSV/JSON comparison
  • PDF/A
info

If you would like to suggest a validator type, please reach out to us via the feature request form.

Fixed

The fixed field validator takes an input from the user and matches the text exactly, for example:

Image

In this image, the user has configured the Status field validator to check for "CONSTRUCTION". The extracted text is "CONSTRUCTION", so the validation test passes.

Image

In this image, the user has configured the Status field validator to check for "CONSTRUCTION". However, the extracted text is "CONSTRUCTIOM", so the validation test fails.

This validator type is good for elements in a drawing that should never change.
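
As a rough illustration of the behaviour described above, the fixed check can be sketched as a plain string comparison. The function name `fixed_validator` is hypothetical, and the sketch assumes a strict, case-sensitive comparison with no trimming of whitespace; Astrea's actual handling of case and whitespace is not documented here.

```python
def fixed_validator(expected: str, extracted: str) -> bool:
    """Pass only when the extracted text equals the configured value exactly.

    Assumption: the comparison is case-sensitive and does not trim whitespace.
    """
    return extracted == expected


print(fixed_validator("CONSTRUCTION", "CONSTRUCTION"))  # True
print(fixed_validator("CONSTRUCTION", "CONSTRUCTIOM"))  # False
```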

Implemented

This feature is live in the test client.

Regular Expression

A regular expression ('regex') is a special sequence of characters that defines a search pattern within text. It is used for advanced 'find and replace' operations, input validation and general text manipulation in many programming languages and text editors.

Regex patterns can be very simple, like searching for a single letter, or highly complex, using special characters and rules to specify matching conditions.

Regex is an extremely powerful tool for text matching and validation, but it can be quite unwieldy. Before you use it, we will run through some basic examples of regular expression matching.

Implemented

This feature is live in the test client.

Example regular expressions

There are some specific meta sequences (regex-specific characters) that are important to know when building a match or validation. These are:

| Shorthand | Meaning | Equivalent |
| --- | --- | --- |
| \d | Any digit (0–9) | [0-9] |
| \D | Not a digit | [^0-9] |
| \w | "Word" character (letters, digits, underscore) | [A-Za-z0-9_] |
| \W | Not a "word" character | [^A-Za-z0-9_] |
| \s | Whitespace (space, tab, newline, carriage return, form feed, vertical tab) | [ \t\n\r\f\v] |
| \S | Not whitespace | [^ \t\n\r\f\v] |

Date matching

If we have a date we are trying to validate, such as 1991-05-29

You can write a regex that will match it by writing those exact characters: 1991-05-29

However, if we wish to match the pattern that the date has, which is: YYYY-MM-DD

We would express this as: \d\d\d\d-\d\d-\d\d

In this regular expression, each \d matches exactly one digit and each - matches that exact character.

  • \d - Any digit character (0-9)
  • - - Exact character match

This means that all of these dates match the pattern and would pass the validation:

  • 1991-05-29
  • 3924-05-29
  • 1066-05-29
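
To try the date pattern outside Astrea, here is a minimal sketch using Python's standard `re` module. It assumes the whole extracted value must fit the pattern (hence `fullmatch`); whether Astrea anchors patterns this way is an assumption, and the helper name `matches_date_pattern` is made up for the example.

```python
import re

# The pattern from the text: four digits, a dash, two digits, a dash, two digits.
DATE_PATTERN = re.compile(r"\d\d\d\d-\d\d-\d\d")


def matches_date_pattern(text: str) -> bool:
    # fullmatch requires the entire string to fit the pattern (assumed behaviour).
    return DATE_PATTERN.fullmatch(text) is not None


for candidate in ["1991-05-29", "3924-05-29", "1991/05/29", "1991-5-29", "9999-99-99"]:
    print(candidate, matches_date_pattern(candidate))
```

Note that the pattern checks the shape of the value only. A string such as 9999-99-99 has the right shape and therefore matches, even though it is not a real calendar date.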

Document Number matching

Regular expressions can also match a document number regardless of the format or variation your document numbering system has.

ISO 19650 Example

We have a document number with the following format: C123-MEP-B1-M3-DR-M-0450

Our regex pattern to match this number would be: \w\d\d\d-\w\w\w-\w\d-\w\d-\w\w-\w-\d\d\d\d

  • \w - Any word character (letter, digit or underscore)
  • \d - Any digit character (0-9)
  • - - Exact character match
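
The ISO 19650 pattern above can be checked in the same way. This is a sketch in Python's `re` module, assuming whole-string matching; the sample values other than the document number from the text are invented for illustration.

```python
import re

# The ISO 19650 pattern from the text, unchanged.
ISO_PATTERN = re.compile(r"\w\d\d\d-\w\w\w-\w\d-\w\d-\w\w-\w-\d\d\d\d")

samples = [
    "C123-MEP-B1-M3-DR-M-0450",  # the document number from the text
    "C123-MEP-B1-M3-DR-M-450",   # invented: only three digits in the last block
    "1123-MEP-B1-M3-DR-M-0450",  # invented: starts with a digit instead of a letter
]

results = {s: ISO_PATTERN.fullmatch(s) is not None for s in samples}
for sample, passed in results.items():
    print(sample, passed)
```

The third sample also matches, because \w accepts digits and underscores as well as letters. If a position must be a letter, a character class such as [A-Z] is stricter than \w.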

Trafikverket example

We have a document number with the following format: 101T0311

Our regex pattern to match this number would be: \d\d\d\w\d\d\d\d

  • \w - Any word character (letter, digit or underscore)
  • \d - Any digit character (0-9)

Trafikverket example with whitespace

Perhaps the format has whitespace, in which case we would have: 1 01 T 03 11

Our regex pattern to match this would be: \d\s\d\d\s\w\s\d\d\s\d\d

  • \w - Any word character (letter, digit or underscore)
  • \d - Any digit character (0-9)
  • \s - Any whitespace character (space, tabs, linebreaks)
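
The two Trafikverket patterns can be compared side by side. This is a minimal sketch using Python's `re` module, again assuming whole-string matching; the variable names are illustrative only.

```python
import re

COMPACT = re.compile(r"\d\d\d\w\d\d\d\d")             # e.g. 101T0311
SPACED = re.compile(r"\d\s\d\d\s\w\s\d\d\s\d\d")      # e.g. 1 01 T 03 11

compact_ok = COMPACT.fullmatch("101T0311") is not None
spaced_ok = SPACED.fullmatch("1 01 T 03 11") is not None

# Each pattern only accepts its own layout.
compact_on_spaced = COMPACT.fullmatch("1 01 T 03 11") is not None
spaced_on_compact = SPACED.fullmatch("101T0311") is not None

# \s matches any whitespace character, so a tab is accepted where a space is expected.
spaced_with_tab = SPACED.fullmatch("1\t01 T 03 11") is not None

print(compact_ok, spaced_ok, compact_on_spaced, spaced_on_compact, spaced_with_tab)
```

If only a literal space should be accepted, writing a space character in the pattern instead of \s is the stricter choice.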

Filename

The filename validator checks an extracted field (typically the document number) against the filepath of the document being checked. The filename validator is a fixed check: the two values must match exactly.

Passing example 1:

  • Document Number inside the file: C123-MEP-B1-M3-DR-M-0450
  • Document filepath: C123-MEP-B1-M3-DR-M-0450.pdf

The file extension .pdf is stripped from the path and the values are compared. As they match, this check passes.

Failing example 1:

  • Document Number inside the file: C123-MEP-B1-M3-DR-M-0450
  • Document filepath: C123-MEP-B1-M3-DR-M-0451_01.pdf

The file extension .pdf is stripped from the path and the values are compared. The filename differs from the document number (0451 instead of 0450, plus an additional _01 suffix), so the check fails.

Any variation in the filename will cause the check to fail. More examples of this are:

  • C123-MEP-B1-M3-DR-M-04514.pdf
  • C123-MEP-B1-M3-DR-M-0451-v7_final.pdf
  • C123-MEP-B1-M3-DR-M-0451(1).pdf
  • C123-MEP-B1-M3-DR-M-0451_01.pdf
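
The comparison described above can be sketched with Python's standard `pathlib`. The function name `filename_validator` is hypothetical, and the sketch assumes that only the final extension is stripped and that any leading folders in the filepath are ignored; the text does not say how Astrea treats folder names.

```python
from pathlib import PurePath


def filename_validator(document_number: str, filepath: str) -> bool:
    # .stem is the final path component without its last extension,
    # e.g. "C123-MEP-B1-M3-DR-M-0450.pdf" -> "C123-MEP-B1-M3-DR-M-0450".
    return PurePath(filepath).stem == document_number


number = "C123-MEP-B1-M3-DR-M-0450"
print(filename_validator(number, "C123-MEP-B1-M3-DR-M-0450.pdf"))     # True
print(filename_validator(number, "C123-MEP-B1-M3-DR-M-0451_01.pdf"))  # False
print(filename_validator(number, "C123-MEP-B1-M3-DR-M-0451(1).pdf"))  # False
```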
Implemented

This feature is live in the test client.

PDF/A Validation

The PDF/A validator checks whether the PDF metadata contains compliance metadata that confirms the underlying PDF/A implementation and its level.

info

To learn more about PDF/A compliance, see our knowledge base: PDF/A Compliance

Currently there are various levels of PDF/A compliance:

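| PDF/A Variant | PDF Version |
| --- | --- |
| PDF/A-1b | 1.4 |
| PDF/A-1a | 1.4 |
| PDF/A-2b | 1.7 |
| PDF/A-2u | 1.7 |
| PDF/A-2a | 1.7 |
| PDF/A-3b | 1.7 |
| PDF/A-3u | 1.7 |
| PDF/A-3a | 1.7 |
| PDF/A-4 (base) | 2.0 |
| PDF/A-4f | 2.0 |
| PDF/A-4e | 2.0 |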

Simply select the PDF/A variant to check for, or select PDF/A any in the UI to assess which standard the PDF matches.
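
For background, PDF/A files normally declare their variant in the XMP metadata packet through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below shows one way to read that declaration from XMP text that has already been extracted from a PDF. It is not Astrea's implementation: the function names are hypothetical, it assumes the conventional `pdfaid` namespace prefix is used, and it only reads the declared variant without verifying that the file actually conforms.

```python
import re
from typing import Optional


def _xmp_value(xmp: str, name: str) -> Optional[str]:
    # XMP may store a property as an element: <pdfaid:part>2</pdfaid:part> ...
    element = re.search(rf"<pdfaid:{name}>\s*([^<\s]+)\s*</pdfaid:{name}>", xmp)
    if element:
        return element.group(1)
    # ... or as an attribute: pdfaid:part="2"
    attribute = re.search(rf"pdfaid:{name}\s*=\s*[\"']([^\"']+)[\"']", xmp)
    return attribute.group(1) if attribute else None


def declared_pdfa_variant(xmp: str) -> Optional[str]:
    """Return the declared variant (e.g. 'PDF/A-2b'), or None if nothing is declared."""
    part = _xmp_value(xmp, "part")
    if part is None:
        return None
    # Conformance is optional for PDF/A-4 (the base variant has none).
    conformance = _xmp_value(xmp, "conformance") or ""
    return f"PDF/A-{part}{conformance.lower()}"


sample = (
    '<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
    "<pdfaid:part>2</pdfaid:part><pdfaid:conformance>B</pdfaid:conformance>"
    "</rdf:Description>"
)
print(declared_pdfa_variant(sample))  # PDF/A-2b
```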

Implemented

This feature is live in the test client.

Table import (Excel, CSV, JSON)

The table import validator will attempt to match the Validator Field name to the input file columns. When it finds the column, it will then check the values against the extracted text and pass/fail based on what is being checked.

This type of check enables cross-reference validations against your project's Electronic Document Management System (EDMS), Common Data Environment (CDE) or Document Register.
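
Because this feature is still being implemented, the following is only a sketch of the general idea, not the actual behaviour. It uses Python's standard `csv` module, the function name `table_validator` and the sample register are invented, and it assumes the check passes when the extracted text appears anywhere in the column whose header equals the Validator Field name.

```python
import csv
import io


def table_validator(register_csv: str, field_name: str, extracted: str) -> bool:
    reader = csv.DictReader(io.StringIO(register_csv))
    # Step 1: find the column whose header matches the validator field name.
    if reader.fieldnames is None or field_name not in reader.fieldnames:
        return False
    # Step 2 (assumed rule): pass if any row in that column equals the extracted text.
    return any(row[field_name] == extracted for row in reader)


# Invented document register for illustration.
register = (
    "Document Number,Status\n"
    "C123-MEP-B1-M3-DR-M-0450,CONSTRUCTION\n"
    "C123-MEP-B1-M3-DR-M-0451,CONSTRUCTION\n"
)

print(table_validator(register, "Document Number", "C123-MEP-B1-M3-DR-M-0450"))  # True
print(table_validator(register, "Document Number", "C123-MEP-B1-M3-DR-M-0999"))  # False
```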

Pending Implementation

This feature is being implemented in the test client.