Validator Types
Astrea currently has a number of validator types. These validators let you create rules that check the format and content of text extracted from files.
The currently identified validator types are:
- Fixed
- Regular Expression
- Field (cross-reference another field)
- Filename
- Excel/CSV/JSON comparison
- PDF/A
If you wish to suggest validator types, please reach out to us via the feature request form.
Fixed
The fixed field validator takes an input from the user and matches the text exactly. For example:
Image
In this image, the user has configured the Status field validator to check for "CONSTRUCTION". The extracted text is "CONSTRUCTION", so the validation test passes.
Image
In this image, the user has configured the Status field validator to check for "CONSTRUCTION". However, the extracted text is "CONSTRUCTIOM", so the validation test fails.
This validator type is good for elements in a drawing that should never change.
This feature is live in the test client.
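The exact-match behaviour described above can be sketched in a few lines. This is only an illustration: the function name `validate_fixed` is hypothetical, and it assumes a strict, case-sensitive comparison with no trimming of whitespace, which may differ from what Astrea does internally.

```python
def validate_fixed(expected: str, extracted: str) -> bool:
    """Pass only when the extracted text equals the configured text exactly.

    Assumption: the comparison is case-sensitive and does not trim whitespace.
    """
    return extracted == expected


# The two cases from the screenshots above.
print(validate_fixed("CONSTRUCTION", "CONSTRUCTION"))  # True
print(validate_fixed("CONSTRUCTION", "CONSTRUCTIOM"))  # False
```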
Regular Expression
A regular expression ('regex') is a special sequence of characters that defines a search pattern within text. It is used for advanced 'find and replace' operations, input validation and general text manipulation in various programming languages and text editors.
Regex patterns can be very simple, like searching for a single letter, or highly complex, using special characters and rules to specify matching conditions.
Regex is an extremely powerful tool for text matching and validation, but it can be quite unwieldy. Before you use it, the sections below run through some basic examples of regular expression matching.
This feature is live in the test client.
Example regular expressions
There are some regex-specific shorthand sequences that are important to know when building a match or validation. These are:
| Shorthand | Meaning | Equivalent |
|---|---|---|
| \d | Any digit (0–9) | [0-9] |
| \D | Not a digit | [^0-9] |
| \w | "Word" character (letters, digits, underscore) | [A-Za-z0-9_] |
| \W | Not a "word" character | [^A-Za-z0-9_] |
| \s | Whitespace (space, tab, newline, carriage return, form feed, vertical tab) | [ \t\n\r\f\v] |
| \S | Not whitespace | [^ \t\n\r\f\v] |
Date matching
Suppose we have a date we are trying to validate, such as 1991-05-29.
You can write a regex that will match it by writing those exact characters: 1991-05-29
However, if we wish to match the pattern that the date follows, which is YYYY-MM-DD,
we would express this as: \d\d\d\d-\d\d-\d\d
In this regular expression each of the \d is matching exactly one digit and each of the - is matching that exact character.
- \d: any digit character (0-9)
- -: exact character match (a literal hyphen)
This means that all of these dates:
- 1991-05-29
- 3924-05-29
- 1066-05-29
match the pattern and would pass the validation.
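To try the date pattern outside Astrea, a minimal Python sketch follows. It assumes the whole extracted value must match the pattern (hence `fullmatch`), which may not be exactly how Astrea applies the expression. The `re.ASCII` flag is used so that \d means [0-9], as in the table above; the helper name `matches_date_pattern` is hypothetical.

```python
import re

# The pattern from the text: four digits, hyphen, two digits, hyphen, two digits.
# re.ASCII restricts \d to 0-9 (Python's default also accepts other Unicode digits).
DATE_PATTERN = re.compile(r"\d\d\d\d-\d\d-\d\d", re.ASCII)


def matches_date_pattern(text: str) -> bool:
    """Return True when the entire text has the shape YYYY-MM-DD."""
    return DATE_PATTERN.fullmatch(text) is not None


for candidate in ["1991-05-29", "3924-05-29", "1066-05-29", "1991/05/29", "91-05-29"]:
    print(candidate, matches_date_pattern(candidate))
```

The first three values pass and the last two fail. Note that the pattern checks shape only: a value such as 1991-99-99 also passes, because each \d accepts any digit.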
Document Number matching
Regular expressions can also match a document number regardless of the format or variation your document numbering system has.
ISO 19650 Example
We have a document number with the following format: C123-MEP-B1-M3-DR-M-0450
Our regex pattern to match this number would be: \w\d\d\d-\w\w\w-\w\d-\w\d-\w\w-\w-\d\d\d\d
- \w: any word character (letter, digit or underscore)
- \d: any digit character (0-9)
- -: exact character match (a literal hyphen)
Trafikverket example
We have a document number with the following format: 101T0311
Our regex pattern to match this number would be: \d\d\d\w\d\d\d\d
- \w: any word character (letter, digit or underscore)
- \d: any digit character (0-9)
Trafikverket example with whitespace
Perhaps the format has whitespace, in which case we would have: 1 01 T 03 11
Our regex pattern to match this would be: \d\s\d\d\s\w\s\d\d\s\d\d
- \w: any word character (letter, digit or underscore)
- \d: any digit character (0-9)
- \s: any whitespace character (space, tab, line break)
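The three document number patterns above can be checked side by side with a short Python sketch. The dictionary keys and the `matches` helper are hypothetical names used for illustration, and the sketch assumes the whole value must match the pattern.

```python
import re

# Patterns copied from the examples above; the keys are illustrative labels only.
PATTERNS = {
    "iso19650": r"\w\d\d\d-\w\w\w-\w\d-\w\d-\w\w-\w-\d\d\d\d",
    "trafikverket": r"\d\d\d\w\d\d\d\d",
    "trafikverket_spaced": r"\d\s\d\d\s\w\s\d\d\s\d\d",
}


def matches(pattern_name: str, text: str) -> bool:
    """Return True when the entire text matches the named pattern."""
    # re.ASCII keeps \d, \w and \s to the ASCII ranges listed in the shorthand table.
    return re.fullmatch(PATTERNS[pattern_name], text, flags=re.ASCII) is not None


print(matches("iso19650", "C123-MEP-B1-M3-DR-M-0450"))    # True
print(matches("trafikverket", "101T0311"))                # True
print(matches("trafikverket_spaced", "1 01 T 03 11"))     # True
print(matches("trafikverket", "1 01 T 03 11"))            # False: whitespace not allowed
print(matches("trafikverket", "10130311"))                # True: \w also accepts a digit
```

The last line shows a limit of \w: because it accepts digits and underscores as well as letters, a position that must hold a letter is better expressed with a character class such as [A-Z].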
Filename
The filename validator checks an extracted field (typically the document number) against the filepath of the document being checked. The filename validator is a fixed check.
Passing example 1:
- Document Number inside the file: C123-MEP-B1-M3-DR-M-0450
- Document filepath: C123-MEP-B1-M3-DR-M-0450.pdf
The file extension .pdf is stripped from the path and the values are compared. As they match, this check will pass.
Failing example 1:
- Document Number inside the file: C123-MEP-B1-M3-DR-M-0450
- Document filepath: C123-MEP-B1-M3-DR-M-0451_01.pdf
The file extension .pdf is stripped from the path and the values are compared. Because the filename differs from the document number (0451 rather than 0450, plus the additional _01), the check fails.
Any variation of the filename will cause the check to fail. More examples of this are:
- C123-MEP-B1-M3-DR-M-04514.pdf
- C123-MEP-B1-M3-DR-M-0451-v7_final.pdf
- C123-MEP-B1-M3-DR-M-0451(1).pdf
- C123-MEP-B1-M3-DR-M-0451_01.pdf
This feature is live in the test client.
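The comparison described in this section can be approximated as follows. This is a sketch under stated assumptions, not Astrea's implementation: it assumes only the final path component is compared, that a single extension is stripped, and that the comparison is exact. The function name `validate_filename` is hypothetical.

```python
from pathlib import PurePath


def validate_filename(document_number: str, filepath: str) -> bool:
    """Strip the extension from the file name and compare it to the document number."""
    # .stem is the final path component without its last suffix,
    # e.g. "C123-MEP-B1-M3-DR-M-0450.pdf" -> "C123-MEP-B1-M3-DR-M-0450".
    return PurePath(filepath).stem == document_number


number = "C123-MEP-B1-M3-DR-M-0450"
print(validate_filename(number, "C123-MEP-B1-M3-DR-M-0450.pdf"))           # True
print(validate_filename(number, "C123-MEP-B1-M3-DR-M-0451_01.pdf"))        # False
print(validate_filename(number, "C123-MEP-B1-M3-DR-M-0451-v7_final.pdf"))  # False
print(validate_filename(number, "C123-MEP-B1-M3-DR-M-0451(1).pdf"))        # False
```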
PDF/A Validation
The PDF/A validator checks whether the PDF metadata contains compliance metadata that confirms the underlying PDF/A implementation and its level.
To learn more about PDF/A compliance, see our knowledge base: PDF/A Compliance
Currently there are various levels of PDF/A compliance:
| PDF/A Variant | PDF Version |
|---|---|
| PDF/A-1b | 1.4 |
| PDF/A-1a | 1.4 |
| PDF/A-2b | 1.7 |
| PDF/A-2u | 1.7 |
| PDF/A-2a | 1.7 |
| PDF/A-3b | 1.7 |
| PDF/A-3u | 1.7 |
| PDF/A-3a | 1.7 |
| PDF/A-4 (base) | 2.0 |
| PDF/A-4f | 2.0 |
| PDF/A-4e | 2.0 |
Simply select the PDF/A variant to check for, or select PDF/A any in the UI to assess which standard the PDF matches.
This feature is live in the test client.
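For readers curious what the compliance metadata looks like: a PDF/A file declares its variant in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below reads that declaration from an XMP packet that has already been extracted as text. How Astrea reads the metadata is not documented here, so the function name `pdfa_variant_from_xmp` and the regex-based approach are assumptions. It reports only what the file claims and does not verify that the file actually conforms.

```python
import re
from typing import Optional


def pdfa_variant_from_xmp(xmp: str) -> Optional[str]:
    """Return the claimed variant (e.g. 'PDF/A-2b'), or None if no claim is found."""

    def field(name: str) -> Optional[str]:
        # XMP allows a property to be written as an element or as an attribute.
        element = re.search(rf"<pdfaid:{name}>\s*([^<\s]+)\s*</pdfaid:{name}>", xmp)
        if element:
            return element.group(1)
        attribute = re.search(rf"""pdfaid:{name}\s*=\s*["']([^"']+)["']""", xmp)
        return attribute.group(1) if attribute else None

    part = field("part")
    if part is None:
        return None
    # Base PDF/A-4 carries no conformance letter, so this may be empty.
    conformance = field("conformance") or ""
    return f"PDF/A-{part}{conformance.lower()}"


SAMPLE_XMP = """
<rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
  <pdfaid:part>2</pdfaid:part>
  <pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
"""

print(pdfa_variant_from_xmp(SAMPLE_XMP))  # PDF/A-2b
```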
Table import (Excel, CSV, JSON)
The table import validator will attempt to match the Validator Field name to the input file columns. When it finds the column, it will then check the values against the extracted text and pass/fail based on what is being checked.
This type of check enables cross-reference validations against your project's Electronic Document Management System (EDMS), Common Data Environment (CDE) or Document Register.
This feature is being implemented in the test client.
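Since this validator is still being implemented, the following is only a rough sketch of the idea for a CSV input. It assumes the column header must equal the Validator Field name exactly and that the check passes when the extracted text appears anywhere in that column. The function name `validate_against_register` and the sample register are hypothetical.

```python
import csv
import io


def validate_against_register(field_name: str, extracted_text: str, register_csv: str) -> bool:
    """Pass when the extracted text appears in the column named after the field."""
    reader = csv.DictReader(io.StringIO(register_csv))
    # Assumption: a missing column counts as a failed check.
    if field_name not in (reader.fieldnames or []):
        return False
    return any(row[field_name] == extracted_text for row in reader)


# Hypothetical document register exported as CSV.
REGISTER = """Document Number,Status
C123-MEP-B1-M3-DR-M-0450,CONSTRUCTION
C123-MEP-B1-M3-DR-M-0451,CONSTRUCTION
"""

print(validate_against_register("Document Number", "C123-MEP-B1-M3-DR-M-0450", REGISTER))  # True
print(validate_against_register("Document Number", "C123-MEP-B1-M3-DR-M-0452", REGISTER))  # False
print(validate_against_register("Revision", "P01", REGISTER))                              # False
```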