1. | What is NEWGEN® OmniExtract? |
2. | What data extraction methodologies does NEWGEN® OmniExtract currently support? |
3. | What is the minimum hardware and software requirement for NEWGEN® OmniExtract? |
4. | Which image formats does NEWGEN® OmniExtract support? |
5. | Which scanners does NEWGEN® OmniExtract support? |
6. | What is the effective speed of scanning the documents? |
7. | Is NEWGEN® OmniExtract compatible with ISIS and Twain compatible Scanners? |
8. | Is any special license required for ISIS Scanners? |
9. | How are documents separated in NEWGEN® OmniExtract? |
10. | How does the application take care of the shifting of pages that might occur at the time of scanning? |
11. | How is data organized? |
12. | What is the maximum number of pages that can be scanned in a batch? |
13. | Which operating systems does NEWGEN® OmniExtract support? |
14. | Can we submit documents scanned using some external scanning software for extraction in NEWGEN® OmniExtract? |
15. | How does the application recognize which engine has to be applied for which region in the document? |
16. | What are the distinctive features of the NEWGEN® OmniExtract OMR application? |
17. | What is the recommended resolution for OMR? |
18. | Which Barcode Symbologies are supported? |
19. | What is ICR Technology? |
20. | How is the data extracted from NEWGEN® OmniExtract ICR system stored? |
21. | How does the NEWGEN® OmniExtract System stand out as compared to other applications available in the market today? |
22. | How does one ensure 100% accuracy of the data that finally enters the client's database? |
23. | What is Form Removal? |
24. | Why does the NEWGEN® OmniExtract solution have various modes for verification? |
25. | What are the benefits that one accrues through NEWGEN® OmniExtract System as against Data Entry operations? |
26. | What are the dictionaries? |
27. | How tedious is form design for the application, and what special applications are needed to design the forms? |
28. | Which third party tool is integrated with NEWGEN® OmniExtract for recognition of hand-written/machine-printed (ICR/OCR) data? |
|
1. What is NEWGEN® OmniExtract? |
NEWGEN® OmniExtract processes virtually any kind of form and captures all possible kinds of information, viz. hand-printed/handwritten characters, optical marks (ticked, crossed or filled ovals/checkboxes), barcode symbologies, machine-printed characters and MICR fonts. |
|
|
2. What data extraction methodologies does NEWGEN® OmniExtract currently support? |
NEWGEN® OmniExtract currently supports the following automatic data extraction methodologies:
- ICR (Intelligent Character Recognition)
- OMR (Optical Mark Recognition)
- BCR (Barcode Recognition)
- MICR (Magnetic Ink Character Recognition)
- OCR (Optical Character Recognition)
|
|
|
3. What is the minimum hardware and software requirement for NEWGEN® OmniExtract? |
NEWGEN® OmniExtract requires the following:
NEWGEN® OmniExtract Server
- Intel® Pentium® IV-based processor running at 1.2 GHz or higher
- Microsoft Windows 2000 Advanced Server, 2000 Server or NT Server
- 1 GB free hard-drive space for complete installation (200 MB required)
- 512 MB of RAM
NEWGEN® OmniExtract Workstations
- Intel® Pentium® II-based processor running at 450 MHz or higher
- Microsoft Windows 2000 Professional, NT 4.0 Workstation, 98 or 95
- 400 MB free hard-drive space for complete installation (200 MB required)
- 128 MB of RAM
|
| |
| Minimum requirements for both server and client are as follows: |
| Server: 1 GB RAM |
| Client: Pentium III and above |
| |
|
|
4. Which image formats does NEWGEN® OmniExtract support? |
NEWGEN® OmniExtract supports the TIFF 6.0 file format at 1-bit, 4-bit, 8-bit and 24-bit pixel depths, with industry-standard compression schemes. |
|
5. Which scanners does NEWGEN® OmniExtract support? |
NEWGEN® OmniExtract supports all Twain and ISIS compatible scanners.
(For the complete list, please contact us at omniextract@newgen.co.in.) |
|
6.
What is the effective speed of scanning the documents? |
The scanning
speed depends on the rated speed of the scanner, the configuration
of the machine, and regular maintenance of the scanner. |
|
|
7. Is NEWGEN® OmniExtract compatible with ISIS and Twain compatible Scanners? |
Yes, NEWGEN® OmniExtract is compatible with both ISIS and Twain compatible scanners. |
|
8.
Is any special license required for ISIS Scanners? |
ISIS scanners require a special license, which has to be bought from the concerned licensing authority. These licenses are usually supplied along with the scanner. |
|
9.
How are documents separated in NEWGEN® OmniExtract? |
NEWGEN® OmniExtract
adopts these methodologies for the document separation:
- For the documents that have a fixed number of pages, the
user can specify the number of pages that would comprise
a document and the application would accordingly separate
the batch into documents, each having the specified fixed
number of pages.
- For the cases in which the number of pages present per
document varies from document to document in a batch, the
user can use the index sheet as separator or a blank page
as separator. The index sheet is a sheet that is inserted
in between two documents and has a barcode or OCR text printed
on it and is used to separate the documents. The application
can identify these features at the time of scanning itself
and separate the batch into documents.
- Barcode based separation
- Layout based advanced separation techniques are also available
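The first two separation methods above can be illustrated with a minimal Python sketch. It is not the product's implementation: pages are represented by plain strings, and the `is_separator` callback is a hypothetical stand-in for whatever detects an index sheet, barcode or blank page.

```python
from typing import Callable, List, Sequence

Page = str  # hypothetical stand-in for a scanned page image


def split_fixed(pages: Sequence[Page], pages_per_doc: int) -> List[List[Page]]:
    """Split a batch into documents that each hold a fixed number of pages."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    return [
        list(pages[i:i + pages_per_doc])
        for i in range(0, len(pages), pages_per_doc)
    ]


def split_on_separator(
    pages: Sequence[Page], is_separator: Callable[[Page], bool]
) -> List[List[Page]]:
    """Start a new document at every separator page; separators are dropped."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # Close the document collected so far (if any) and start afresh.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

In this sketch a trailing short document is kept as-is in fixed-count mode, and consecutive separator sheets do not produce empty documents; a real deployment might prefer to flag both cases for review.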
|
|
10.
How does the application take care of the shifting of pages
that might occur at the time of scanning? |
Some difference between the scanned images and the defined template is inevitable. Technically, these distortions are known as trapezoids, pincushions, barrels, etc., and they can mislead the image content search used for registration. These problems are handled by extensively researched image analysis algorithms developed in-house by engineers dedicated solely to this work. (Some of this work can be found in publications and journals of international repute.) As a result, NEWGEN® OmniExtract can register and recognize both natural and artificially introduced object content in the images, giving very accurate results and low rejection rates. |
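To give a feel for what registration means, here is a deliberately simplified Python sketch. It assumes the only distortion is a uniform shift of the page and that the positions of a few reference marks are already known in both the template and the scan. The function names are hypothetical, and the sketch does not handle trapezoid, pincushion or barrel distortion, which the product's own algorithms address.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Zone = Tuple[float, float, float, float]  # left, top, right, bottom


def estimate_shift(
    template_marks: Sequence[Point], scanned_marks: Sequence[Point]
) -> Point:
    """Average offset between matching reference marks (translation only)."""
    if not template_marks or len(template_marks) != len(scanned_marks):
        raise ValueError("need the same non-zero number of marks in both lists")
    n = len(template_marks)
    dx = sum(s[0] - t[0] for t, s in zip(template_marks, scanned_marks)) / n
    dy = sum(s[1] - t[1] for t, s in zip(template_marks, scanned_marks)) / n
    return dx, dy


def shift_zone(zone: Zone, shift: Point) -> Zone:
    """Move a template zone to where it should lie on the scanned image."""
    dx, dy = shift
    left, top, right, bottom = zone
    return (left + dx, top + dy, right + dx, bottom + dy)
```

Each zone defined on the template would be moved by the estimated shift before its content is read from the scanned page.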
|
|
11.
How is data organized? |
Data is organized in batches. Each batch is associated with a Form, where a Form can be a single-page or multi-page document. |
|
| |
12. What is the maximum number of pages that can be scanned in a batch? |
There is no limit on the number of pages/documents in a batch, but for ease of processing, small batches are recommended. |
|
13.
Which operating systems does NEWGEN® OmniExtract support? |
NEWGEN® OmniExtract Server
- Microsoft Windows 2000 Advanced Server; 2000 Server or
NT Server
NEWGEN®
OmniExtract Workstations
- Microsoft Windows 2000 Professional; NT 4.0 Workstation;
98; 95
|
|
|
14.
Can we submit documents scanned using some external scanning
software for extraction in NEWGEN® OmniExtract? |
Yes. Documents scanned using external scanning software can be submitted to the extraction server through the NEWGEN® OmniExtract interface. |
|
15.
How does the application recognize which engine has
to be applied for which region in the document? |
The product has a full-fledged Form Definition Module. The user defines a form template that specifies each zone and the recognition action to be performed on it, and attaches the fields that are to be populated with the data extracted from the forms. |
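Conceptually, such a form template is a list of zones, each tied to one recognition engine and one output field. The Python sketch below shows one way this could be modelled. The class names, field names and coordinates are all hypothetical and are not taken from the product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Engine(Enum):
    ICR = "ICR"
    OMR = "OMR"
    BCR = "BCR"
    MICR = "MICR"
    OCR = "OCR"


@dataclass(frozen=True)
class Zone:
    field_name: str                  # field populated with the extracted data
    box: Tuple[int, int, int, int]   # left, top, right, bottom in pixels
    engine: Engine                   # recognition action for this zone


@dataclass
class FormTemplate:
    name: str
    zones: List[Zone]

    def engine_for(self, field_name: str) -> Engine:
        """Return the engine assigned to the zone that feeds a field."""
        for zone in self.zones:
            if zone.field_name == field_name:
                return zone.engine
        raise KeyError(field_name)

    def zones_by_engine(self) -> Dict[Engine, List[str]]:
        """Group field names by the engine that will process them."""
        grouped: Dict[Engine, List[str]] = {}
        for zone in self.zones:
            grouped.setdefault(zone.engine, []).append(zone.field_name)
        return grouped
```

At extraction time, a dispatcher could walk the zones of the template attached to the batch and hand each cropped region to the engine named in the zone.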
|
|
16.
What are the distinctive features of the NEWGEN® OmniExtract
OMR application? |
Accuracy: The in-house OMR engine makes multiple differential passes at varied settings over the same data zone on the form and returns the most-conforming output, thereby ensuring 100% accuracy.
Speed: It is also very fast, typically taking less than 1 second to extract full-page data from a standard A4-sized form on, say, a Pentium III 650 MHz workstation with 128 MB of RAM. |
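The idea of several passes at varied settings can be illustrated with a small Python sketch. It is only an assumption of how such a scheme might look, not the in-house engine: a zone is a flat list of grayscale values (0 is black, 255 is white), each pass uses a different darkness threshold, and the majority decision wins. The thresholds and the fill ratio are illustrative values.

```python
from collections import Counter
from typing import Sequence


def is_marked(
    pixels: Sequence[int], dark_threshold: int, fill_ratio: float = 0.4
) -> bool:
    """One pass: the zone counts as marked if enough pixels are dark."""
    if not pixels:
        raise ValueError("zone has no pixels")
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels) >= fill_ratio


def vote_mark(
    pixels: Sequence[int], thresholds: Sequence[int] = (96, 128, 160)
) -> bool:
    """Run one pass per threshold and return the majority decision.

    An odd number of thresholds avoids ties.
    """
    votes = Counter(is_marked(pixels, t) for t in thresholds)
    return votes[True] > votes[False]
```

A faintly pencilled oval that fails the strictest pass can still be accepted when the remaining passes agree, which is the benefit of combining several settings.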
|
|
17. What is the recommended resolution for OMR? |
It works at any practical resolution, but the best results are achieved at 200 DPI. |
|
|
18.
Which Barcode Symbologies are supported? |
The following barcode symbologies are supported:
- USS39/Code39
- UPC-A
- UPC-E
- EAN13
- EAN8
- Code 93
- Code 128
- CODABAR
- I2OF5 (Interleaved 2 of 5)
These barcodes can be used for coding information or as quality parameters. |
|
19. What is ICR Technology? |
Intelligent Character Recognition (ICR) technology is capable of interpreting handwritten/hand-printed characters. The NEWGEN® OmniExtract ICR System is designed to provide automatic data extraction from handwritten forms that have to be processed in large volumes on a regular basis. An optional second engine is also integrated for increased accuracy. |
|
|
20. How is the data extracted from NEWGEN® OmniExtract ICR system stored? |
The data extracted from the NEWGEN® OmniExtract ICR System is stored as MS-Access data files, which can be exported to any ODBC-compliant database. This export can also be automated using tool-agents/connectors. |
|
| |
21.
How does the NEWGEN® OmniExtract System stand out as compared
to other applications available in the market today? |
While various data extraction tools exist in the market today, the NEWGEN® OmniExtract System has an edge over the others in that its high-end technology and built-in intelligence virtually ensure 100% accurate results. The NEWGEN® OmniExtract System is intelligent enough to recognize incorrect/doubtful characters. After data extraction, the user is automatically notified of all characters that the system finds doubtful and need only review these selected characters. The system incorporates user-friendly, ergonomically designed interfaces, which allow multiple users to review and verify the extracted data with the least amount of effort.
In addition, the advanced image pre-processing algorithms that work on an image before it is fed to the recognition engine ensure much higher accuracy than is attainable through the raw engine alone.
During the verification stage, the verifier can choose to verify the extracted data either character by character or at field level, which gives the user sufficient flexibility. To further increase the accuracy of results, the system includes various dictionaries, such as those of names and addresses. If the system is unsure of a field value that it is extracting, it can compare it with valid values from the dictionary and return the accurate result. The user is also free to associate self-defined dictionaries. This makes the application unique in the sense that it actually learns while in operation. Various custom-defined rules/validations can also be built into the system, and it integrates easily with workflow solutions, e.g. OmniFlow. |
|
|
22. How does one ensure 100% accuracy of the data that finally enters the client's database? |
To achieve 100% results, care has to be taken at every stage of document processing. Some of the precautions that can be taken are as follows:
Filling stage
If some precautions are taken at the form-filling stage itself, the final results will be excellent. These are as follows:
- Fill in the form so that the characters do not touch the lines or the boxes.
- The characters should not lie outside the boxes.
- The forms should be filled in as neatly as possible.
Scanning stage
- The documents to be used for extraction should be scanned as carefully as possible.
- The scanner should be serviced periodically so that no noise is introduced at the time of scanning.
- Prevent unnecessary skew that might be introduced by improper feeder adjustments.
Verification stage
- The engine shows the doubtful characters through the verification screen.
The verification stage is critical to achieving 100% results. The user can start with the character mode of verification and then move on to the field mode to reach 100% results. (The character and field modes of verification are explained later.) |
|
23.
What is Form Removal? |
Form
removal is the technique for removing the static part
of the form and preserving the dynamic or the variable
part. The static part consists of form frame, general
instructions and other graphical objects that are
present in all copies of the forms. The variable data
consists of data entered, which differs from form-to-form.
Form removal is an important preprocessing stage in
ICR/ OCR processing. |
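In its simplest form, form removal can be pictured as subtracting a blank copy of the form from the filled-in copy. The Python sketch below makes strong assumptions that a real system cannot: both images are already binarized (1 is ink, 0 is background) and perfectly aligned, and handwriting never overlaps the printed lines. It illustrates the concept only and is not the product's algorithm.

```python
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, where 1 means ink


def remove_form(filled: Image, blank: Image) -> Image:
    """Keep only the ink that is on the filled form but not on the blank one."""
    same_shape = len(filled) == len(blank) and all(
        len(f_row) == len(b_row) for f_row, b_row in zip(filled, blank)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return [
        [1 if f and not b else 0 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(filled, blank)
    ]
```

With this naive subtraction, any stroke that crosses a printed box line loses the overlapping pixels, which is one reason forms ask writers to keep characters inside the boxes.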
|
|
|
24.
Why does the NEWGEN® OmniExtract solution have various modes
for verification? |
The advantage of using the NEWGEN® OmniExtract ICR System lies in its strong verification facilities. Three modes of verification are provided: character mode, field mode and page mode. They allow users to check extracted data for inconsistencies in the easiest manner possible, drastically reducing the overall processing time. These verification modes ensure that no doubtful character is left unattended, which is how the NEWGEN® OmniExtract solution's Verification module ensures 100% accurate results.
- Character Verification Mode
- Field Verification Mode
- Page Verification Mode
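The difference between character-level and field-level review can be sketched in Python as follows. The confidence score per character and the 0.8 threshold are assumptions made for illustration; the source does not describe how the product decides that a character is doubtful.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RecognizedChar:
    char: str
    confidence: float  # hypothetical engine score between 0 and 1


Fields = Dict[str, List[RecognizedChar]]


def doubtful_chars(fields: Fields, threshold: float = 0.8) -> List[Tuple[str, int, str]]:
    """Character mode: list (field, position, character) for each doubtful character."""
    return [
        (name, index, rc.char)
        for name, chars in fields.items()
        for index, rc in enumerate(chars)
        if rc.confidence < threshold
    ]


def doubtful_fields(fields: Fields, threshold: float = 0.8) -> List[str]:
    """Field mode: list the fields that contain at least one doubtful character."""
    return [
        name
        for name, chars in fields.items()
        if any(rc.confidence < threshold for rc in chars)
    ]
```

Character mode would present the verifier with individual characters to confirm, while field mode would present whole fields so the verifier can judge the value in context.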
|
|
|
25.
What are the benefits that one accrues through
NEWGEN® OmniExtract
System as against Data Entry operations? |
Data entry operations are both time-consuming and error-prone. However carefully the data entry is done, some errors always remain, and these have to be cross-checked by a data verifier. This takes a lot of time, and there is the additional cost of keeping the paper documents in the same order in which the data was entered, to minimize the effort for the data verifier. In spite of all these precautionary measures, data entry accuracy is only 90%.
The NEWGEN® OmniExtract ICR Application maintains images of the documents, so there is no need to maintain the physical paper. Since a dedicated machine is used for data extraction, the first level of data entry is done by the machine itself, eliminating the need for data entry operators. Data extraction from these forms is also much faster than manual data entry. With its strong verification and exhaustive dictionaries, the NEWGEN® OmniExtract ICR System guarantees 100% accuracy. |
|
26.
What are the dictionaries? |
A dictionary is a repository of words maintained by the application. The main benefit of the dictionary is that when the application comes across a word that is already present in the dictionary, it prompts the user to either accept it as the default replacement or choose from suggested alternatives. The dictionary can be updated through the application with the words extracted in the executed batch of forms. The user can also attach user-defined dictionaries to the application. The application is thus like a baby: it learns while it grows. |
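A dictionary lookup of this kind can be sketched with Python's standard `difflib` module. The `Dictionary` class below is hypothetical and uses generic string similarity. It is not the matching logic of the product, and the sample words are invented for illustration.

```python
import difflib
from typing import Iterable, List


class Dictionary:
    """A growing word list that offers replacements for extracted words."""

    def __init__(self, words: Iterable[str]) -> None:
        self._words = {w.upper() for w in words}

    def add(self, word: str) -> None:
        """Learn a new word, e.g. one confirmed during verification."""
        self._words.add(word.upper())

    def suggest(
        self, extracted: str, max_suggestions: int = 3, cutoff: float = 0.6
    ) -> List[str]:
        """Return the word itself if known, else the closest known words."""
        word = extracted.upper()
        if word in self._words:
            return [word]
        return difflib.get_close_matches(
            word, sorted(self._words), n=max_suggestions, cutoff=cutoff
        )
```

For example, a city field read as "MUMBA1" would be offered "MUMBAI" from a dictionary of city names, and a word confirmed by the verifier can be added with `add` so that it is recognized in later batches.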
|
|
|
27. How tedious is form design for the application, and what special applications are needed to design the forms? |
Because the solution is totally hardware-independent, the user has complete freedom to design the forms whose scanned images go as input to NEWGEN® OmniExtract. Forms can be designed using specialized packages such as Adobe® PageMaker®, or simply with a package like MS Word, MS PowerPoint or any other multilingual package. While designing the form, general guidelines should be kept in mind, such as keeping the areas to be filled properly spaced and giving a separate placeholder for each character. Clear instructions for filling in the form should be printed on the form itself to remind the person filling it in. To increase accuracy, registration marks may be drawn on the form. It is good practice to draw registration marks in parallel lines on the two vertical edges of the form sheet. Once the form has been designed and printed, a scanned copy of it is used to define the template in the comprehensive Form Definition Module of NEWGEN® OmniExtract. |
|
|
28. Which third party tool is integrated with NEWGEN® OmniExtract for recognition of hand-written/machine-printed (ICR/OCR) data? |
Newgen licenses reRecognition's Kadmos Engine for recognition of hand-written/machine-printed (ICR/OCR) data.
About re Recognition GmbH:
re Recognition GmbH manufactures ICR/OCR software for the digital interpretation of handwritten and machine-printed characters. The Kadmos OCR/ICR (handwriting) recognition engine supports multiple languages, and application development interfaces are available for C/C++/VB/Delphi and Java. It also has isolated character (REC), isolated line (REL), and paragraph (REP) recognition modules. |
|
|
|