NEWGEN® OmniExtract FAQs
 
 

1. What is NEWGEN® OmniExtract?
2. What data extraction methodologies does NEWGEN® OmniExtract currently support?
3. What is the minimum hardware and software requirement for NEWGEN® OmniExtract?
4. Which image formats does NEWGEN® OmniExtract support?
5. Which scanners does NEWGEN® OmniExtract support?
6. What is the effective speed of scanning the documents?
7. Is NEWGEN® OmniExtract compatible with ISIS and TWAIN compatible scanners?
8. Is any special license required for ISIS Scanners?
9. How are documents separated in NEWGEN® OmniExtract?
10. How does the application take care of the shifting of pages that might occur at the time of scanning?
11. How is data organized?
12. What is the maximum number of pages that can be scanned in a batch?
13. Which operating systems does NEWGEN® OmniExtract support?
14. Can we submit documents scanned using some external scanning software for extraction in NEWGEN® OmniExtract?
15. How does the application recognize which engine has to be applied for which region in the document?
16. What are the distinctive features of the NEWGEN® OmniExtract OMR application?
17. What is the recommended resolution for OMR?
18. Which Barcode Symbologies are supported?
19. What is ICR Technology?
20. How is the data extracted from NEWGEN® OmniExtract ICR system stored?
21. How does the NEWGEN® OmniExtract System stand out as compared to other applications available in the market today?
22. How does one ensure 100% accuracy of the data that finally enters the client's database?
23. What is Form Removal?
24. Why does the NEWGEN® OmniExtract solution have various modes for verification?
25. What are the benefits that one accrues through NEWGEN® OmniExtract System as against Data Entry operations?
26. What are the dictionaries?
27. How tedious is form design for the application, and what special applications are needed to design the forms?
28. Which third party tool is integrated with NEWGEN® OmniExtract for recognition of hand-written/ machine-printed (ICR/OCR) data?

1. What is NEWGEN® OmniExtract?

NEWGEN® OmniExtract processes virtually any kind of form and captures all kinds of information on it: hand-printed and handwritten characters, optical marks (ticked, crossed or filled ovals and checkboxes), barcode symbologies, machine-printed characters and MICR fonts.


2. What data extraction methodologies does NEWGEN® OmniExtract currently support?

NEWGEN® OmniExtract currently supports the following automatic data extraction methodologies:

  • ICR (Intelligent Character Recognition)
  • OMR (Optical Mark Recognition)
  • BCR (Barcode Recognition)
  • MICR (Magnetic Ink Character Recognition)
  • OCR (Optical Character Recognition)


3. What is the minimum hardware and software requirement for NEWGEN® OmniExtract?

NEWGEN® OmniExtract requires the following:

NEWGEN® OmniExtract Server
  • Intel® Pentium® IV-based processor running at 1.2 GHz or higher
  • Microsoft Windows 2000 Advanced Server, 2000 Server or NT Server
  • 1 GB free hard-drive space for complete installation (200 MB required)
  • 512 MB of RAM

NEWGEN® OmniExtract Workstations
  • Intel® Pentium® II-based processor running at 450 MHz or higher
  • Microsoft Windows 2000 Professional, NT 4.0 Workstation, 98 or 95
  • 400 MB free hard-drive space for complete installation (200 MB required)
  • 128 MB of RAM

Minimum requirements for both server and client are as follows:
  • Server: 1 GB RAM
  • Client: Pentium III and above
   

4. Which image formats does NEWGEN® OmniExtract support?

NEWGEN® OmniExtract supports the TIFF 6.0 file format at 1-bit, 4-bit, 8-bit and 24-bit pixel depths, in industry-standard compression schemes.


5. Which scanners does NEWGEN® OmniExtract support?

NEWGEN® OmniExtract supports all TWAIN and ISIS compatible scanners.
(For a complete list, please contact us at omniextract@newgen.co.in.)


6. What is the effective speed of scanning the documents?

The scanning speed depends on the rated speed of the scanner, the configuration of the machine, and regular maintenance of the scanner.


7. Is NEWGEN® OmniExtract compatible with ISIS and TWAIN compatible scanners?

Yes, NEWGEN® OmniExtract is compatible with both ISIS and TWAIN compatible scanners.


8. Is any special license required for ISIS Scanners?

Yes. ISIS scanners require a special license, which has to be bought from the concerned licensing authority. These licenses are usually supplied along with the scanner.


9. How are documents separated in NEWGEN® OmniExtract?

NEWGEN® OmniExtract adopts the following methodologies for document separation:

  • Fixed page count: for documents that have a fixed number of pages, the user specifies the number of pages that make up a document, and the application separates the batch into documents of that many pages each.
  • Separator sheets: when the number of pages varies from document to document within a batch, the user can use an index sheet or a blank page as the separator. An index sheet is a sheet inserted between two documents that carries a printed barcode or OCR text. The application can identify these separators at the time of scanning itself and split the batch into documents.
  • Barcode-based separation.
  • Layout-based advanced separation techniques are also available.
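
The first two separation methods above can be sketched in a few lines. This is only an illustration and not the product's implementation: pages are represented by plain strings, and the names `split_fixed`, `split_on_separator` and the `"SEP"` marker are hypothetical.

```python
from typing import Callable, List

Page = str  # hypothetical stand-in for a scanned page image


def split_fixed(pages: List[Page], pages_per_doc: int) -> List[List[Page]]:
    """Cut a batch into documents that each hold a fixed number of pages."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    return [pages[i:i + pages_per_doc]
            for i in range(0, len(pages), pages_per_doc)]


def split_on_separator(pages: List[Page],
                       is_separator: Callable[[Page], bool]) -> List[List[Page]]:
    """Start a new document at every separator page; separators are dropped."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            if current:              # close the document collected so far
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:                      # last document has no trailing separator
        documents.append(current)
    return documents


fixed_batch = ["p1", "p2", "p3", "p4"]
mixed_batch = ["a1", "a2", "SEP", "b1", "SEP", "c1", "c2", "c3"]

print(split_fixed(fixed_batch, 2))
print(split_on_separator(mixed_batch, lambda page: page == "SEP"))
```

In a real deployment the `is_separator` test would be the result of barcode, OCR or blank-page detection on the index sheet rather than a string comparison.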


10. How does the application take care of the shifting of pages that might occur at the time of scanning?

Some difference between the scanned image and the defined template is inevitable. Technically, these distortions are known as trapezoids, pincushions, barrels and so on, and they can mislead the image content search used for registration. They are handled by extensively researched image analysis algorithms developed in house by engineers dedicated solely to this problem (some of this work can be found in publications and journals of international repute). As a result, NEWGEN® OmniExtract has a very powerful capability for registering and recognizing natural as well as artificially introduced object content in images, which gives very accurate results and low rejection rates.
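
The in-house registration algorithms are not described here. As a much simpler illustration of what registration means, the sketch below corrects only a pure translation: it assumes that the positions of a few registration marks are known on the template and have been located on the scanned page, and it shifts a template zone by the average offset. All names are hypothetical, and the trapezoid, pincushion and barrel distortions mentioned above would need a richer model than this.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y) in pixels
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def estimate_shift(template_marks: List[Point],
                   found_marks: List[Point]) -> Point:
    """Average offset between template marks and the marks found on the scan."""
    if not template_marks or len(template_marks) != len(found_marks):
        raise ValueError("need the same non-zero number of marks on both sides")
    count = len(template_marks)
    dx = sum(f[0] - t[0] for t, f in zip(template_marks, found_marks)) / count
    dy = sum(f[1] - t[1] for t, f in zip(template_marks, found_marks)) / count
    return dx, dy


def shift_zone(zone: Box, shift: Point) -> Box:
    """Move a template zone so that it lines up with the scanned page."""
    dx, dy = shift
    left, top, right, bottom = zone
    return (left + dx, top + dy, right + dx, bottom + dy)


# Page fed 3 pixels to the right and 2 pixels higher than the template.
shift = estimate_shift([(10, 10), (10, 200)], [(13, 8), (13, 198)])
print(shift_zone((50, 60, 150, 80), shift))
```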


11. How is data organized?

Data is organized in batches. Each batch is associated with a Form, which can be a single-page or a multi-page document.
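
As a rough picture of this organization, the batch-to-form relationship could be modelled as below. The class and attribute names are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    name: str
    page_count: int = 1  # a form may span one page or several


@dataclass
class Batch:
    batch_id: str
    form: Form                                      # each batch is tied to one form
    pages: List[str] = field(default_factory=list)  # scanned image file names

    def add_page(self, image_name: str) -> None:
        self.pages.append(image_name)


application_form = Form(name="account_opening", page_count=2)
batch = Batch(batch_id="B-0001", form=application_form)
batch.add_page("scan_0001.tif")
batch.add_page("scan_0002.tif")
print(batch)
```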


 
12. What is the maximum number of pages that can be scanned in a batch?

There is no limit on the number of pages or documents in a batch, but for ease of processing it is always recommended to keep batches small.


13. Which operating systems does NEWGEN® OmniExtract support?

NEWGEN® OmniExtract Server
  • Microsoft Windows 2000 Advanced Server, 2000 Server or NT Server

NEWGEN® OmniExtract Workstations
  • Microsoft Windows 2000 Professional, NT 4.0 Workstation, 98 or 95


  

14. Can we submit documents scanned using some external scanning software for extraction in NEWGEN® OmniExtract?

Yes. Documents scanned using external scanning software can be submitted to the extraction server through the NEWGEN® OmniExtract interface.


15. How does the application recognize which engine has to be applied for which region in the document?

The product has a full-fledged Form Definition Module. The user defines a form template that describes each zone and the recognition action to be performed on it, and attaches the fields that are to be populated with the data extracted from the forms.
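
Conceptually, such a template is a list of zones, each naming a recognition action and a target field. The sketch below shows that idea with stub engines. The `Zone` class, the `extract` function and the engine callables are hypothetical and do not reflect the Form Definition Module's actual format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Zone:
    field_name: str  # field that receives the extracted value
    box: Box         # region of the form image to read
    action: str      # recognition action: "ICR", "OMR", "BCR", "MICR" or "OCR"


def extract(zones: List[Zone],
            engines: Dict[str, Callable[[Box], str]]) -> Dict[str, str]:
    """Run the engine named by each zone and store the result in its field."""
    record: Dict[str, str] = {}
    for zone in zones:
        engine = engines[zone.action]  # KeyError if no such engine is registered
        record[zone.field_name] = engine(zone.box)
    return record


template = [
    Zone("applicant_name", (40, 100, 400, 130), "ICR"),
    Zone("gender", (40, 150, 120, 170), "OMR"),
]

# Stub engines that stand in for real recognition engines.
stub_engines = {
    "ICR": lambda box: "<icr text>",
    "OMR": lambda box: "<omr mark>",
}

record = extract(template, stub_engines)
print(record)
```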


16. What are the distinctive features of the NEWGEN® OmniExtract OMR application?

Accuracy: The in-house OMR engine allows multiple differential passes at varied settings over the same data zone on the form and returns the most-conforming output, thereby ensuring 100% accuracy.
Speed: It is also very fast, typically taking less than 1 second to extract full-page data from a standard A4 form on, for example, a Pentium III 650 MHz workstation with 128 MB of RAM.
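
The in-house engine itself is not documented here. The general idea of reading the same zone several times at different settings and keeping the answer that most passes agree on can be sketched as follows. The threshold values, the fill ratio and the function names are assumptions chosen for illustration, and a zone is represented as rows of grayscale values from 0 (black) to 255 (white).

```python
from collections import Counter
from typing import List, Sequence

Zone = List[List[int]]  # grayscale pixel rows, 0 = black, 255 = white


def is_marked(pixels: Zone, dark_below: int, min_fill: float) -> bool:
    """One pass: the zone counts as marked if enough pixels are dark."""
    total = sum(len(row) for row in pixels)
    dark = sum(1 for row in pixels for value in row if value < dark_below)
    return total > 0 and dark / total >= min_fill


def read_mark(pixels: Zone,
              thresholds: Sequence[int] = (96, 128, 160),
              min_fill: float = 0.4) -> bool:
    """Several passes at different darkness thresholds, then a majority vote."""
    votes = Counter(is_marked(pixels, t, min_fill) for t in thresholds)
    return votes[True] > votes[False]


filled = [[20, 30, 200], [10, 40, 220], [25, 35, 210]]
empty = [[250, 250, 250], [250, 250, 250], [250, 250, 250]]
faint = [[120, 120, 250], [120, 250, 250], [250, 250, 120]]  # light pencil mark

print(read_mark(filled), read_mark(empty), read_mark(faint))
```

The faint mark is missed by the strictest pass but accepted by the other two, which is the kind of case a single fixed setting would handle less reliably.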


17. What is the recommended resolution for OMR?

It works at any practical resolution, but the best results are achieved at 200 DPI resolution.


18. Which Barcode Symbologies are supported?

The following barcode symbologies are supported:
  • USS39/Code 39
  • UPC-A
  • UPC-E
  • EAN13
  • EAN8
  • Code 93
  • Code 128
  • CODABAR
  • I2of5 (Interleaved 2 of 5)
These barcodes can be used for coding information or as quality parameters.
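
One reason a barcode can serve as a quality check is that several symbologies carry a check digit. As an illustration, the standard EAN-13 check digit rule (weights 1 and 3 alternating over the first twelve digits) can be verified as below. The function name is hypothetical, and the sketch says nothing about how the product itself validates barcodes.

```python
def ean13_is_valid(code: str) -> bool:
    """Return True if the 13-digit string has a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Digits in odd positions (1st, 3rd, ...) weigh 1, even positions weigh 3.
    weighted = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - weighted % 10) % 10 == digits[12]


print(ean13_is_valid("4006381333931"))  # correctly read code
print(ean13_is_valid("4006381333932"))  # last digit misread
```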


19. What is ICR Technology?

Intelligent Character Recognition (ICR) technology is capable of interpreting handwritten and hand-printed characters. The NEWGEN® OmniExtract ICR System is designed to provide automatic data extraction from handwritten forms that have to be processed in large volumes on a regular basis. An optional second engine is also integrated for increased accuracy.
 


20. How is the data extracted from NEWGEN® OmniExtract ICR system stored?

The data extracted by the NEWGEN® OmniExtract ICR System is stored as MS-Access data files, which can be exported to any ODBC compliant database. This export can also be automated using tool-agents/connectors.


 

21. How does the NEWGEN® OmniExtract System stand out as compared to other applications available in the market today?

While various data extraction tools exist in the market today, the NEWGEN® OmniExtract System has an edge over the others in that its high-end technology and built-in intelligence virtually ensure 100% accurate results. The NEWGEN® OmniExtract System is intelligent enough to recognize incorrect or doubtful characters. After data extraction, the user is automatically notified of all characters that the system finds doubtful and need only review these selected characters. The system incorporates user-friendly, ergonomically designed interfaces, which allow multiple users to review and verify the extracted data with the least amount of effort.

The advanced image pre-processing algorithms that work on an image before it is fed to the recognition engine also ensure much higher accuracy than is attainable with the raw engine alone. During the verification stage, the verifier can choose to verify the extracted data either character by character or at field level, which gives the user sufficient flexibility. To further increase the accuracy of results, the system includes various dictionaries, such as those of names and addresses. If the system is unsure of a field value that it is extracting, it can compare the value with valid values from the dictionary and return the accurate result. The user is free to associate self-defined dictionaries as well. This makes the application unique in the sense that it actually learns while in operation. Custom-defined rules and validations can also be built into the system, and it integrates easily with workflow solutions such as OmniFlow.


22. How does one ensure 100% accuracy of the data that finally enters the client's database?

To achieve 100% results, care has to be taken at each and every stage of document processing. Some of the precautions that can be taken are as follows:

Filling stage

If some precautions are taken at the form filling stage itself, the final results will be excellent:

  • Fill in the form so that the characters do not touch the lines or the boxes.
  • The characters should not lie outside the boxes.
  • The forms should be filled in as neatly as possible.

Scanning stage

  • The documents to be used for extraction should be scanned as carefully as possible.
  • The scanner should be serviced periodically so that no noise is introduced at the time of scanning.
  • Prevent unnecessary skew that might be introduced by improper feeder adjustments.

Verification stage

  • The engine shows the doubtful characters through the verification screen.

The verification stage is critical to achieving 100% results. The user can start with the character mode of verification and then move on to the field mode to reach 100% results. (The character and field modes of verification are explained under question 24.)


23. What is Form Removal?

Form removal is the technique for removing the static part of the form and preserving the dynamic or the variable part. The static part consists of form frame, general instructions and other graphical objects that are present in all copies of the forms. The variable data consists of data entered, which differs from form-to-form. Form removal is an important preprocessing stage in ICR/ OCR processing.
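
In its simplest form, this can be pictured as subtracting a blank copy of the form from the filled copy. The sketch below does that on small binary images (1 = ink, 0 = paper). It assumes the two images are already registered pixel for pixel, uses hypothetical names, and is not the product's algorithm. Note that this naive version also erases handwriting wherever it crosses a form line.

```python
from typing import List

Image = List[List[int]]  # binary image rows, 1 = ink, 0 = paper


def remove_form(filled: Image, blank: Image) -> Image:
    """Keep only the ink that is present on the filled form but not on the blank one."""
    if len(filled) != len(blank) or any(
            len(f_row) != len(b_row) for f_row, b_row in zip(filled, blank)):
        raise ValueError("images must have identical dimensions")
    return [
        [1 if f and not b else 0 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(filled, blank)
    ]


blank_form = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],   # an empty box drawn on the form
    [1, 1, 1, 1],
]
filled_form = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],   # one pixel of handwriting inside the box
    [1, 1, 1, 1],
]

for row in remove_form(filled_form, blank_form):
    print(row)
```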


24. Why does the NEWGEN® OmniExtract solution have various modes for verification?

The advantage of using the NEWGEN® OmniExtract ICR System lies in its strong verification facilities. Three modes of verification are provided (character mode, field mode and page mode), which let users check the extracted data for inconsistencies in the easiest manner possible and drastically reduce the overall processing time.

The NEWGEN® OmniExtract solution ensures 100% accurate results through a strong Verification module. The verification modes are designed so that no doubtful character is left unattended, thus ensuring 100% accuracy.

  • Character Verification Mode
  • Field Verification Mode 
  • Page Verification Mode
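
The difference between character-level and field-level review can be illustrated with a small sketch. It assumes that the recognition engine reports a confidence score per character, which is an assumption made for illustration. The class, function names and the 0.8 threshold are hypothetical, and page mode is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RecognizedChar:
    value: str
    confidence: float  # 0.0 to 1.0, hypothetical engine score


Fields = Dict[str, List[RecognizedChar]]


def doubtful_characters(fields: Fields,
                        threshold: float = 0.8) -> List[Tuple[str, int, str]]:
    """Character mode: list every (field, position, character) below the threshold."""
    return [
        (name, index, ch.value)
        for name, chars in fields.items()
        for index, ch in enumerate(chars)
        if ch.confidence < threshold
    ]


def doubtful_fields(fields: Fields, threshold: float = 0.8) -> List[str]:
    """Field mode: list every field that contains at least one doubtful character."""
    return [
        name for name, chars in fields.items()
        if any(ch.confidence < threshold for ch in chars)
    ]


extracted = {
    "name": [RecognizedChar("J", 0.99), RecognizedChar("0", 0.55),
             RecognizedChar("E", 0.97)],
    "city": [RecognizedChar("P", 0.95)],
}

print(doubtful_characters(extracted))
print(doubtful_fields(extracted))
```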


25. What are the benefits that one accrues through NEWGEN® OmniExtract System as against Data Entry operations?

Data entry operations are both time-consuming and error-prone. However carefully the data entry is done, some errors always remain and have to be cross-checked by a data verifier. This takes a lot of time, and there is also the additional cost of keeping the paper documents in the same order in which the data was entered, to minimize the effort for the data verifier. In spite of all these precautionary measures, data entry accuracy is only 90%.

The NEWGEN® OmniExtract ICR Application maintains the images of the documents, so there is no need to maintain the physical paper. Since a dedicated machine is used for data extraction, the first level of data entry is done by the machine itself, which eliminates the need for data entry operators. Extracting data from the forms in this way is much faster than having it keyed in by data entry operators. With its strong verification and exhaustive dictionaries, the NEWGEN® OmniExtract ICR System guarantees 100% accuracy.


26. What are the dictionaries?

A dictionary is a repository of words maintained by the application. Its main benefit is that when the application comes across a word that is already present in the dictionary, it prompts the user either to replace the extracted word with the dictionary entry by default or to choose from suggested alternatives. The dictionary can be updated through the application with the words extracted in the executed batch of forms. The user can also attach user-defined dictionaries to the application. In that sense the application is like a baby: it learns while it grows.
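
A minimal sketch of dictionary-based suggestions is shown below. It uses approximate string matching from Python's standard `difflib` module as a stand-in, since the matching method used by the product is not described. The `suggest` and `learn` functions and the sample city list are hypothetical.

```python
import difflib
from typing import List


def suggest(word: str, dictionary: List[str], limit: int = 3) -> List[str]:
    """Return the word itself if it is known, otherwise the closest dictionary entries."""
    if word in dictionary:
        return [word]
    return difflib.get_close_matches(word, dictionary, n=limit, cutoff=0.6)


def learn(word: str, dictionary: List[str]) -> None:
    """Add a verified word so that later batches can match against it."""
    if word not in dictionary:
        dictionary.append(word)


cities = ["DELHI", "MUMBAI", "CHENNAI"]

print(suggest("DELH1", cities))   # "I" misread as "1"
learn("KOLKATA", cities)
print(suggest("KOLKATA", cities))
```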


27. How tedious is form design for the application, and what special applications are needed to design the forms?

Because NEWGEN® OmniExtract is a totally hardware-independent solution, the user has complete freedom to design the forms whose scanned images serve as its input. Forms can be designed using specialized packages such as Adobe® PageMaker®, or simply with an office package such as MS Word, MS PowerPoint or any other multilingual package. While designing the form, general guidelines should be kept in mind, such as keeping the areas to be filled properly spaced and giving a separate placeholder for each character. Clear instructions for filling in the form should be printed on the form itself as a reminder to the person filling it in. To increase accuracy, registration marks may be drawn on the form. It is good practice to draw registration marks in parallel lines along the two vertical edges of the form sheet. Once the form has been designed and printed, a scanned copy of it is used to define the template in the comprehensive Form Definition Module of NEWGEN® OmniExtract.


28. Which third party tool is integrated with NEWGEN® OmniExtract for recognition of hand-written/ machine-printed (ICR/OCR) data?

Newgen licenses reRecognition's Kadmos Engine for recognition of hand-written/machine-printed (ICR/OCR) data.

About re Recognition GmbH:
re Recognition GmbH manufactures ICR/OCR software for the digital interpretation of handwritten and machine-printed characters. The Kadmos OCR/ICR (handwriting) recognition engine supports multiple languages, and application development interfaces are available for C/C++, VB, Delphi and Java. It also has isolated character (REC), isolated line (REL) and paragraph (REP) recognition modules.


 
 
 
© 2007 Newgen Software Technologies Limited. All rights reserved.