Orbis Plus - Nota Bene

ORBIS+

Orbis+ is an add-on module to the Nota Bene Workstation

Orbis+, the new add-on module for Nota Bene, now lets you search PDF, DOCX, DOC, RTF, and HTML files. The version of Orbis that has always been included in the Nota Bene Workstation, which lets you search NB or TXT files, is still included in Nota Bene. But now, with Orbis+, you can expand the range of searchable files to include PDF, DOCX, DOC, RTF, and HTML files. Now, more than ever, whether running on Windows or on the Mac+Wine, the new Orbis+ is the “Idea Discovery Tool” that can qualitatively change the way you work.

“Just a word of thanks for getting us here; I am starting to see what your vision about two years ago was when you posted a message/visual demonstration of your ideas on the future use of Orbis as integrated depository of ALL valuable files; I am getting more confident on what lies ahead.”

Dr. Assie Gildenhuys, Department of Psychology, University of Pretoria, South Africa

CREATING A TEXTBASE IN ORBIS+

To create a textbase of files in different formats, select File, New Textbase, and give it a name. Then, with either Advanced Automatic Textbase or Predefined Orbis Formats selected, click Go to display the Add/Remove Files dialog.

Folders & Subfolders

A textbase can include files in a single folder, or in a thousand (or more)
- The folders can be on a single drive, or on different drives
When determining which folder(s) to use, you can select:
- A specific folder that contains all the files you want to search
- A top-level “parent” folder (with or without files) that contains subfolders that have files (and further subfolders with additional files) that you want to search, all the way down the folder tree

Once a folder is selected, there are two ways to select the files:

1. Include All Files, and Let Orbis Determine the Type

Select all files, and let Orbis+ determine the file type, based on its built-in file-detection system

Click the icon to apply only to the current folder
Click the + beside the icon to apply to the current folder AND all subfolders (all the way down the subfolder chain)

If you select the all-files option (this is equivalent to *.*), Orbis reads all files, detects the format of each, and then converts/extracts text from recognized types, using the appropriate text-extraction tool (provided by NB, by Windows/Wine, or by some external program)
You can place some restrictions on this process, by indicating file extensions that Orbis should simply ignore, without trying to detect the format of those files

A default list is provided, but you can edit this using File, Excluded List

You can even exclude more particular kinds of files for even finer control
- For example, “notes*.nb,” or all PDF files that start with the letter “a” (a*.pdf)
Note: to exclude particular types, click the yellow rotating arrow on the file-type toolbar to show the original file-type options
Any remaining problem files (which will simply be detected as unrecognized during textbase creation) can easily be excluded in subsequent updates
- These are easily identifiable because of the new properties now available (see “Enhanced File Properties” below)
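
As a rough illustration of the all-files option combined with an exclusion list, the following Python sketch gathers every file in a folder (or in the folder and all its subfolders) and drops any name that matches an exclusion pattern. It is not how Orbis+ itself is implemented: the pattern list, the function names, and the case-insensitive matching are all assumptions made for the example.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical exclusion patterns; the real defaults are edited via File, Excluded List.
EXCLUDED = ["*.bak", "*.ini", "*.exe", "*.zip", "notes*.nb", "a*.pdf"]


def is_excluded(filename, patterns=EXCLUDED):
    """Return True if the file name matches any exclusion pattern (ignoring case)."""
    name = filename.lower()
    return any(fnmatch(name, pattern.lower()) for pattern in patterns)


def candidate_files(folder, include_subfolders=False, patterns=EXCLUDED):
    """List files in a folder (optionally in all subfolders too), minus excluded ones."""
    root = Path(folder)
    walker = root.rglob("*") if include_subfolders else root.glob("*")
    return sorted(p for p in walker if p.is_file() and not is_excluded(p.name, patterns))
```

Calling candidate_files(folder) corresponds to applying the selection to the current folder only; passing include_subfolders=True corresponds to applying it all the way down the subfolder chain.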

NOTE: All files in the current folder that match the selected type (that is, all files, except those excluded) will be automatically highlighted/selected (with a yellow/gold background) in the File Pane (the bottom right quadrant of this Add/Remove Files dialog), with the filename in green.

2. Include Only Specific Files

A. Select File Type(s)

Tell Orbis which type(s) of files you want to include, based on the standard file extensions recognized by Windows

Click the icon to apply only to the current folder
Click the + beside the icon to apply to the current folder AND all subfolders (all the way down the subfolder chain)

Orbis reads only the files with the specified Windows type, as based on the file extension, and converts/extracts text from those files
You can of course include multiple file types, even in the same folder
You can also choose more specific file types, in addition to, or instead of, the standard types, to include only certain classes of files:
- dissertation*.nb
- notes*.nb
- NYTimes*.html
- Manual??File.pdf
Note: to include particular types, click the yellow rotating arrow on the file-type toolbar to show the original file-type options
Similarly, you can, after including all files of a particular type, exclude certain groups of those files
- For example, “notes*.nb,” or all PDF files that start with the letter “a” (a*.pdf)
Note: to exclude particular types, click the yellow rotating arrow on the file-type toolbar to show the original file-type options
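
The include and exclude patterns above behave like ordinary filename wildcards. The short Python sketch below shows how such patterns select files, assuming the usual convention that * matches any run of characters and ? matches exactly one character. The select function and the sample file names are hypothetical, and Orbis+ may treat edge cases differently.

```python
from fnmatch import fnmatch


def select(filenames, include, exclude=()):
    """Keep names that match any include pattern and no exclude pattern."""
    def matches(name, patterns):
        return any(fnmatch(name.lower(), p.lower()) for p in patterns)

    return [n for n in filenames if matches(n, include) and not matches(n, exclude)]


# Made-up file names, used only to exercise the patterns.
files = ["dissertation-ch1.nb", "notes-2019.nb", "Manual01File.pdf",
         "Manual1File.pdf", "NYTimes-review.html", "atlas.pdf"]

# Include only certain classes of files.
print(select(files, ["dissertation*.nb", "Manual??File.pdf"]))

# Include whole types, then exclude certain groups of those files.
print(select(files, ["*.pdf", "*.nb"], exclude=["notes*.nb", "a*.pdf"]))
```

In the first call, Manual1File.pdf is not selected because ?? requires exactly two characters between "Manual" and "File".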

NOTE: All files in the current folder that match the selected type will be highlighted/selected in the File Pane (the bottom right quadrant of this Add/Remove Files dialog), with the filename in green.

B. Select Files Individually

You can also select files individually in the Qualifying/Selected Files in Folder Pane.

NOTE: All files selected individually will be highlighted/selected in the File Pane, with the filename in blue.

C. By Type and Individually

The two file-selection modes can be combined:

If you designate file types, the files will be automatically selected (shown in green)
You can then also select individual files (these will be shown in blue)

What’s the Best Option?

Another Word About Folders

The way files are selected can vary from folder to folder:

You can select all files in folder X, specific file types in folder Y, and individual files in folder Z
You can apply a file type to all subfolders in folder A, and (assuming it is not a subfolder of A) only to the current folder in folder B
You can — even in one folder — have some file types apply to only the current folder, and other file types apply to that folder, and all subfolders

In short, you can decide whether to choose all files or specific files (whether by selecting them by type, or individually) on a folder-by-folder basis, depending on your needs.

Assessing the options:

Selecting specific file types works well in an orderly world, where every program has only a single file extension associated with it
But in a less orderly world, the first option provides extra reach for Orbis
- We have seen PDF files with .DAT extensions, to take but one such oddity, and these would be included using the first option, but missed in the second
- But more importantly, NB has historically resisted such file-extension neatness, by allowing virtually any file extension, and the first option simplifies inclusion of Nota Bene format files
The new NB+ option (see Simplified Textbase Creation/Modification below) is a version of the second option: it reads files based on file extensions, but this time with an expanded list of extensions recognized by Orbis (not by Windows)
- However, unlike the first option, it requires that you specify all extensions; if you miss one, files with that extension will be skipped
- If a file with a listed extension turns out not to be a Nota Bene file, Orbis will still include it if it recognizes its format
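
To see why content-based detection reaches files that extension-based selection misses (such as a PDF saved with a .DAT extension), consider this minimal Python sketch that guesses a format from a file's first bytes. It only illustrates the general technique. The Orbis+ file-detection system is not documented here, and sniff_format and its labels are hypothetical.

```python
# Well-known leading bytes ("magic numbers") of a few container and document formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"{\\rtf", "RTF"),
    (b"PK\x03\x04", "ZIP"),                           # DOCX files are ZIP containers
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2"),    # legacy DOC files are OLE2 containers
]


def sniff_format(path):
    """Guess a file's format from its first bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unrecognized"
```

A file named report.dat that begins with %PDF- would be reported as PDF here, whereas a selection based purely on the .pdf extension would skip it.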

SEARCHING AN ORBIS+ TEXTBASE

Once an Orbis+ textbase has been created, you can search it like any Orbis textbase. You can:

Perform searches using the traditional Boolean operators (AND, OR, XOR, and NOT)
Search for phrases (two or more words next to each other)
Use the * wildcard to find words that begin similarly, but have different endings
Use synonym searching (if you have set up a synonym list)

The only difference is that matches will now be found in included PDF, DOCX, DOC, RTF, and HTML files, in addition to NB and text files
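
For readers unfamiliar with Boolean searching, the operators can be thought of as set operations on the files that contain each term. The Python sketch below builds a tiny word index over three made-up files and combines lookups with AND, OR, XOR, and NOT, plus a trailing * wildcard. It is a simplified illustration: it matches whole files rather than entries, and none of the names reflect how Orbis+ works internally.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def lookup(index, term):
    """Return the documents matching a term; a trailing * matches any word ending."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, names in index.items():
            if word.startswith(prefix):
                hits |= names
        return hits
    return set(index.get(term, ()))


# Made-up sample contents in three different formats.
docs = {
    "a.pdf": "Medieval trade routes and markets",
    "b.docx": "Modern markets and trading law",
    "c.nb": "Medieval law",
}
index = build_index(docs)

both = lookup(index, "medieval") & lookup(index, "law")       # medieval AND law
either = lookup(index, "medieval") | lookup(index, "law")     # medieval OR law
only_one = lookup(index, "medieval") ^ lookup(index, "law")   # medieval XOR law
without = lookup(index, "markets") - lookup(index, "law")     # markets NOT law
stems = lookup(index, "trad*")                                # trade, trading, ...
```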

MAJOR CHANGES IN THE LATEST RELEASE

Orbis+ Now Works on the Mac

Orbis+ now works on the Mac. PDF, HTML, DOCX, DOC, and RTF files can now be:

Indexed
Searched
Retrieved
Opened for viewing/editing (with Ctrl+O)

In our tests to date, these functions work identically in Orbis+ on Windows and in Orbis+ on the Mac. Please let us know of any issues you may encounter.

Simplified Textbase Creation/Modification

We have simplified the process of creating new textbases and adding or removing files from an existing one:

A context-sensitive help pane (dark green box below) is now displayed by default
- You can of course turn this option off
Single-click icons are now available for selecting file types (pink box below)
- You can switch between this new mode and the original one by clicking the yellow rotating arrow icon (red box below)
When selecting all files (*.*), file-type exceptions can now be added (light green box below)
- A default list is already provided, but you can edit this using File, Excluded List
  - Binary (unreadable) files, such as programs, graphics, zip files, etc.
  - Text (readable) files, that you do not normally want to have indexed (*.BAK, *.INI, Javascript files, etc.)
A new NB+ option lets you tell Orbis that files with extensions other than .NB should be treated as Nota Bene format files (purple box below)
- This list is user customizable (Files, Set NB File Types)
A full list of file properties related to textbase searchability (see Enhanced File Properties below) is now displayed
- For each file, in the File Pane (blue box below)
  - You can scroll this list (using the scrollbar at the bottom), or increase the size of the dialog to view everything at one time
- For the selected file, in the help panel on the left (gold box below)

The Add/Remove Files dialog can be expanded to show all the fields simultaneously

The size is now remembered when Orbis is closed and then reopened

New HTML Text Extractor

We’ve now written our own HTML text extractor:

This should avoid the unwanted popups that some of you encountered when a web browser was used to extract text
We should now be able to extract text from all HTML files (even when the web browser previously failed)
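
To give a sense of what extracting text from HTML without a web browser involves, here is a minimal Python sketch built on the standard library's html.parser module. It is not the extractor shipped with Orbis+. The class and function names are hypothetical, and a production extractor would handle many more cases (encodings, malformed markup, hidden elements).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the readable text of an HTML string, with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Because the markup is only parsed and never rendered or executed, scripts in the page cannot open popups during extraction.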

New/Enhanced Popup Viewers

There are now two ways to view the full file with the matching text, fully formatted:

Ctrl+O — opens the file in the default program for such files for further editing
- The file is opened, with the cursor at the top of the file
- The words at the beginning of the matching entry are copied to the clipboard, and can be pasted into the default program’s find function to help you go to the proper location
- This works similarly under Windows and on the Mac
Alt+P — in the default configuration, opens the Popup Viewer (currently, View, Popup Viewer on the Mac)
- PDF files (Windows and Mac)
  - All matching terms in the file are now highlighted, and the file is opened to the page containing the match
- HTML files
  - All matching terms in the file are now highlighted, and the page is opened to the entry containing the match (indicated by a red ==> arrow)
- DOCX, DOC, and RTF files
  - If Word is installed on your Windows system, the matching entry will be selected and highlighted

There is a wide variety of other viewer options under File, Configure Viewers, each with its own characteristics, but in most cases the default options above are the most powerful

Enhanced File Properties

To facilitate management of files in a textbase, a full list of file properties is now displayed:

In the File Pane of the Add/Remove Files dialog
In the indexer log file

Among the data shown are the following:

The file name
The time required to index the file
The file type as recognized by Windows (based on the file extension)
The file type as detected by Nota Bene
The “textishness” of a PDF file: a calculation of how much of the file is text (as opposed to images, which cannot be searched)
- Most PDF files created today contain text (for the non-image portions of those files), but older files, such as those scanned in from books or older storage media, are often images
The status of the file, including whether it:
- Was converted/required text extraction
- Caused an error (because the file was unreadable in principle [for example, it was a program or an image, and was thus “binary”] or in fact [the file was in principle readable, but produced some error])
The filter or program used to extract text from the file
The date of the file
The size of the file
The number of words in the file
The number of entries in the file

Together, these two data-rich summaries should make it much easier to make sure that you are retrieving all the information that can be retrieved

You can easily see which PDF files contain searchable information
You can easily see (and, if desired, remove) any files that are problematic/unsearchable

In addition to viewing information about each file, in both the indexer log and on the Add/Remove Files dialog there are options to:

Open the file in the default program used to open that file so you can view its contents as understood by the program that created it
View the binary, uninterpreted text of the file, as it would appear if you opened the file in a text editor
View the text that is extracted from that file (if the file required text extraction)
Remove the file from the textbase (if it is not searchable, or its properties show that it does not merit inclusion in the textbase)

These file properties should give you full control over all files on your system, letting you determine the usefulness of searches on them. All told, they should make it easy for you to select all files (*.*) for inclusion in Orbis, and later exclude the problematic ones, thus making sure that Orbis searches everything that is useful.
