CHM Scanner

Overview. CHM Scanner allows the user to localize CHM files (Microsoft Compiled HTML Help files).

The most important settings are the following:

  • CHM File (Project tab); the CHM file to localize
  • Tools (Project tab); the location of the Microsoft Compiled HTML Help applications

Availability

CHM Scanner is available as a separate plug-in. To get an evaluation version, contact sales(at)multilizer.com.


Tabs

Project

CHM File. Specifies the location of the CHM file.

Tools. Specifies the locations of the Microsoft Compiled HTML Help applications. Multilizer uses two utility applications to scan and build CHM files: hhc.exe and hh.exe. For more information, see the Microsoft documentation on the Microsoft HTML Help SDK or download the latest version of Microsoft HTML Help Workshop and Documentation.

Multilizer tries to detect the location of hhc.exe and hh.exe automatically. If the detection fails, the user can set the locations manually.
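The detection step described above can be pictured with a small sketch. It is not Multilizer's actual detection logic, which is not documented here. It assumes a simple strategy: look on the PATH first, then in a list of candidate folders. The function name find_tool and the folders in DEFAULT_DIRS are hypothetical and chosen for illustration only.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Assumed install folders for HTML Help Workshop (illustrative, not verified
# against any particular machine). Adjust to the local installation.
DEFAULT_DIRS = [
    r"C:\Program Files (x86)\HTML Help Workshop",
    r"C:\Program Files\HTML Help Workshop",
]


def find_tool(name: str, extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Return the full path of the tool `name`, or None if it is not found."""
    # Step 1: search the folders listed in the PATH environment variable.
    found = shutil.which(name)
    if found:
        return found
    # Step 2: fall back to user-supplied folders, then the assumed defaults.
    for folder in list(extra_dirs) + DEFAULT_DIRS:
        candidate = Path(folder) / name
        if candidate.is_file():
            return str(candidate)
    # Detection failed: the caller should ask the user for the location.
    return None


if __name__ == "__main__":
    for tool in ("hhc.exe", "hh.exe"):
        print(tool, "->", find_tool(tool) or "not found, set the location manually")
```

When the function returns None, the situation corresponds to the manual case above: the location has to be entered in the Tools setting by hand.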

Encodings

Native language. Specifies the native language that is used in the target. Set this to match the language of the original user interface. If the original material mixes two or more languages, set this to the most widely used one.

Native encoding. Specifies the native encoding that is used in the target with each character set.

Encoding list. Contains the encodings to be used in the localized files. To change a value, right-click the line and select a new value.

Smart Parse

Smart parse settings affect how formatting tags and entities are processed.

Enable smart parse to prevent sentences from being split at specific tags. By default, typical formatting tags such as b, i and u are in the list.
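The effect of the tag list can be illustrated with a minimal sketch. It assumes that text is normally split at every tag and that tags in the list are instead kept inside the current unit. The names split_units and INLINE_TAGS are hypothetical, and the regular expression is a simplification that does not handle every HTML construct.

```python
import re

# Tags that must not split a sentence (mirrors the default list: b, i, u).
INLINE_TAGS = {"b", "i", "u"}

# Simplified pattern for an opening or closing tag; captures the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def split_units(markup, inline_tags=INLINE_TAGS):
    """Split markup into text units at every tag not listed in inline_tags."""
    units, start = [], 0
    for match in TAG_RE.finditer(markup):
        if match.group(1).lower() in inline_tags:
            continue  # listed tag: keep it inside the current unit
        chunk = markup[start:match.start()].strip()
        if chunk:
            units.append(chunk)
        start = match.end()
    tail = markup[start:].strip()
    if tail:
        units.append(tail)
    return units


sample = "<p>Click <b>OK</b> to continue.</p><p>Done.</p>"
print(split_units(sample))
# ['Click <b>OK</b> to continue.', 'Done.']
print(split_units(sample, inline_tags=set()))
# ['Click', 'OK', 'to continue.', 'Done.']
```

With an empty tag list, the sentence breaks into three fragments that a translator would see out of context; with b in the list, it stays in one piece.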

Smart parse settings also allow converting ISO 8859-1 character and symbol entities into characters, which makes the text more readable. If converting ISO 8859-1 character entities is selected, then, for example, the HTML character name &Ccedil; is converted into the single character Ç. Correspondingly, if converting ISO 8859-1 symbol entities is selected, then, for example, the HTML symbol name &copy; is converted into the single character ©.
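A short sketch of the two conversion options follows. It assumes the common grouping of the HTML Latin-1 entity set in which code points 160 to 191, plus 215 (times) and 247 (divide), count as symbols and the remaining code points from 192 to 255 count as characters. The function name convert_entities and its two flags are hypothetical. The entity table comes from Python's standard html.entities module.

```python
import re
from html.entities import name2codepoint

ENTITY_RE = re.compile(r"&([A-Za-z0-9]+);")


def _is_symbol(cp):
    # Assumed "symbol" range of the Latin-1 entity set.
    return 160 <= cp <= 191 or cp in (215, 247)


def _is_character(cp):
    # Assumed "character" range: accented letters and similar.
    return 192 <= cp <= 255 and cp not in (215, 247)


def convert_entities(text, characters=True, symbols=True):
    """Replace named Latin-1 entities by the characters they stand for."""
    def replace(match):
        cp = name2codepoint.get(match.group(1))
        if cp is None:
            return match.group(0)  # unknown name: leave untouched
        if (characters and _is_character(cp)) or (symbols and _is_symbol(cp)):
            return chr(cp)
        return match.group(0)      # outside the selected groups (e.g. &amp;)
    return ENTITY_RE.sub(replace, text)


print(convert_entities("&Ccedil;a &copy; 2010 &amp; more"))
# Ça © 2010 &amp; more
print(convert_entities("&Ccedil; &copy;", symbols=False))
# Ç &copy;
```

Entities such as &amp; and &lt; are deliberately left alone in this sketch, because converting them would change the meaning of the markup.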

Segmentation (SRX)

CHM Scanner incorporates a text segmentation engine. Users can define rules by which text is segmented.

The main advantage of segmentation is obtaining optimum results from a translation memory. Typically, the shorter the segment, the more likely a match is found in a translation memory. For instance, a translation memory is more likely to contain a translation for a single sentence than for an entire paragraph comprising several sentences. To optimize the use of translation memories, the same segmentation rules should be used in all translation tools; hence, a standard format for storing segmentation rules is very convenient. Segmentation is particularly useful when working with TMX files, as SRX was developed as a companion standard to TMX.

Defining segmentation rules is not trivial. To facilitate the process, the Segmentation tab provides a test environment in which the effects of a set of rules can be tested and optimized. Click the Test button to open the test environment, select the desired language, and click the Run button to see how the sample text is segmented using the current rules for the selected language.

The segmentation rules can be exported and imported in SRX format, the standard for describing how text is broken into segments for further processing. For more information about the SRX standard, including examples, see the documentation at the LISA web site.
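To give a feel for how such rules behave, here is a minimal sketch of SRX-style rule evaluation. It is not the engine used by CHM Scanner. It assumes the SRX model in which each rule has a break flag, a "before break" pattern and an "after break" pattern, and in which the first rule that matches at a position decides whether the text is split there. The two sample rules and the name segment are hypothetical.

```python
import re

# Each rule: (is_break, beforebreak pattern, afterbreak pattern).
# In an SRX file these correspond to <rule break="yes|no"> with
# <beforebreak> and <afterbreak> child elements.
RULES = [
    # Exception rule: no break after common abbreviations.
    (False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),
    # Break rule: sentence-final punctuation followed by whitespace.
    (True, r"[.?!]+", r"\s+"),
]


def segment(text, rules=RULES):
    """Split text at every position where the first matching rule is a break rule."""
    breaks = []
    for pos in range(1, len(text)):
        for is_break, before, after in rules:
            before_ok = re.search(f"(?:{before})\\Z", text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    breaks.append(pos)
                break  # the first matching rule wins; later rules are ignored
    segments, start = [], 0
    for pos in breaks:
        segments.append(text[start:pos])
        start = pos
    segments.append(text[start:])
    return segments


print(segment("Dr. Smith arrived. He was late! Why?"))
# ['Dr. Smith arrived.', ' He was late!', ' Why?']
```

Rule order matters in this model: if the abbreviation rule were listed after the break rule, "Dr." would end a segment. The sketch rescans the text at every position, which is fine for a short sample but too slow for large files.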