This document describes the serialization of the lexical markup framework (LMF) model defined as an extensible markup language (XML) model derived from the language base exchange (LBX) schema and compliant with the W3C XML schema. This serialization covers the classes, data categories, and mechanisms of ISO 24613-1 (core model), ISO 24613-2 (machine-readable dictionary (MRD) model), and ISO 24613-3 (etymological extension).

  • Standard
    32 pages
    English language
  • Standard
    33 pages
    French language
  • Draft
    32 pages
    English language
  • Draft
    36 pages
    English language

This document specifies the structure of an ontology for a fine-grained description of the expressive power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a taxonomy of CQLs. This document specifies: a) the taxonomy of the CQLF metamodel; b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”); c) the structure of the layers of the expressive power taxonomy and the relationships between them, in the form of subsumption assertions; d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in the form of positive and negative conformance statements. This document does not define the entire contents of the ontology (see Clause 4).

  • Standard
    23 pages
    English language
  • Standard
    18 pages
    English language
  • Draft
    24 pages
    English language

This document covers the measurable or magnitudinal aspect of quantity so that it can focus on the technical or practical use of measurements in IR (information retrieval), QA (question answering), TS (text summarization), and other NLP (natural language processing) applications. It is applicable to the domains of technology that carry more applicational relevance than some theoretical issues found in the ordinary use of language. NOTE ISO 24617-12 deals with more general and theoretical issues of quantification and quantitative information. This document also treats temporal durations that are discussed in ISO 24617-1, and spatial measures such as distances that are treated in ISO 24617-7, while making them interoperable with other measure types. It also accommodates the treatment of measures or amounts that are introduced in ISO 24617‑6:2016, 8.3.

  • Standard
    26 pages
    English language
  • Standard
    21 pages
    English language
  • Standard
    21 pages
    French language
  • Draft
    21 pages
    English language
  • Draft
    29 pages
    English language
  • Draft
    22 pages
    French language

This document describes an extension to ISO 24613-1 and ISO 24613-2 to support the development of detailed descriptions of common etymological phenomena and/or diachronic information with respect to lexical entries in born-digital and/or retro-digitized lexicons. It provides both a meta-model for such an extension and the relevant data categories.

  • Standard
    26 pages
    English language
  • Standard
    22 pages
    English language
  • Standard
    22 pages
    French language
  • Draft
    22 pages
    English language
  • Draft
    26 pages
    English language
  • Draft
    26 pages
    English language
  • Draft
    22 pages
    French language

This document provides basic principles and a methodology for establishing a specification for designing and constructing a formally defined, or controlled, system of oral communication that avoids or filters out phonetic interferences and confusions between words of the same language and between languages. The system is both abstracted from, and contextually situated in, the domains of industry, business or other technologies. This document deals only with oral communication between native speakers, or non-native speakers, or a native speaker and a non‑native speaker, who can be disturbed due to different phenomena, such as phoneme confusion, phonetic interferences and confusions between words (for example: homophony, quasi-homophony or co-articulation) of the same language and/or different languages and the resulting ambiguities due, for example, to multilingual communication or stressful situations. This document deals with speakers and listeners without speech or hearing impediments[16], and does not include sign languages which have a phonological system equivalent to the system of sounds in spoken languages[23]. Foreseen applications are essentially in safety critical applications using human oral communication. This document is also applicable to other domains involving, for example, training and evaluation procedures and robots.

  • Standard
    30 pages
    English language
  • Standard
    24 pages
    English language
  • Standard
    26 pages
    French language
  • Draft
    23 pages
    English language
  • Draft
    23 pages
    French language

This document specifies how to represent (not visualize) documents (instance data, not data schemas) as graphs. It does not specify how to visualize or operate on document data, but it aims at making documents easier for people to compose and comprehend by allowing for various graph-based flexible user interfaces, possibly incorporating document-visualization practices (see Introduction). In this connection, this document does not specify annotations to existing documents either, but rather it specifies a schema of documents with explicit logical structures.

  • Standard
    14 pages
    English language
  • Standard
    8 pages
    English language
  • Standard
    9 pages
    French language
  • Draft
    8 pages
    English language
  • Draft
    8 pages
    French language

This document describes the serialization of the lexical markup framework (LMF) model defined as an XML model compliant with the Text Encoding Initiative (TEI) Guidelines. This serialization covers the classes of ISO 24613-1 (the LMF core model) as well as classes provided by ISO 24613-2 (the machine readable dictionary, MRD, model) and ISO 24613-3 (the etymological extension).

  • Standard
    25 pages
    English language
  • Standard
    20 pages
    English language
  • Standard
    20 pages
    French language
  • Draft
    20 pages
    English language
  • Draft
    20 pages
    French language

This document provides a set of empirically and theoretically well-motivated concepts for dialogue annotation, a formal language for expressing dialogue annotations (the Dialogue Act Markup Language, DiAML), and a method for segmenting a dialogue into semantic units. This allows the manual or automatic annotation of dialogue segments with information about the communicative actions which the participants perform by their contributions to the dialogue. The annotation scheme specified in this document supports multidimensional annotation of spoken, written, and multimodal dialogues involving two or more participants. Dialogue units are viewed as having multiple communicative functions in different dimensions. The markup language DiAML has an XML-based representation format and a formal semantics which makes it possible to perform inferences with DiAML representations. This document also specifies data categories for dimensions of dialogue analysis, for communicative functions, for dialogue act qualifiers, and for relations between dialogue acts. Additionally, it provides mechanisms for customizing these sets of concepts, extending them with application-specific or domain-specific concepts and descriptions of semantic content, or selecting relevant coherent subsets of them. These mechanisms make the dialogue act concepts specified in this document useful not only for annotation but also for the recognition and generation of dialogue acts in interactive systems.

  • Standard
    101 pages
    English language
  • Standard
    95 pages
    English language
  • Draft
    95 pages
    English language

This document describes the machine-readable dictionary (MRD) model, a metamodel for representing data stored in a variety of electronic dictionary subtypes, ranging from direct support for human translators to support for machine processing.

  • Standard
    26 pages
    English language
  • Standard
    26 pages
    English language
  • Standard
    21 pages
    English language

This document provides a framework for encoding a broad range of spatial information and spatiotemporal information relating to motion as expressed in natural language texts. This document includes references to locations, general spatial entities, spatial relations (involving topological, orientational, and metric values), dimensional information, motion events, paths, and event-paths triggered by motions.

  • Standard
    38 pages
    English language
  • Standard
    38 pages
    English language
  • Standard
    32 pages
    English language

This document provides a comprehensive model for the annotation and representation of referential phenomena in natural language texts and multimodal interactions. Such phenomena can cover simple anaphoric or coreferential mechanisms as well as more complex bridging or multimodal mechanisms. It provides a reference serialisation in XML defined as a customisation of the TEI P5 guidelines. In addition, the document describes the core data categories related to referential entities and link structures, and also needed for the description of annotation schemes and serialisation mechanisms for implementing conformant models as concrete data formats.

  • Standard
    32 pages
    English language
  • Standard
    32 pages
    English language
  • Standard
    27 pages
    English language

The component metadata lifecycle needs a comprehensive infrastructure with systems that cooperate well together. To enable this level of cooperation, this document provides in-depth descriptions and definitions of what CMDI records, components and their representations in XML look like. This document describes these XML representations, which enable the flexible construction of interoperable metadata schemas suitable for, but not limited to, describing language resources. The metadata schemas based on these representations can be used to describe resources at different levels of granularity (e.g. descriptions on the collection level or on the level of individual resources).

  • Standard
    38 pages
    English language
  • Standard
    38 pages
    English language
  • Standard
    32 pages
    English language

This document describes the core model of the lexical markup framework (LMF), a metamodel for representing data in monolingual and multilingual lexical databases used with computer applications. LMF provides mechanisms that allow the development and integration of a variety of electronic lexical resource types.

  • Standard
    18 pages
    English language
  • Standard
    13 pages
    English language
  • Standard
    18 pages
    English language

ISO 24623-1:2018 describes the abstract metamodel designed to accommodate any corpus query language (QL) and providing a basis for coarse-grained classification. The metamodel consists of several components referred to as CQLF classes, levels, and modules, and is illustrated with examples from the Single-stream class (where a single data stream is used to organize the relevant data structures). Within this class, this document discusses three CQLF levels (Linear, Complex and Concurrent), as well as their subdivisions into modules, dictated by functional and modelling criteria. ISO 24623-1:2018 does not provide a way to specify further details beyond the above-mentioned divisions, and neither does it contain within its scope QLs designed to query more than one concurrent data stream, as in multimodal corpora or in parallel corpora (such QLs can still be classified according to the criteria suggested here for less expressive QLs).

  • Standard
    17 pages
    English language
  • Standard
    12 pages
    English language
  • Standard
    17 pages
    English language

ISO 24615-2:2018 describes an XML-conformant serialization of the ISO 24615‑1 meta-model, with the objective of supporting interoperability across language resources or language processing components in the domain of syntactic annotations. As an extension of ISO 24615‑1, this document is also coordinated with ISO 24612.

  • Standard
    17 pages
    English language
  • Standard
    12 pages
    English language
  • Standard
    17 pages
    English language

ISO 24617-8:2016 establishes the representation and annotation of local, "low-level" discourse relations between situations mentioned in discourse, where each relation is annotated independently of other relations in the same discourse. ISO 24617-8:2016 provides a basis for annotating discourse relations by specifying a set of core discourse relations, many of which have similar definitions in different frameworks. To the extent possible, this document provides mappings of the semantics across the different frameworks. ISO 24617-8:2016 is applicable to two different situations: - for annotating discourse relations in natural language corpora; - as a target representation of automatic methods for shallow discourse parsing, for summarization, and for other applications. The objectives of this specification are to provide: - a reference set of data categories that define a collection of discourse relation types with an explicit semantics; - a pivot representation based on a framework for defining discourse relations that can facilitate mapping between different frameworks; - a basis for developing guidelines for creating new resources that will be immediately interoperable with pre-existing resources. With respect to discourse structure, the limitation of this document to specifications for annotating local, "low-level" discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows for the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure. As a part of the ISO 24617 semantic annotation framework ("SemAF"), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts.
Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617‑1 (time and events); still other discourse relations are very similar to certain predicate-argument relations ("semantic roles"), whose annotation is the subject matter of ISO 24617‑4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 8). ISO 24617-8:2016 does not consider global, higher-level discourse structure representation which involves linking local discourse relations to form one or more composite global structures. ISO 24617-8:2016 is, moreover, restricted to strictly semantic relations, to the exclusion of, for example, presentational relations, which concern the way in which a text is presented to its readers or the way in which speakers structure their contributions in a spoken dialogue.

  • Standard
    48 pages
    English language
  • Standard
    43 pages
    English language
  • Standard
    48 pages
    English language
  • Standard
    48 pages
    English language
  • Standard
    46 pages
    French language

ISO 24624:2016 specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts. Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.

  • Standard
    39 pages
    English language
  • Standard
    32 pages
    English language
  • Standard
    39 pages
    English language
  • Standard
    34 pages
    French language

ISO 24617-6:2016 specifies the approach to semantic annotation characterizing the ISO Semantic annotation framework (SemAF). It outlines the SemAF strategy for developing separate annotation schemes for certain classes of semantic phenomena, aiming in the long term to combine these into a single, coherent scheme for semantic annotation with wide coverage. In particular, it sets out the notions of both an abstract and a concrete syntax for semantic annotations, mirroring the distinction between annotations and representations that is made in the ISO Linguistic Annotation Framework. It describes the role of these notions in relation to the specification of a metamodel and a semantic interpretation of annotations, with a view to defining a well-founded annotation scheme. ISO 24617-6:2016 also provides guidelines for dealing with two issues regarding the annotation schemes defined in SemAF-parts: a) conceptual and terminological inconsistencies that may arise due to overlaps between annotation schemes and b) the treatment of semantic phenomena that cut across SemAF-parts, such as negation, modality and quantification. Instances of both issues are identified, and in some cases, direction is given as to how they may be tackled.

  • Standard
    34 pages
    English language
  • Standard
    30 pages
    English language
  • Standard
    34 pages
    English language
  • Standard
    34 pages
    English language

As part of a drive to provide international standards for language resource management, ISO/TS 24620-1:2015 on controlled natural language (CNL) sets out the principles of CNL and its utilization together with the relevant supporting technology. However, ISO/TS 24620-1:2015 also aims to introduce a general view of CNL with its objectives and characteristics and provide a scheme for classifying a range of CNLs. ISO/TS 24620-1:2015 additionally specifies certain normalizing principles of CNLs that control the use of natural languages in particular domains and are also oriented towards areas of practical application. These areas include public administrative communications, search optimization, and the management of automatic question-answering systems, but the current version of ISO/TS 24620-1:2015 does not address any issue involving these applications directly.

  • Technical specification
    14 pages
    English language
  • Technical specification
    9 pages
    English language
  • Technical specification
    14 pages
    English language

ISO 24622:2015 describes a model that enables the flexible construction of interoperable metadata schemas for Language Resources (LRs). The metadata schemas based on this model can be used to describe resources at different levels of granularity (e.g. descriptions both on the collection level and on the level of individual resources).

  • Standard
    17 pages
    English language
  • Standard
    11 pages
    English language
  • Standard
    17 pages
    English language

The aim of ISO 24617-4:2014 is to propose a consensual annotation scheme for semantic roles; that is to say, a scheme that indicates the role that a participant plays in an event or state, as described mostly by a verb, and typically providing answers to questions such as "'who' did 'what' to 'whom'", and 'when', 'where', 'why', and 'how'. This includes not only the semantic relations between a verb and its arguments but also those relations that are relevant for other predicative elements such as nominalizations, nouns, adjectives, and predicate modifiers; the predicating role of adverbs and the use of coercion fall outside the scope of ISO 24617-4:2014.

  • Standard
    50 pages
    English language
  • Standard
    45 pages
    English language
  • Standard
    50 pages
    English language
  • Standard
    50 pages
    English language

A discourse is a process of communication. ISO/TS 24617-5:2014 addresses how a discourse is structured in terms of its realization/presentation and content, and shows how its dual structure can be represented in a graph. The current specification focuses on the annotation of discourse structures in text only, but it can be extended to discourses in other modalities.

  • Technical specification
    22 pages
    English language
  • Technical specification
    17 pages
    English language
  • Technical specification
    22 pages
    English language
  • Technical specification
    22 pages
    English language

ISO 24615-1:2014 describes the syntactic annotation framework (SynAF), a high level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615-1:2014 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.

  • Standard
    25 pages
    English language
  • Standard
    20 pages
    English language
  • Standard
    25 pages
    English language
  • Standard
    20 pages
    French language

ISO 24611:2012 provides a framework for the representation of annotations of word-forms in texts; such annotations concern tokens, their relationship with lexical units, and their morpho-syntactic properties. It describes a metamodel for morpho-syntactic annotation that relates to a reference to the data categories contained in the ISOCat data category registry (DCR, as defined in ISO 12620). It also describes an XML serialization for morpho-syntactic annotations, with equivalences to the guidelines of the TEI (text encoding initiative).

  • Standard
    65 pages
    English language
  • Standard
    58 pages
    English language
  • Standard
    65 pages
    English language
  • Standard
    63 pages
    French language

ISO 24616:2012 provides a generic platform for modeling and managing multilingual information in various domains: localization, translation, multimedia annotation, document management, digital library support, and information or business modeling applications. MLIF (multilingual information framework) provides a metamodel and a set of generic data categories [ISO 12620:2009] for various application domains. MLIF also provides strategies for the interoperability and/or linking of models including, but not limited to, XLIFF (XML Localization Interchange File Format), TMX (Translation Memory eXchange), smilText (Synchronized Multimedia Integration Language) and ITS (Internationalization Tag Set).

  • Standard
    46 pages
    English language
  • Standard
    42 pages
    English language
  • Standard
    46 pages
    English language

ISO 24612:2012 specifies a linguistic annotation framework (LAF) for representing linguistic annotations of language data such as corpora, speech signal and video. The framework includes an abstract data model and an XML serialization of that model for representing annotations of primary data. The serialization serves as a pivot format to allow annotations expressed in one representation format to be mapped onto another.

  • Standard
    24 pages
    English language
  • Standard
    19 pages
    English language
  • Standard
    24 pages
    English language

Temporal information in natural language texts is an increasingly important component to the understanding of those texts. ISO 24617-1:2012, SemAF-Time, specifies a formalized XML-based markup language called ISO-TimeML, with a systematic way to extract and represent temporal information, as well as to facilitate the exchange of temporal information, both between operational language processing systems and between different temporal representation schemes. The use of guidelines for temporal annotation has been fully attested with examples from the TimeBank corpus, a collection of 183 documents that have been annotated by TimeML before the current version of ISO-TimeML was formulated.

  • Standard
    163 pages
    English language
  • Standard
    156 pages
    English language
  • Standard
    163 pages
    English language

ISO 24610-2:2011 provides a format to represent, store or exchange feature structures in natural language applications, for both annotation and production of linguistic data. It is ultimately designed to provide a computer format to define a type hierarchy and to declare the constraints that bear on a set of feature specifications and operations on feature structures, thus offering means to check the conformance of each feature structure with regards to a reference specification. Feature structures are an essential part of many linguistic formalisms as well as an underlying mechanism for representing the information consumed or produced by and for language engineering applications. A feature system declaration (FSD) is an auxiliary file used in conjunction with a certain type of text that makes use of fs (that is, feature structure) elements. The FSD serves four purposes. 1) It provides an encoding by which types and their subtyping and inheritance relationships can be introduced and defined, thus laying the basis for constructing a feature system. 2) It provides a mechanism by which the encoder can list all of the feature names and feature values and give a prose description as to what each represents. 3) It provides a mechanism by which type constraints can be declared, against which typed feature structures are validated relative to a given theory stated in typed feature logic. These constraints may involve constraints on the range of a feature's value, constraints on which features are permitted within certain types of feature structures, or constraints that prevent the co-occurrence of certain feature-value pairs. The source of these constraints is normally the empirical domain being modelled. 4) It provides a mechanism by which the encoder can define the intended interpretation of underspecified feature structures. This involves defining default values (whether literal or computed) for missing features. 
The scheme described in ISO 24610-2:2011 may be used to document any feature system, but is primarily intended for use with the typed feature structure representation defined in ISO 24610-1. The feature structure representations of ISO 24610-1 specify data structures that are subject to the typing conventions and constraints specified using ISO 24610-2:2011. The feature structure representations of ISO 24610-1 are also used within some of the elements defined in ISO 24610-2:2011.
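
The four purposes listed for a feature system declaration can be illustrated with a small sketch. This is not the XML format defined by ISO 24610-2; the class names (`FeatureDecl`, `TypeDecl`), the functions, and the toy `word`/`noun` types are all hypothetical, chosen only to show type inheritance, value constraints, permitted features, and default values for underspecified structures.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class FeatureDecl:
    """Hypothetical declaration of one feature: allowed values, optional default."""
    allowed: FrozenSet[str]
    default: Optional[str] = None


@dataclass
class TypeDecl:
    """Hypothetical declaration of a type: optional supertype plus its own features."""
    parent: Optional[str] = None
    features: Dict[str, FeatureDecl] = field(default_factory=dict)


def inherited_features(types: Dict[str, TypeDecl], name: str) -> Dict[str, FeatureDecl]:
    """Collect feature declarations along the supertype chain; subtypes override."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        chain.append(types[current])
        current = types[current].parent
    merged: Dict[str, FeatureDecl] = {}
    for decl in reversed(chain):  # most general type first
        merged.update(decl.features)
    return merged


def validate(types: Dict[str, TypeDecl], type_name: str, fs: Dict[str, str]) -> List[str]:
    """Return constraint violations for a feature structure of the given type."""
    decls = inherited_features(types, type_name)
    errors = []
    for feat, value in fs.items():
        if feat not in decls:
            errors.append(f"feature '{feat}' not permitted for type '{type_name}'")
        elif value not in decls[feat].allowed:
            errors.append(f"value '{value}' not allowed for feature '{feat}'")
    return errors


def complete(types: Dict[str, TypeDecl], type_name: str, fs: Dict[str, str]) -> Dict[str, str]:
    """Interpret an underspecified structure by filling in declared defaults."""
    result = dict(fs)
    for feat, decl in inherited_features(types, type_name).items():
        if feat not in result and decl.default is not None:
            result[feat] = decl.default
    return result


# Toy declaration (an assumption for illustration): 'noun' is a subtype of 'word'.
TYPES = {
    "word": TypeDecl(features={"pos": FeatureDecl(frozenset({"noun", "verb"}))}),
    "noun": TypeDecl(
        parent="word",
        features={"number": FeatureDecl(frozenset({"sg", "pl"}), default="sg")},
    ),
}
```

In this sketch, `validate(TYPES, "noun", {"pos": "noun"})` reports no violations, and `complete` adds the declared default for the missing `number` feature.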

  • Standard
    55 pages
    English language
  • Standard
    50 pages
    English language
  • Standard
    55 pages
    English language

The basic concepts and general principles of word segmentation as defined in ISO 24614-1 apply to Chinese, Japanese and Korean. Text needs to be segmented into tokens, words, phrases or some other types of smaller textual units in order to perform certain computational applications on language resources, such as natural language processing, information retrieval and machine translation. ISO 24614-2:2011 is restricted to the segmentation of a text into words or other word segmentation units (WSUs). This task is distinct from morphological or syntactic analysis per se, although it greatly depends on morphosyntactic analysis. It is also different from the task of laying out a framework for constructing a lexicon and identifying its lexical entries, namely lemmas and lexemes. The frameworks for the latter tasks are provided by ISO 24611, ISO 24613 and ISO 24615. ISO 24614-2:2011 specifies rules for delineating WSUs for Chinese, Japanese and Korean. Some rules are common to all three languages, though each language also has its own distinct rules for identifying WSUs. The common features are discussed, then the distinct rules are laid out for Chinese, for Japanese and for Korean.

  • Standard
    49 pages
    English language
    sale 10% off
    e-Library read for
    1 day
  • Standard
    43 pages
    English language
    sale 15% off

ISO 24619:2011 specifies requirements for the persistent identifier (PID) framework and for using PIDs as references and citations of language resources in documents as well as in language resources themselves. In this context, examples of language resources include digital dictionaries, language-purposed terminological resources, machine-translation lexica, annotated multimedia/multimodal corpora, and text corpora that have been annotated with, for example, morpho-syntactic information. Computational and applied linguists and information specialists create such resources. ISO 24619:2011 also addresses issues of persistence and granularity of references to resources, first by requiring that persistent references be implemented by using a PID framework and further by imposing requirements on any PID framework used for this purpose. PID frameworks also allow the association of general metadata with the identifier, which can also contain citation information. ISO 24619:2011 specifies minimum requirements for effective use of PIDs in language resources and cites several existing standards and de facto standards that could be used.
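
The ideas of persistence, granularity and associated metadata can be sketched with a small in-memory registry. This is only an illustration under stated assumptions: the `PidRegistry` class, the `pid:example/...` identifier syntax, the use of `#` to address a part of a resource, and the example.org locations are all hypothetical and are not taken from ISO 24619:2011 or from any particular PID system.

```python
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    location: str                                 # current location of the resource
    metadata: dict = field(default_factory=dict)  # e.g. citation information


class PidRegistry:
    """Hypothetical resolver: the PID stays fixed while the location may change."""

    def __init__(self):
        self._records = {}

    def register(self, pid, location, metadata=None):
        self._records[pid] = PidRecord(location, dict(metadata or {}))

    def relocate(self, pid, new_location):
        # Persistence: only the registry entry is updated, never the citing documents.
        self._records[pid].location = new_location

    def resolve(self, reference):
        # Granularity (assumed syntax): "pid#part" addresses a part of the resource.
        pid, _, part = reference.partition("#")
        location = self._records[pid].location
        return f"{location}#{part}" if part else location

    def metadata(self, pid):
        return dict(self._records[pid].metadata)


registry = PidRegistry()
registry.register(
    "pid:example/corpus-1",
    "https://old.example.org/corpus",
    {"title": "Example annotated corpus", "year": "2011"},
)
registry.relocate("pid:example/corpus-1", "https://new.example.org/corpus")

whole = registry.resolve("pid:example/corpus-1")
part = registry.resolve("pid:example/corpus-1#sentence-42")
```

A citation that uses the PID keeps working after the resource moves, because only the registry entry changes.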

  • Standard
    34 pages
    English language
    sale 10% off
    e-Library read for
    1 day
  • Standard
    29 pages
    English language
    sale 15% off

ISO 24614-1:2010 presents the basic concepts and general principles of word segmentation, and provides language-independent guidelines to enable written texts to be segmented, in a reliable and reproducible manner, into word segmentation units (WSUs). The many applications and fields that need to segment texts into words, and to which ISO 24614-1:2010 can therefore be applied, include translation, content management, speech technologies, computational linguistics and lexicography.

  • Standard
    20 pages
    English language
    sale 10% off
    e-Library read for
    1 day
  • Standard
    15 pages
    English language
    sale 15% off

ISO 24610-1:2006 provides a format for the representation, storage and exchange of feature structures in natural language applications concerned with the annotation, production or analysis of linguistic data. It also defines a computer format for the description of constraints that bear on a set of features, feature values, feature specifications and operations on feature structures, thus offering a means of checking the conformance of each feature structure with regard to a reference specification.
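
As an illustration of what an operation on feature structures looks like, the sketch below implements unification over feature structures modelled as nested Python dictionaries. The dictionary encoding and the function name `unify` are assumptions made for this example, not the representation defined in ISO 24610-1:2006; structure sharing (reentrancy) and typing are not handled.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Complex values are dicts mapping feature names to values; atomic values
    are plain strings. None is reserved to signal failure.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # incompatible values for the same feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a complex value) unify only if equal.
    return a if a == b else None


noun_phrase = {"cat": "NP", "agr": {"number": "singular"}}
third_person = {"agr": {"person": "third"}}
plural = {"agr": {"number": "plural"}}

combined = unify(noun_phrase, third_person)
clash = unify(noun_phrase, plural)
```

Compatible information from both structures is merged into `combined`, whereas the conflicting `number` values make the second call fail and return `None`.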

  • Standard
    84 pages
    English language
    sale 10% off
    e-Library read for
    1 day
  • Standard
    78 pages
    English language
    sale 15% off

ISO 24617-7:2014 provides a framework for encoding a broad range of spatial information, as well as spatiotemporal information relating to motion, as expressed in natural language texts. It includes references to locations, general spatial entities, spatial relations (involving topological, orientational, and metric values), dimensional information, motion events, and paths.

  • Standard
    60 pages
    English language
    sale 10% off
    e-Library read for
    1 day
  • Standard
    55 pages
    English language
    sale 15% off

ISO 24617-2:2012 provides a set of empirically and theoretically well-motivated concepts for dialogue annotation, a formal language for expressing dialogue annotations (the dialogue act markup language, DiAML), and a method for segmenting a dialogue into semantic units. This allows the manual or automatic annotation of dialogue segments with information about the communicative actions that the participants perform through their contributions to the dialogue. It supports multidimensional annotation, in which units in dialogue are viewed as having multiple communicative functions. DiAML has an XML-based representation format and a formal semantics, which makes it possible to apply inference to DiAML representations. ISO 24617-2:2012 specifies data categories for reference sets of communicative functions and dimensions of dialogue analysis and provides principles and guidelines for extending these sets or selecting coherent subsets of them. Additionally, it provides guidelines for annotators and annotated examples. It is applicable to spoken, written and multimodal dialogues involving two or more participants.
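
Multidimensional annotation can be pictured with a small data structure in which one dialogue segment carries several communicative functions, one per dimension. The sketch below is a simplified Python stand-in, not DiAML's XML representation; the class `DialogueAct`, the helper `functions_by_segment`, and the segment, participant, dimension and function labels are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogueAct:
    segment: str                 # identifier of the annotated dialogue segment
    sender: str
    addressee: str
    dimension: str               # e.g. task, feedback, turn management
    communicative_function: str


def functions_by_segment(acts):
    """Group (dimension, function) pairs by the segment they annotate."""
    grouped = defaultdict(list)
    for act in acts:
        grouped[act.segment].append((act.dimension, act.communicative_function))
    return dict(grouped)


# One segment, two functions: it answers a question and also signals that
# the previous utterance was understood (labels are illustrative only).
acts = [
    DialogueAct("s1", "p2", "p1", "task", "answer"),
    DialogueAct("s1", "p2", "p1", "autoFeedback", "autoPositive"),
    DialogueAct("s2", "p1", "p2", "task", "propositionalQuestion"),
]
grouped = functions_by_segment(acts)
```

Keeping one record per (segment, dimension) pair, rather than one label per segment, is what lets a single unit carry several communicative functions at once.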

  • Standard
    108 pages
    English language
    sale 10% off
    e-Library read for
    1 day
  • Standard
    104 pages
    English language
    sale 15% off

ISO 24615:2010 describes the syntactic annotation framework (SynAF), a high-level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615:2010 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.

  • Standard
    18 pages
    English language
    sale 15% off
  • Standard
    23 pages
    English language
    sale 10% off
    e-Library read for
    1 day

ISO 24613:2008 describes the Lexical Markup Framework (LMF), a metamodel for representing data in lexical databases used with monolingual and multilingual computer applications. LMF provides mechanisms that allow the development and integration of a variety of electronic lexical resource types. These mechanisms are intended to represent existing lexicons as fully as possible. Where this is impossible, problematic information will be identified and isolated.

  • Standard
    82 pages
    English language
    sale 10% off
    e-Library read for
    1 day
  • Standard
    77 pages
    English language
    sale 15% off