Embedding Structure into HTML for More Precise Retrieval of Information, A Novel XML Schema
Main Article Content
Abstract
This paper presents the core of a universal schema for transforming any HTML document into XML. The objective is to give textual documents an explicit structure before information is retrieved from them. The structure is derived from the HTML document according to the schema and expressed as an XML document. The resulting structure helps identify levels of significance within the HTML page. Including this otherwise hidden structure of the text in the relevance computation during retrieval can yield more relevant results. A preliminary study suggests the approach is promising and warrants larger studies.
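The idea in the abstract can be illustrated with a minimal sketch. It is not the schema proposed in the paper, which this abstract does not spell out. The tag-to-level table `SIGNIFICANCE`, the element names `document` and `segment`, the `level` attribute, and the `1/level` weighting in `score` are all assumptions chosen for illustration. The sketch uses only Python's standard library to convert HTML into a flat XML tree annotated with significance levels, then uses those levels when scoring a query term.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical tag-to-significance table (1 = most significant).
# Illustrative assumption only, not the schema from the paper.
SIGNIFICANCE = {"title": 1, "h1": 2, "h2": 3, "h3": 4, "b": 5, "strong": 5, "p": 6}
DEFAULT_LEVEL = 6  # text that sits outside every listed tag


class StructureExtractor(HTMLParser):
    """Collects text runs and tags each with the best enclosing level."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # assumed root element name
        self.stack = []  # open tags that carry a significance level

    def handle_starttag(self, tag, attrs):
        if tag in SIGNIFICANCE:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching tag (tolerates unclosed tags).
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text:
            return
        level = min((SIGNIFICANCE[t] for t in self.stack), default=DEFAULT_LEVEL)
        seg = ET.SubElement(self.root, "segment", level=str(level))
        seg.text = text


def html_to_xml(html: str) -> ET.Element:
    """Convert an HTML string into the assumed <document>/<segment> XML tree."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    return parser.root


def score(doc: ET.Element, term: str) -> float:
    """Sum 1/level over segments containing the term (assumed weighting)."""
    term = term.lower()
    total = 0.0
    for seg in doc.iter("segment"):
        if term in (seg.text or "").lower():
            total += 1.0 / int(seg.get("level"))
    return total


if __name__ == "__main__":
    doc = html_to_xml("<h1>XML schema</h1><p>This text mentions <b>schema</b> once.</p>")
    print(ET.tostring(doc, encoding="unicode"))
    print(score(doc, "schema"))
```

Under these assumptions, a term that appears in a heading contributes more to the score than the same term in body text, which is the kind of structure-aware relevance the abstract describes. A real schema would likely preserve nesting rather than flatten the page into segments.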
Article Details
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.