Sibling-first data organization for efficient XML data processing

dc.contributor.advisor: Wang, Fang Ju
dc.contributor.author: Homayounfar, Hooman
dc.date.accessioned: 2020-12-03T18:09:23Z
dc.date.available: 2020-12-03T18:09:23Z
dc.date.copyright: 2007
dc.degree.department: Department of Computing and Information Science [en_US]
dc.degree.grantor: University of Guelph [en_US]
dc.degree.name: Doctor of Philosophy [en_US]
dc.description.abstract: XML is becoming one of the most important structures for data exchange. Despite its many advantages, the XML structure imposes several major obstacles to processing large documents. The incompatibility between the linear nature of current algorithms used in operating systems and databases, such as caching and prefetching, and the non-linear structure of XML data makes XML processing more costly. In addition to verbosity, parsing the depth-first (DF) structure of XML documents imposes a significant overhead on processing applications, including search engines. Recent research on XML query processing has shown that sibling clustering can improve performance significantly. However, the existing methods are limited in several respects, including the processing of very large documents. In this research, a better data organization for native XML databases, named sibling-first (SF), has been developed that significantly improves performance in large data processing. SF uses an embedded index for fast access to child nodes. It also compresses documents by eliminating extra data from the original DF format. The converted SF documents can be processed for XPath queries without being parsed. The SF storage has been implemented in virtual memory as well as in an on-disk format. Experimental results with real data have shown that significantly higher performance can be achieved when XPath queries are run on very large SF documents. [en_US]
dc.identifier.uri: https://hdl.handle.net/10214/21993
dc.language.iso: en
dc.publisher: University of Guelph [en_US]
dc.rights.license: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: XML [en_US]
dc.subject: XML data [en_US]
dc.subject: XML structure [en_US]
dc.subject: data organization [en_US]
dc.subject: XML databases [en_US]
dc.subject: sibling-first [en_US]
dc.subject: data processing [en_US]
dc.title: Sibling-first data organization for efficient XML data processing [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Homayounfar_Hooman_PhD.pdf
Size: 8.16 MB
Format: Adobe Portable Document Format