An efficient algorithm for joining large XML documents
XML is becoming the major markup language for developing heterogeneous distributed databases. Data from different sources can be encoded as XML documents and processed together, and the join is one of the most important database operations for combining such data. XML documents, however, have special features that distinguish them from relational data, so most join techniques developed for relational databases cannot be directly adopted for processing XML data. Efficient join algorithms are therefore needed for building high-performance XML databases. This thesis describes an efficient algorithm for joining large XML documents. The algorithm scans the data only once or twice. It creates a set of supporting structures and then performs the join in main memory or by direct disk access. It does not require any existing index structures and does not depend on support from database software (e.g., an RDBMS).
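
To make the general idea concrete, the following is a minimal sketch of a scan-based equi-join over two XML documents. It is not the algorithm developed in the thesis. It only illustrates the pattern of scanning each document once, building a supporting structure (here a plain in-memory hash table), and joining without a pre-existing index or an RDBMS. The function names `scan_build` and `scan_join`, the element names, and the sample data are all hypothetical, and the sketch covers only the main-memory case, not the direct-disk-access case.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def scan_build(source, record_tag, key_tag):
    """Scan one document once; map each join-key value to its records."""
    index = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            key = elem.findtext(key_tag)
            if key is not None:
                # Keep only a flat copy of the record's child fields.
                index[key].append({child.tag: child.text for child in elem})
            elem.clear()  # drop the record's children once they are copied
    return index


def scan_join(source, record_tag, key_tag, index):
    """Scan the second document once, probing the in-memory index."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            key = elem.findtext(key_tag)
            right = {child.tag: child.text for child in elem}
            for left in index.get(key, []):
                yield left, right
            elem.clear()


# Hypothetical sample documents, joined on the <id> child element.
authors = io.BytesIO(
    b"<authors>"
    b"<author><id>1</id><name>Ann</name></author>"
    b"<author><id>2</id><name>Bob</name></author>"
    b"</authors>"
)
books = io.BytesIO(
    b"<books>"
    b"<book><id>2</id><title>Joins</title></book>"
    b"<book><id>3</id><title>Trees</title></book>"
    b"</books>"
)

index = scan_build(authors, "author", "id")
pairs = list(scan_join(books, "book", "id", index))
```

Each document is read exactly once, so the sketch needs two scans in total. It assumes the index built from the first document fits in main memory; when it does not, some form of disk-resident structure would be needed instead.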