GSQ Report OCR Index File

The Geological Survey of Queensland (GSQ) is the custodian of over 100,000 reports and submissions from the Queensland resources industry, dating back more than 100 years. These legacy reports have been digitised using Optical Character Recognition (OCR) software to make them machine-readable.
The GSQ Report Index lists single words, and phrases of up to three words that would be expected to occur together in a sentence, along with the persistent identifiers (PIDs) of the reports in which each word or phrase occurs. The purpose of this search capability is to find reports that contain terms of interest based on their text content, across commodities and report types.
Please note, the GSQ Report Index contains only words and letters, no numbers. If you are looking for reports on a particular permit or borehole number, the broader GSQ Open Data Portal is a more suitable place for your search.
The GSQ Report Index is in JSON format and can be loaded into a Python script as a dictionary, which can then be filtered for terms of interest. To make searching easier, GSQ has developed an app based on the Streamlit platform.
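As a rough illustration of reading the index as a dictionary and filtering it, the sketch below uses a small made-up sample in place of the real file. It assumes the JSON maps each lower-case word or phrase to a list of report PIDs; the actual layout of the index file may differ, and the PIDs, terms, and function names shown here are hypothetical.

```python
import json

# Made-up sample standing in for the real index file.
# ASSUMPTION: each key is a lower-case word or phrase (up to three words)
# and each value is a list of report PIDs. The real file may be laid out
# differently, and these PIDs are invented for illustration.
SAMPLE_JSON = """
{
  "gold": ["CR000001", "CR000002"],
  "copper": ["CR000002", "CR000003"],
  "gold mineralisation": ["CR000002"],
  "coal seam gas": ["CR000004"]
}
"""


def load_index(text):
    """Parse the index JSON text into a Python dictionary."""
    return json.loads(text)


def reports_containing(index, term):
    """Return the set of report PIDs listed under a single term."""
    key = term.strip().lower()
    return set(index.get(key, []))


def reports_containing_all(index, terms):
    """Return the PIDs of reports listed under every one of the terms."""
    pid_sets = [reports_containing(index, term) for term in terms]
    if not pid_sets:
        return set()
    return set.intersection(*pid_sets)


index = load_index(SAMPLE_JSON)
gold_and_copper = reports_containing_all(index, ["gold", "copper"])
print(sorted(gold_and_copper))
```

For the real file, the sample string would be replaced by the contents of the downloaded JSON, for example by opening it and calling `json.load` on the file object. Returning sets keeps it simple to combine several terms with intersection or union.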

Please visit the GSQ OCR Index Search Tool to access this search functionality.

The GSQ Report Index was created by running OCR on more than 80,000 open-file reports. It will be updated as more reports become open-file.

Details

Dataset persistent identifier: DS000079
Dataset theme: Other
Dataset contains these earth science data categories:
  • Earth Sciences
  • Engineering
  • Geochemistry
  • Geology
  • Geophysics
Creator: Geological Survey of Queensland
Maintainer email: GSQOpenData@resources.qld.gov.au
