Introduction to web archiving
Introduction
This page provides selected resources that introduce basic concepts in web archiving.
Many of these resources are drawn from the Archive-It User Guide.
Learning outcomes
After reviewing this material, learners will be able to:
Describe basic terms used in web archiving
Locate additional resources to support further study and training
Introduction to web archiving
These resources will introduce basic concepts in web archiving:
Read the “What is web archiving?” page in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/360041674111-What-is-web-archiving-
Watch the “What is a web archive” video produced by the UK Web Archive:
Review the “Glossary of Archive-It and Web Archiving Terms” in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms
Read the “Archive-It Crawling Technology” page in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/115001081186-Archive-It-Crawling-Technology
Read the “Known Web Archiving Challenges” page in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/209637043-Known-Web-Archiving-Challenges
Read the “Archivability” page in the Stanford University Libraries Web Archiving Guide: https://library.stanford.edu/projects/web-archiving/archivability
Questions
How do libraries and archives create web archives?
What is a robots.txt file? How can it affect web archiving technology?
What is a crawler trap?
What is the difference between the Archive-It standard crawler (Heritrix) and Brozzler?
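The Python sketch below makes the robots.txt and crawler trap questions more concrete. It uses the standard library's urllib.robotparser to check whether a URL is allowed for a given user agent. It also adds a simple heuristic for spotting trap-like URLs. This is an illustration only, not how Heritrix or Brozzler actually handle either case. The robots.txt content, the ExampleArchiveBot user agent, and the looks_like_crawler_trap function and its thresholds are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (not a real site's rules).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def looks_like_crawler_trap(url: str, max_depth: int = 10, max_repeats: int = 3) -> bool:
    """Hypothetical heuristic: flag URLs whose path is very deep, or that
    repeat the same path segment many times (e.g. /a/b/a/b/a/b/...).
    Both patterns are typical of links that a site generates endlessly."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) > max_depth:
        return True
    return any(count > max_repeats for count in Counter(segments).values())


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "ExampleArchiveBot", "https://example.org/about.html"))  # → True
    print(is_allowed(ROBOTS_TXT, "ExampleArchiveBot", "https://example.org/private/report.html"))  # → False
    print(looks_like_crawler_trap("https://example.org/a/b/a/b/a/b/a/b/"))  # → True
    print(looks_like_crawler_trap("https://example.org/about/staff.html"))  # → False
```

Python's parser applies the rules within a group in file order and uses the first one that matches. RFC 9309 instead specifies that the longest matching rule wins. For robots.txt files that mix Allow and Disallow lines, this sketch can therefore disagree with other crawlers. Real trap detection is usually configured per crawl, for example through URL limits or scoping rules. Fixed thresholds like the ones above are only a starting point.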
Further reading
See the web archiving reading list for additional resources.