Dalhousie Libraries - Documentation Wiki

Introduction to web archiving

Owned by Creighton Barrett

Mar 04, 2021

1 Introduction
2 Learning outcomes
3 Introduction to web archiving
4 Questions
5 Further reading

Introduction

This page provides selected resources that introduce basic concepts in web archiving.
Many of these resources are drawn from the Archive-It User Guide.

Learning outcomes

After reviewing this material, learners will be able to:

Describe basic terms used in web archiving
Locate additional resources to support further study and training

Introduction to web archiving

These resources will introduce basic concepts in web archiving:

Read the “What is web archiving?” page in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/360041674111-What-is-web-archiving-
Watch the “What is a web archive” video produced by the UK Web Archive (embedded in the original wiki page).
Review the “Glossary of Archive-It and Web Archiving Terms” in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/208111686-Glossary-of-Archive-It-and-Web-Archiving-Terms
Read the “Archive-It Crawling Technology” page in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/115001081186-Archive-It-Crawling-Technology
Read the “Known Web Archiving Challenges” page in the Archive-It User Guide: https://support.archive-it.org/hc/en-us/articles/209637043-Known-Web-Archiving-Challenges
Read the “Archivability” page in the Stanford University Libraries Web Archiving Guide: https://library.stanford.edu/projects/web-archiving/archivability

Questions

How do libraries and archives create web archives?
What is a robots.txt file? How can it affect web archiving technology?
What is a crawler trap?
What is the difference between the Archive-It standard crawler (Heritrix) and Brozzler?
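
As a starting point for the robots.txt question above, here is a minimal sketch of how a crawler that honours robots.txt might decide whether it may fetch a URL. It uses Python's standard-library urllib.robotparser module. The robots.txt content, the "example-crawler" user agent, the example.org URLs, and the is_allowed helper are all hypothetical and invented for illustration. This does not describe how Heritrix, Brozzler, or Archive-It actually process robots.txt; see the Archive-It User Guide pages listed above for that.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/index.html",
        "https://example.org/private/page.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "example-crawler", url) else "blocked"
        print(url, verdict)
```

In this sketch, a page under /private/ would be skipped by a crawler that follows the rules, which is one way a robots.txt file can leave gaps in a web archive.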

Further reading

See the web archiving reading list for additional resources.
