Build a web archive collection with Archive-It

Introduction


The following table outlines the basic steps to build a web archive collection in Archive-It. See the Archive-It User Guide and Archive-It Video Curriculum for more information.

Step Number

Description

Documentation

1

Log into Archive-It

https://support.archive-it.org/hc/en-us/articles/207999976-Set-up-and-administer-your-account#Howtologintothewebapplication

2

Create a collection

https://support.archive-it.org/hc/en-us/articles/207999936-How-to-create-a-collection

3

Add seeds to collection

https://support.archive-it.org/hc/en-us/articles/207999936-How-to-create-a-collection


https://support.archive-it.org/hc/en-us/articles/208331753-How-to-select-seeds-


https://support.archive-it.org/hc/en-us/articles/208332843-How-to-assign-and-edit-a-seed-type-

4

Add new seeds to an existing collection

https://support.archive-it.org/hc/en-us/articles/208331753-How-to-select-seeds-#How-to-add-new-seeds-to-an-existing-collection

5

Add collection-level metadata

https://support.archive-it.org/hc/en-us/articles/208332603-Add-edit-and-manage-your-metadata

6

Upload image to represent collection

https://support.archive-it.org/hc/en-us/articles/207999936-How-to-create-a-collection

7

Modify crawl scope at the collection level and/or the seed level (see examples at end of table)

Seed-level scoping vs collection-level scoping: https://support.archive-it.org/hc/en-us/articles/208333803-Seed-level-scoping-vs-collection-level-scoping-What-s-the-difference-

How to modify your collection's crawl scope: https://support.archive-it.org/hc/en-us/articles/208001046-How-to-modify-your-collection-s-crawl-scope

Getting started with pre-crawl scoping (video tutorial): https://support.archive-it.org/hc/en-us/articles/216489103-Archive-It-Video-Curriculum-#gettingstartedPreCrawl

PDF-only crawl (video tutorial): https://support.archive-it.org/hc/en-us/articles/216489103-Archive-It-Video-Curriculum-#gettingstartedPDFonly

8

Run test crawl

https://support.archive-it.org/hc/en-us/articles/208001226-Run-monitor-and-save-a-test-crawl

https://support.archive-it.org/hc/en-us/articles/216489103-Archive-It-Video-Curriculum-#gettingstartedTest

9

Perform quality assurance (QA)

https://support.archive-it.org/hc/en-us/articles/208333833-How-to-perform-quality-assurance-QA-and-using-Wayback-QA

10

Read test crawl report

https://support.archive-it.org/hc/en-us/articles/208002126-How-to-read-your-crawl-s-report


https://support.archive-it.org/hc/en-us/articles/216489103-Archive-It-Video-Curriculum-#postcrawl

11

Modify collection crawl scope, if necessary

https://support.archive-it.org/hc/en-us/articles/208333803-Seed-level-scoping-vs-collection-level-scoping-What-s-the-difference-


https://support.archive-it.org/hc/en-us/articles/208001046-How-to-modify-your-collection-s-crawl-scope


https://support.archive-it.org/hc/en-us/articles/208333823-Modify-scope-and-run-patch-crawls-from-your-report

12

Run second test crawl

https://support.archive-it.org/hc/en-us/articles/208001226-Run-monitor-and-save-a-test-crawl

13

Repeat the cycle of performing QA, reading the test crawl report, and modifying the crawl scope until satisfied with the results of the test crawl

Quality assurance overview: https://support.archive-it.org/hc/en-us/articles/208333833-Quality-Assurance-Overview

Post-crawl analysis (video tutorials): https://support.archive-it.org/hc/en-us/articles/216489103-Archive-It-Video-Curriculum-#postcrawl

How to use the Wayback QA tool: https://support.archive-it.org/hc/en-us/articles/115004144786-How-to-use-the-Wayback-QA-Tool

Access to your archives in "proxy mode": https://support.archive-it.org/hc/en-us/articles/208002206-Access-to-your-archives-in-Proxy-Mode-

14

Save test crawl data

https://support.archive-it.org/hc/en-us/articles/208001226-Run-monitor-and-save-a-test-crawl#Howtorunmonitorandsaveatestcrawl-Howtosaveordiscardtestcrawldata

15

Add seed-level metadata

See the guidelines on seed-level metadata. Also review the official Archive-It Help Center pages:

https://support.archive-it.org/hc/en-us/articles/208332603-Add-edit-and-manage-your-metadata

https://support.archive-it.org/hc/en-us/articles/360014464192-Add-and-Edit-Seed-Level-Metadata-

https://support.archive-it.org/hc/en-us/articles/208012996-Upload-and-download-metadata

16

Add document-level metadata (if necessary)

https://support.archive-it.org/hc/en-us/articles/208012676-How-to-add-and-edit-metadata-at-the-document-level

17

Enable OAI-PMH

https://support.archive-it.org/hc/en-us/articles/210510506-Access-web-archives-with-the-OAI-PMH-metadata-feed

18

Make collection “public”

https://support.archive-it.org/hc/en-us/articles/208334003-Controlling-access-to-your-web-archives-#Howtorestrictaccesstoyourwebarchives-Howtomakeanentirecollectionprivateorpublic

19

Confirm crawl frequency settings

https://support.archive-it.org/hc/en-us/articles/208333013-Schedule-crawls

20

Log out of Archive-It
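
Steps 7, 11, and 13 involve adjusting crawl scope rules and re-testing. Before entering a block rule in the web application, it can help to preview which URLs the rule would catch. The following is a minimal Python sketch of that idea, not Archive-It's own implementation. The rule lists, the example.org URLs, and the function names are all hypothetical, and the assumption that a regular-expression rule must match the whole URL should be checked against the Help Center scoping articles linked in step 7.

```python
import re

# Hypothetical block rules for illustration only; they are not taken from
# the Archive-It documentation.
BLOCK_IF_CONTAINS = ["/calendar/"]
BLOCK_IF_MATCHES = [r"^https?://example\.org/.*\?sort=.*$"]


def is_blocked(url, contains_rules=BLOCK_IF_CONTAINS, regex_rules=BLOCK_IF_MATCHES):
    """Return True if any block rule applies to the URL."""
    # "Contains" rule: plain substring test.
    if any(text in url for text in contains_rules):
        return True
    # Assumption: a regular-expression rule must match the entire URL.
    return any(re.fullmatch(pattern, url) for pattern in regex_rules)


def preview(urls):
    """Split candidate URLs into (kept, blocked) lists, preserving order."""
    kept, blocked = [], []
    for url in urls:
        (blocked if is_blocked(url) else kept).append(url)
    return kept, blocked


if __name__ == "__main__":
    sample = [
        "https://example.org/news",
        "https://example.org/news?sort=date",
        "https://example.org/calendar/2020/01",
    ]
    kept, blocked = preview(sample)
    print("kept:", kept)
    print("blocked:", blocked)
```

A list of URLs copied from a test crawl report could be passed to `preview` to see roughly what a candidate rule would exclude. The crawler's own regular-expression dialect may differ from Python's, so treat the result as a rough check and confirm it with another test crawl.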