BagIt File Packaging Format
Definition
A set of hierarchical file layout conventions for storage and transfer of arbitrary digital content. A "bag" has just enough structure to enclose descriptive metadata "tags" and a file "payload" but does not require knowledge of the payload's internal semantics. This BagIt format is suitable for reliable storage and transfer.
Source: Kunze, J., Littman, J., Madden, E., Scancella, J., and C. Adams, "The BagIt File Packaging Format (V1.0)", RFC 8493, DOI 10.17487/RFC8493, October 2018, <https://www.rfc-editor.org/info/rfc8493>.
Introduction
BagIt is a widely adopted set of conventions for storing and transferring digital content. Packages are known as Bags. The format requires every Bag to include at least one "payload manifest" file, which lists each payload file in the Bag together with its checksum to support transfer validation and fixity checks.
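To make the payload manifest concrete, the following Python sketch (standard library only) computes manifest lines for the files under a bag's `data/` directory. It assumes the line format described in RFC 8493: one `checksum filepath` line per payload file, with paths relative to the bag's base directory and with `%`, CR, and LF percent-encoded. The function names `encode_path`, `file_digest`, and `payload_manifest_lines` are hypothetical helpers, not part of any BagIt tool.

```python
import hashlib
from pathlib import Path


def encode_path(rel_path: str) -> str:
    # RFC 8493: "%", CR and LF in manifest paths must be percent-encoded.
    # Encode "%" first so the other escapes are not double-encoded.
    return rel_path.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def file_digest(path: Path, algorithm: str = "sha512") -> str:
    # Hash in 1 MiB chunks so large payload files need not fit in memory.
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def payload_manifest_lines(bag_dir, algorithm: str = "sha512") -> list:
    """Return one 'checksum  filepath' line per file under bag_dir/data."""
    bag_dir = Path(bag_dir)
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            # Paths are relative to the bag's base directory, e.g. "data/report.txt".
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{file_digest(path, algorithm)}  {encode_path(rel)}")
    return lines
```

For a hypothetical bag directory `my_bag`, writing `"\n".join(payload_manifest_lines("my_bag")) + "\n"` to `my_bag/manifest-sha512.txt` would produce a SHA-512 payload manifest.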
The introduction to the BagIt File Packaging Format (RFC 8493) reads:
BagIt is a set of hierarchical file layout conventions designed to support storage and transfer of arbitrary digital content. A "bag" consists of a directory containing the payload files and other accompanying metadata files known as "tag" files. The "tags" are metadata files intended to facilitate and document the storage and transfer of the bag. Processing a bag does not require any understanding of the payload file contents, and the payload files can be accessed without processing the BagIt metadata.
The name, BagIt, is inspired by the "enclose and deposit" method [ENCDEP], sometimes referred to as "bag it and tag it". BagIt differs from serialized archival formats such as MIME, TAR, or ZIP in two general areas:
1. Strong integrity assurances. The format supports cryptographic-quality hash algorithms (see Section 2.4) and allows for in-place upgrades to add additional manifests using stronger algorithms without breaking backwards compatibility. This provides high levels of confidence against data corruption, but it is not designed to be secure against active attacks.
2. Direct file access. Because BagIt specifies an actual filesystem hierarchy rather than a serialized representation of one, files can be accessed using standard operating system utilities, implementations do not need to process a potentially large archival file to extract a subset of data, and the format imposes no size limits for either individual files or a bag.
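Both properties can be illustrated with a small, hypothetical `make_minimal_bag` function. It is a sketch under simplifying assumptions, not a replacement for a maintained tool such as BagIt-Python's `bagit.make_bag`: it moves a directory's existing contents into `data/`, writes the required `bagit.txt` declaration, and writes one payload manifest per algorithm (SHA-256 and SHA-512 here), mirroring how a bag can carry several manifests side by side. Optional elements such as `bag-info.txt` and tag manifests are omitted.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

# Minimal bag declaration required by RFC 8493 (bagit.txt).
BAGIT_TXT = "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"


def file_digest(path: Path, algorithm: str) -> str:
    # Read in 1 MiB chunks so large payload files need not fit in memory.
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def encode_path(rel_path: str) -> str:
    # Percent-encode %, CR and LF in manifest paths ("%" first, to avoid double-encoding).
    return rel_path.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def make_minimal_bag(bag_dir, algorithms=("sha256", "sha512")) -> None:
    """Hypothetical helper: turn an ordinary directory into a minimal bag, in place."""
    bag_dir = Path(bag_dir)
    # 1. Move existing content into a uniquely named staging directory, then
    #    rename it to data/ (this also copes with a pre-existing "data" entry).
    entries = list(bag_dir.iterdir())
    staging = Path(tempfile.mkdtemp(dir=str(bag_dir)))
    for entry in entries:
        shutil.move(str(entry), str(staging / entry.name))
    staging.rename(bag_dir / "data")

    # 2. Write the bag declaration (as bytes, so line endings stay LF on every OS).
    (bag_dir / "bagit.txt").write_bytes(BAGIT_TXT.encode("utf-8"))

    # 3. Write one payload manifest per algorithm: "checksum  data/relative/path".
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    for algorithm in algorithms:
        lines = [
            f"{file_digest(p, algorithm)}  {encode_path(p.relative_to(bag_dir).as_posix())}\n"
            for p in payload
        ]
        (bag_dir / f"manifest-{algorithm}.txt").write_bytes("".join(lines).encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "report.txt").write_text("hello\n", encoding="utf-8")
        make_minimal_bag(d)
        print(sorted(p.name for p in Path(d).iterdir()))
        # → ['bagit.txt', 'data', 'manifest-sha256.txt', 'manifest-sha512.txt']
        # Direct file access: the payload is an ordinary file on disk.
        print(Path(d, "data", "report.txt").read_text(encoding="utf-8"))
```

Because the payload remains an ordinary file tree, `data/report.txt` can be opened afterwards with any standard tool; nothing has to be unpacked first.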
Implementing BagIt
BagIt can be incorporated into digital archival workflows through BagIt-Python or by adopting or adapting other tools that have been developed to create, transfer, and validate Bags.
| Tool | Description |
|---|---|
| | Python library and command-line tool for working with Bags. |
| | Java-based desktop utility developed by the Library of Congress. No longer maintained or supported. |
| | Python-based desktop utility built by SFU Archives and SFU Library to prepare digital objects for transfer to the Archives. |
| | Desktop utility and command-line tool for packaging files and uploading them to remote repositories. |
| | Desktop utility to create and transfer Bags. |
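Whichever tool is used, validation follows the same idea. The sketch below, a hypothetical `validate_bag` function using only the Python standard library, checks a simplified version of the completeness and validity rules in RFC 8493: `bagit.txt` and `data/` exist, at least one `manifest-<algorithm>.txt` is present, every payload file is listed in every payload manifest, every listed file exists, and every checksum matches. It ignores tag manifests, `fetch.txt`, and `bag-info.txt`; in practice BagIt-Python's `Bag.validate()` performs a fuller check.

```python
import hashlib
import re
from pathlib import Path

# Payload manifests are named manifest-<algorithm>.txt, e.g. manifest-sha512.txt.
MANIFEST_RE = re.compile(r"^manifest-([a-z0-9]+)\.txt$")
# Only "%", LF and CR are percent-encoded in manifest paths.
_DECODE = {"%25": "%", "%0A": "\n", "%0D": "\r"}


def decode_path(encoded: str) -> str:
    return re.sub(r"%(?:25|0[AaDd])", lambda m: _DECODE[m.group(0).upper()], encoded)


def file_digest(path: Path, algorithm: str) -> str:
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def validate_bag(bag_dir) -> list:
    """Hypothetical helper: return a list of problems; [] means the bag looks valid."""
    bag_dir = Path(bag_dir)
    problems = []
    if not (bag_dir / "bagit.txt").is_file():
        problems.append("missing bagit.txt")
    manifests = sorted(p for p in bag_dir.iterdir() if MANIFEST_RE.match(p.name))
    if not manifests:
        problems.append("no payload manifest")

    # Payload files actually on disk, as base-relative POSIX paths ("data/...").
    data_dir = bag_dir / "data"
    on_disk = set()
    if data_dir.is_dir():
        on_disk = {p.relative_to(bag_dir).as_posix() for p in data_dir.rglob("*") if p.is_file()}
    else:
        problems.append("missing data/ directory")

    for manifest in manifests:
        algorithm = MANIFEST_RE.match(manifest.name).group(1)
        if algorithm not in hashlib.algorithms_available:
            problems.append(f"{manifest.name}: unsupported algorithm")
            continue
        listed = {}
        for line in manifest.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                problems.append(f"{manifest.name}: malformed line {line!r}")
                continue
            listed[decode_path(parts[1])] = parts[0].lower()

        # Completeness: every payload file must appear in every payload manifest...
        for rel in sorted(on_disk - set(listed)):
            problems.append(f"{manifest.name}: {rel} not listed")
        # ...and every listed file must exist with a matching checksum (validity).
        for rel, expected in sorted(listed.items()):
            if not rel.startswith("data/") or ".." in rel.split("/"):
                problems.append(f"{manifest.name}: {rel} is outside data/")
            elif not (bag_dir / rel).is_file():
                problems.append(f"{manifest.name}: {rel} missing")
            elif file_digest(bag_dir / rel, algorithm) != expected:
                problems.append(f"{manifest.name}: {rel} checksum mismatch")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets a transfer report show every missing, unlisted, or altered file at once.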