The Libraries engage in various web archiving activities to target, capture, and preserve important web-based content.
About web archiving
Websites are informational resources documenting events, subjects, and changes in society. Because these resources are frequently updated, important information can be lost when pages change or disappear. In many cases it is therefore imperative that this information is preserved beyond its fleeting online lifespan.
The Libraries use Archive-It to capture and crawl web content, including the University of Manitoba website and other online documentary heritage.
The material captured through Archive-It is collected for research and private study. All requests to reproduce and use the archived content must be sent to the website owner directly.
Accessing web archive collections
Common questions about web archiving
Is personal information captured in web archiving?
The Libraries only capture and preserve copies of websites that are already publicly accessible. Any personal information captured is limited to what was already published on those public pages.
Do you archive websites that are password-protected?
No, we do not capture password-protected content, unless it is under the ownership of the University of Manitoba and needs to be captured for records management purposes. The latter type of content remains private and is not made publicly accessible.
What about sites that block web crawlers?
The Libraries respect the Robots Exclusion Protocol, which restricts the crawling of certain online content. Content covered by such an exclusion will only be crawled with permission from the content creator.
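To illustrate how a crawler honours these exclusions, the sketch below uses Python's standard-library `urllib.robotparser` to check hypothetical robots.txt rules. The domain and rules here are invented for the example; an actual crawl would fetch the live robots.txt from the site being archived.

```python
import urllib.robotparser

# Hypothetical robots.txt rules, parsed locally for illustration.
# A real crawler would retrieve these from the target site's /robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# A compliant crawler consults the parser before fetching each URL.
print(rp.can_fetch("*", "https://example.org/private/report.html"))  # False: excluded
print(rp.can_fetch("*", "https://example.org/news/"))                # True: crawlable
```

A crawler that respects the protocol simply skips any URL for which `can_fetch` returns `False`, which is how excluded content stays out of the archive unless permission is obtained.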
What about copyright and ownership?
Copyright and ownership of the archived web content remain with the owner(s) identified on a website and are governed by local, national, and/or international laws and regulations. The Libraries respect the intellectual property rights and the proprietary rights of others. The Libraries assume no liability for the accuracy or lawfulness of the archived websites or the contents within them.
Can I use this content in my research?
All requests to reproduce and use the archived content must be sent directly to the website owner. The Libraries cannot authorize use of the material, nor will they act as an intermediary in the transaction. It is the responsibility of individual users of web archive collections to abide by all relevant copyright legislation and restrictions when accessing archived webpages, and to identify and contact the appropriate authority for permission.
What type of information is captured?
In most cases, a web crawler will capture static content such as HTML code, CSS files, PDFs, images, and video files. More dynamic content, such as web pages that adjust to the browser window size, interactive maps, Flash, password-protected content, or functions requiring a user's input (for example, fillable forms, "play" buttons, search boxes, and login/password fields), may not be captured, or may only be captured to a limited extent.
Can I opt-out, or request specific content be taken down?
When a website owner authorizes communication of their work to the public without technological restrictions (such as a robots.txt exclusion), the Libraries view this as the website owner's implicit consent to the indexing and caching of their website content. Where a site uses technological protection measures to restrict crawling technology, the UML will not harvest the content without providing notification and/or securing permission.
Anyone with a legitimate complaint about content in the University's Archive-It account should consult the Libraries' take-down submission process for next steps.