

Archives

You use that word “archive,” but I don’t think it means what you think it does.

A couple of things have led me to think about archiving this week. The first is the brouhaha about a crashed hard drive at the IRS, and the IRS claim that all the backups are overwritten after 6 months. When is an email a record? What are the retention policies at the IRS? I lack the motivation and the stamina to examine this question in detail, but for the brave souls that might want to dig deeper, I offer these documents:

There is, of course, the occasional variance between policy and practice in the real world.

The other thing that made me think about archiving is that the Orly Taitz ESQ web site recently went offline. Naturally, when such a site goes offline for a while, there is always something on it that one needs. The Wayback Machine is a great source for old web pages, but for some reason a fair portion of Orly’s site is not in it. The Google Cache captures some things, and some sites, including Orly’s site and this one, get republished by Before It’s News (BIN). BIN provides us a copy of Orly’s article that some suggest is the reason her web site has been taken down. Here is the advice of “Attorney Orly Taitz” from the article:

Now there are a lot of lost Mexican children, who wandered into the US territory. Well, it is time for every American to become a good Samaritan and help the lost Mexican children by driving them to the border and taking them to the custody of Mexican border patrol, so that Mexican border patrol that speaks Spanish, can reunite them with their families in Mexico.

Bizarre! Attorney Taitz is giving advice that could result in anyone who follows it ending up in federal prison.

What hit home for me is that, for some reason, over 700 articles on my own web site are not in the Wayback Machine. Some of the missing articles are current, but many date back to as early as 2010. Ouch! I have been busy writing software yesterday and today to deal with this problem. First, I used the WordPress API to download the URLs of all the posts on this blog into a database. Next, I used the Wayback Machine API to determine which of them were not in the Wayback Machine archive. Finally, I developed a system for adding them. Right now, I have to push a couple of buttons to scroll to the next missing page and add it; I’ll automate that shortly. I just have to be careful not to add pages too fast, or I’ll get kicked off as an attacker. A 5-second delay between additions seems to work pretty well.
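For readers who want to try something similar, here is a minimal sketch of the three steps described above, using only the Python standard library. It is not my actual code, and several things in it are assumptions: it uses the WordPress REST API (`/wp-json/wp/v2/posts`), which may differ from the WordPress API I used; the blog address `https://example.com` is hypothetical; the SQLite table layout and the helper names are made up for illustration; and it submits pages through the Save Page Now address `https://web.archive.org/save/<url>`. The check itself uses the Wayback Machine availability API (`https://archive.org/wayback/available?url=...`).

```python
"""Sketch: find blog posts missing from the Wayback Machine and submit them.

Assumptions (hypothetical, not the author's code): the blog exposes the
WordPress REST API, pages are submitted via Save Page Now, and results
are kept in a local SQLite database.
"""
import json
import sqlite3
import time
import urllib.error
import urllib.parse
import urllib.request

BLOG = "https://example.com"  # hypothetical blog address
DELAY_SECONDS = 5             # pause between submissions, as in the post


def fetch_json(url):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def post_links(blog=BLOG):
    """Yield the permalink of every published post, 100 posts per page."""
    page = 1
    while True:
        query = urllib.parse.urlencode({"per_page": 100, "page": page, "_fields": "link"})
        try:
            posts = fetch_json(f"{blog}/wp-json/wp/v2/posts?{query}")
        except urllib.error.HTTPError:
            return  # WordPress answers with an error once page is past the last one
        if not posts:
            return
        for post in posts:
            yield post["link"]
        page += 1


def availability_url(post_url):
    """Build a query for the Wayback Machine availability API."""
    return "https://archive.org/wayback/available?" + urllib.parse.urlencode({"url": post_url})


def is_archived(reply):
    """True if an availability API reply reports a usable snapshot."""
    closest = reply.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def init_db(conn):
    """Create the (hypothetical) table of post URLs and archive status."""
    conn.execute("CREATE TABLE IF NOT EXISTS posts (url TEXT PRIMARY KEY, archived INTEGER)")


def record(conn, url, archived):
    """Insert or update one post's archive status."""
    conn.execute("INSERT OR REPLACE INTO posts VALUES (?, ?)", (url, int(archived)))


def missing(conn):
    """Return the URLs not yet in the Wayback Machine, in sorted order."""
    return [row[0] for row in conn.execute("SELECT url FROM posts WHERE archived = 0 ORDER BY url")]


def save_page(post_url):
    """Ask Save Page Now to capture one URL (assumed endpoint, no login)."""
    # urlopen raises HTTPError on an error status, so returning means accepted.
    with urllib.request.urlopen("https://web.archive.org/save/" + post_url, timeout=120):
        pass


def run(db_path="posts.db"):
    """Step 1: list posts. Step 2: check each one. Step 3: submit the missing ones."""
    conn = sqlite3.connect(db_path)
    init_db(conn)
    for link in post_links():
        record(conn, link, is_archived(fetch_json(availability_url(link))))
    conn.commit()
    for link in missing(conn):
        save_page(link)
        record(conn, link, True)
        conn.commit()
        time.sleep(DELAY_SECONDS)  # go slowly, or get blocked as an attacker
```

Calling `run()` performs the whole job unattended, which is the "push a couple of buttons" step automated. Keeping the status in a database means an interrupted run can pick up where it left off, because already-archived posts drop out of `missing()`.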