FileMeta

File Identity and Metadata Project

A Project by Brandt Redd

Tuesday, November 27, 2018

Best Practice for Date and Time Metadata

Calendar and Clock

The story is told about a museum guest looking at a dinosaur skeleton. She asks the guide, "When did this dinosaur live?"

The guide answers, "One hundred twenty million and eleven years ago."

"Wow!" says the guest, "How do you know so precisely?"

"Well, when I started working here, they told me that the skeleton was 120 million years old and that was eleven years ago."

Knowing the precision of a date is important. The ISO 8601 standard date format is commonly used in metadata and, when used properly, it conveys timezone and precision information. But too often that information is lost. In this post I'll explain how it is transmitted, why it's important, how it gets lost, and some strategies I use for detecting and preserving it.

Relevant Standards

The Dublin Core listing for date defines the property as "A point or period of time associated with an event in the lifecycle of the resource." By this definition, the property includes time information, not only date. This also applies to derived DCMI properties such as dateSubmitted and dateAccepted. So, when you encounter a "date" property in metadata, it usually includes the time.

DCMI further recommends that dates be stored according to the W3CDTF profile of ISO 8601. The W3CDTF format usually conveys time zone and precision. That information is often lost when date and time are parsed into, or stored in, binary formats, so we have to find ways to compensate.

Recommendations and Samples

The following practices will ensure that timezone and precision are recorded and preserved.

When storing date properties in W3CDTF and ISO 8601 formats, use the local time relevant to where the event occurred and include timezone information.

For example, an event in New York City (Eastern Time Zone) might be timestamped as "2018-11-28T13:25:04-05:00". This is parsed as 28 November 2018 at 1:25:04 PM Eastern Standard Time (UTC minus five hours).

Under most circumstances, humans care about when something happened in the timezone where the event occurred. For example, if I'm looking at a photo of New York's Central Park, I prefer to be told the photo was taken at 5:00 PM in New York (so the sun is getting low) even though that was 2:00 PM in California. The airline industry has shown that this works best for consumers: takeoff and landing times are always given in the time local to the airport.

Including the timezone lets us convert from local to UTC and from UTC to another timezone when needed. For example, if you are plotting a set of events that originate from different timezones on a calendar or timeline you would want to convert all events to the same timezone. Likewise, if people from different timezones are collaborating on a project, each would want to see times rendered into their local timezone. Including timezone in the metadata ensures that the application can render the time in whatever zone makes the most sense to the context.
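Python's standard datetime module can illustrate both conversions. This is a sketch: the fixed UTC-08:00 offset used for the Pacific example is an assumption, where a real application would look the zone up in a timezone database (for example with zoneinfo) so that daylight saving time is handled.

```python
from datetime import datetime, timedelta, timezone

# Event recorded in local time with its offset (Eastern Standard Time, UTC-05:00).
event = datetime.fromisoformat("2018-11-28T13:25:04-05:00")

# Convert to UTC, e.g. to place events from several timezones on one timeline.
as_utc = event.astimezone(timezone.utc)

# Render for a collaborator in Pacific Standard Time (assumed fixed UTC-08:00).
pacific = timezone(timedelta(hours=-8))
as_pacific = event.astimezone(pacific)

print(as_utc.isoformat())      # 2018-11-28T18:25:04+00:00
print(as_pacific.isoformat())  # 2018-11-28T10:25:04-08:00
```

All three values compare equal because they denote the same instant; only the rendering differs.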

When the time is deliberately expressed in UTC, W3CDTF/ISO 8601 indicates that you use "Z" for the timezone designator. For example, "2018-11-28T18:25:04Z" would be the time of the event listed above. Some systems standardize on always using UTC for internal storage because it's consistent worldwide. Indeed, that practice is helpful in indexes, where it makes sorting and searching more convenient. However, unless you also store the timezone in a separate property, you lose information about the local time of the event. So, best practice for metadata is to use local time plus timezone. UTC and the "Z" time zone designator should principally be used for legacy data when the timezone of the event is not known.

When the time is local, but the timezone is unknown, then leave off the time zone designator like this, "2018-11-28T13:25:04".

For one more example, consider an event that happens in London. The timezone is Greenwich Mean Time (GMT) which has zero offset from UTC. You could express this event as "2018-12-25T07:05:01+00:00" or as "2018-12-25T07:05:01Z". Both express exactly the same time. However, the first instance is a "Local Time" indicating that the event occurred in the GMT time zone - somewhere in the UK or at the same longitude. The second instance is UTC meaning that the event could have happened anywhere in the world and the timezone is not expressed.

Here's a summary:

Condition                                   Samples
Local time with timezone (best practice)    2018-11-28T13:25:04-05:00
                                            2018-12-25T07:05:01+00:00
Local time with unstated timezone           2018-11-28T13:25:04
                                            2018-12-25T07:05:01
UTC time with unstated timezone             2018-11-28T18:25:04Z
                                            2018-12-25T07:05:01Z
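A small Python sketch of how an application might tell the three cases apart, and confirm that the two London forms name the same instant. The classify and same_instant names are illustrative, and the code assumes each input is a full date-time rather than a bare date.

```python
import re
from datetime import datetime

# A trailing "Z" (UTC) or "+hh:mm" / "-hh:mm" (local time with offset).
_TZD = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def classify(timestamp: str) -> str:
    """Map a date-time string to a row of the summary table."""
    match = _TZD.search(timestamp)
    if match is None:
        return "local time, unstated timezone"
    if match.group(1) == "Z":
        return "UTC time, unstated timezone"
    return "local time with timezone"


def _parse(timestamp: str) -> datetime:
    # datetime.fromisoformat() accepts "Z" only from Python 3.11, so spell it +00:00.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def same_instant(a: str, b: str) -> bool:
    """True if two timestamps denote the same moment. Both must carry a designator."""
    return _parse(a) == _parse(b)


print(classify("2018-12-25T07:05:01+00:00"))  # local time with timezone
print(classify("2018-12-25T07:05:01Z"))       # UTC time, unstated timezone
print(same_instant("2018-12-25T07:05:01+00:00", "2018-12-25T07:05:01Z"))  # True
```

The instants are equal, but only the string form preserves the distinction between "local time in GMT" and "UTC, zone unknown".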

When storing and parsing date properties in W3CDTF and ISO 8601 formats, tolerate partial values and preserve precision.

For example, "1976" is a valid date-time in W3CDTF format. It means that the event occurred sometime in the year 1976. By the same token, "1976-07-04T21:05:02.319-05:00" is also a valid date-time in W3CDTF format. It indicates that the event happened on July 4, 1976 at precisely 9:05 PM and 2.319 seconds, in a timezone five hours behind UTC.

Unfortunately, most parsers would either reject the year-only value or convert it to January 1, 1976 at precisely midnight. The former doesn't work and the latter conveys more precision than was intended.

Precision can be expressed as significant digits according to the following table:

Significant Digits   Precision     Example
4                    Year          1976
6                    Month         1976-07
8                    Day           1976-07-04
10                   Hour          1976-07-04T21-05:00
12                   Minute        1976-07-04T21:05-05:00
14                   Second        1976-07-04T21:05:02-05:00
17                   Millisecond   1976-07-04T21:05:02.319-05:00

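Notice that when the precision is finer than one day, the timezone should be included. At coarser granularity (day and above) the timezone is no longer relevant. Again, this follows the W3CDTF profile, which also requires a time to include at least hours and minutes; the hour-only form in the table is valid ISO 8601 but not valid W3CDTF.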
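A precision-preserving parser could look something like the following Python sketch. It is not a complete W3CDTF validator (it checks the shape of the string, not value ranges such as month 13), and the parse_w3cdtf and ParsedDate names are hypothetical. It counts significant digits the same way as the table above and reports the timezone designator separately, leaving it None when the value doesn't state one.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# W3CDTF-style pattern: each component after the year is optional, and a
# timezone designator may only follow a time. The hour-only form from the
# table above (valid ISO 8601, not W3CDTF) is accepted as well.
_PATTERN = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.(?P<fraction>\d+))?)?"
    r")?"
    r"(?P<tzd>Z|[+-]\d{2}:\d{2})?"
    r")?)?)?$"
)


@dataclass
class ParsedDate:
    fields: Dict[str, str]  # digit strings for the components actually present
    precision: int          # significant digits, counted as in the table above
    tzd: Optional[str]      # "Z", "+hh:mm" / "-hh:mm", or None when unstated


def parse_w3cdtf(text: str) -> ParsedDate:
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a W3CDTF/ISO 8601 date-time: {text!r}")
    groups = match.groupdict()
    tzd = groups.pop("tzd")
    fields = {name: digits for name, digits in groups.items() if digits is not None}
    # Only the digits of date/time components count; separators, the decimal
    # point, and the timezone designator are excluded.
    precision = sum(len(digits) for digits in fields.values())
    return ParsedDate(fields, precision, tzd)
```

Because missing components are simply absent from fields, a year-only value is never silently turned into January 1 at midnight.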

Parsers for W3CDTF-formatted dates should include precision information in their output so that the application can preserve and make use of that detail. Formatters should accept precision information and generate the string accordingly.
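The formatting half might be sketched with Python's datetime.isoformat(), whose timespec argument ("hours", "minutes", "seconds", "milliseconds") truncates the time to the requested unit and appends the UTC offset when the datetime carries a tzinfo. The format_w3cdtf name is hypothetical, and the sketch assumes precision is one of the values in the table above. Note that isoformat() writes a zero offset as "+00:00", never "Z"; a caller that really means UTC with an unknown zone would have to substitute "Z" itself.

```python
from datetime import datetime, timedelta, timezone

# Significant digits (from the table above) -> datetime.isoformat() timespec.
_TIMESPEC = {10: "hours", 12: "minutes", 14: "seconds", 17: "milliseconds"}


def format_w3cdtf(dt: datetime, precision: int) -> str:
    """Render dt with only as much detail as `precision` allows."""
    if precision == 4:
        return f"{dt.year:04d}"
    if precision == 6:
        return f"{dt.year:04d}-{dt.month:02d}"
    if precision == 8:
        return dt.date().isoformat()  # dates carry no timezone designator
    if precision in _TIMESPEC:
        # Truncates (does not round) the time; appends "+hh:mm" if dt is aware.
        return dt.isoformat(timespec=_TIMESPEC[precision])
    raise ValueError(f"unsupported precision: {precision}")


eastern = timezone(timedelta(hours=-5))
moment = datetime(1976, 7, 4, 21, 5, 2, 319000, tzinfo=eastern)
for digits in (4, 6, 8, 10, 12, 14, 17):
    print(digits, format_w3cdtf(moment, digits))
```

Run against the July 4 example, the loop reproduces the Example column of the table row by row.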

When using existing metadata properties that don't include timezone or precision, add that data in custom metadata properties.

Many existing metadata formats have date properties that don't include precision or timezone information. Because existing applications recognize the existing properties, it's best not to substitute your own properties that follow the practices above. Rather, continue to store the time in existing properties and then add new custom properties to express the timezone and precision information.

Timezone information should be expressed as the offset, in hours and minutes, of local time from UTC (local time minus UTC). For example, Eastern Standard Time (EST) is "-05:00". Most, but not all, timezones are offset by whole hours; for those, the minutes can be left off. Leading zeros are also not required. Thus, "-5" may be an acceptable shortened value for EST in a custom timezone property. However, W3CDTF requires the full value, with leading zeros and minutes.
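An application reading the custom timezone property may want to accept shortened values such as "-5" and expand them to the full W3CDTF form. A minimal sketch, assuming the property holds a signed offset with optional minutes; the normalize_offset name and the accepted input shapes are assumptions.

```python
import re

# Sign, 1-2 hour digits, then optional minutes with or without a colon:
# "-5", "-05", "-0500", "+5:30", "+05:45" ...
_OFFSET = re.compile(r"^([+-])(\d{1,2})(?::?(\d{2}))?$")


def normalize_offset(value: str) -> str:
    """Expand a (possibly shortened) UTC offset to the W3CDTF form "+hh:mm"."""
    match = _OFFSET.match(value.strip())
    if match is None:
        raise ValueError(f"not a UTC offset: {value!r}")
    sign, hours, minutes = match.group(1), int(match.group(2)), int(match.group(3) or 0)
    # No real-world offset exceeds 14 hours.
    if hours > 14 or minutes > 59:
        raise ValueError(f"offset out of range: {value!r}")
    return f"{sign}{hours:02d}:{minutes:02d}"
```

Requiring the sign keeps "5" from being silently read as either east or west of UTC.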

Precision should be expressed as the number of significant digits when the date and time are rendered in ISO 8601 format. When counting digits, do not include the hyphen, colon, or "T" characters, the decimal point, or the timezone designator. The table above shows typical values.

Example: Photo Management

To illustrate the problems that occur when these practices aren't followed, I'll use the photo management project I'm currently working on. I have a collection of more than 100,000 family photos and short videos. The photos are all in JPEG (.jpg) format with EXIF metadata. Videos are in a mix of Audio Video Interleave (.avi), QuickTime (.mov), and MPEG-4 (.mp4) formats.

For JPEG-EXIF images, the relevant date property is EXIF:DateTimeOriginal. According to the EXIF standard, the property is a date-time string similar to ISO 8601 (in the form "YYYY:MM:DD HH:MM:SS") with no timezone suffix, and it is recorded in local time, that is, in the time zone in which the photo was taken.

Both MP4 and QuickTime (.mov) video files use the ISOM format. For these files, the relevant property is creation_time, which is stored internally in binary form. According to the ISOM specification, creation_time and other date properties are in UTC.

Neither of these formats includes timezone information by default. EXIF defines an optional timezone property, but I haven't found any file samples that include it. Many cameras don't have a timezone setting. For example, my Fuji camera only has a local time setting. That's no problem for JPEG files, but for video files (in QuickTime .mov format) the Fuji camera fills in creation_time with the local time even though the property is supposed to be UTC. UTC isn't possible because the camera doesn't have timezone information.

My Canon camera does have a timezone setting. For photos (in JPEG format) it fills in DateTimeOriginal with the local time, as expected. For videos (in .MP4 format) it fills in creation_time in UTC. In both cases, the Canon camera includes timezone information in a proprietary Canon property as part of the Makernote.

The Timezone Problem

Consider the slideshow program I'm working on. It should display photos and videos interleaved in the order they were taken. If I use my Canon camera, and the pictures were taken near my home, then the timestamps on the photos and videos will be consistent.

Let's say I go on vacation to Florida; I live in Mountain time. If I fail to set the timezone correctly on my Canon camera (which is often the case), then the photos and videos will display in the right order when I get home. However, the timestamps I show will be off by two hours. On the other hand, if I set the timezone correctly on my camera then, when I get home, the photos and videos will not be consistent with each other. Let's say I take a photo at 11:00 (while in Florida) and I take a video at 11:01. When I return home to Mountain time, my computer will see the photo at 11:00 because it was stored in local time. The video will show as 9:01 local time. That's because the video was timestamped at 16:01 UTC (Florida's Eastern time zone is UTC-5) and in Mountain time (UTC-7) that timestamp is interpreted as 9:01 local time. So, the video is treated as if it were taken about two hours before the photo instead of one minute after.
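The arithmetic behind that mismatch can be reproduced with Python's datetime module. The calendar date is hypothetical (the story doesn't give one; a date in standard time is chosen so the offsets are UTC-5 and UTC-7), and fixed offsets stand in for real timezone rules.

```python
from datetime import datetime, timedelta, timezone

MOUNTAIN = timezone(timedelta(hours=-7))  # the home computer's zone (MST)

# Photo: EXIF DateTimeOriginal holds camera-local (Florida, EST) time with no offset.
photo_local = datetime(2018, 11, 28, 11, 0)  # hypothetical date

# Video: ISOM creation_time holds UTC. 11:01 EST is 16:01 UTC.
video_utc = datetime(2018, 11, 28, 16, 1, tzinfo=timezone.utc)

# Naive handling: show the photo time as-is, render the video in the computer's zone.
video_as_home_local = video_utc.astimezone(MOUNTAIN).replace(tzinfo=None)

print(photo_local.time())                 # 11:00:00
print(video_as_home_local.time())         # 09:01:00
print(video_as_home_local < photo_local)  # True: the video sorts before the photo
```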

Compensating for the Timezone Problem

These solutions are restricted to the photo and video case. For other media, other strategies may be required.

In most cases, the goal is to discover the timezone in effect at the time of the event and store that in a custom metadata property. Then use the existing metadata properties as they are defined, that is, use UTC or local time according to the file format specification. When converting between UTC and local time, use the timezone from the metadata rather than the timezone setting of the computer. Doing this will result in consistent local or UTC values for both photos (which store local time in the metadata) and videos (which store UTC).
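A sketch of that approach for the Florida example, assuming the event's timezone (-05:00) has already been recovered into a custom property; event_tz below stands in for that hypothetical property value, and the date is again hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Value of a hypothetical custom timezone property (e.g. recovered from the MakerNote).
event_tz = timezone(timedelta(hours=-5))  # Florida, EST

photo_local = datetime(2018, 11, 28, 11, 0)                     # EXIF: local, no offset
video_utc = datetime(2018, 11, 28, 16, 1, tzinfo=timezone.utc)  # ISOM: UTC

# Use the stored timezone, not the computer's setting, for both conversions.
photo = photo_local.replace(tzinfo=event_tz)  # attach the event's offset to local time
video = video_utc.astimezone(event_tz)        # render UTC in the event's zone

print(photo.isoformat())  # 2018-11-28T11:00:00-05:00
print(video.isoformat())  # 2018-11-28T11:01:00-05:00
print(video - photo)      # 0:01:00
```

Both values are now timezone-aware, so they sort correctly and can be rendered in any zone the viewer prefers.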

For Canon, and other cameras that store timezone in the Makernote, the timezone can be extracted using Phil Harvey's ExifTool.

Most cameras store photos on flash media formatted using the FAT file system. Conveniently, FAT uses local time. So, by comparing the metadata time against the file system time, you can detect whether a video's creation_time is in local time or UTC. If it's in UTC, then the timezone can be detected from the difference.

When comparing date-times you must take into account inconsistent resolution. On the FAT file system, Creation DateTime has a 10ms resolution while Modification DateTime has a 2 second resolution. ISOM DateTime values have a one-second resolution.
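One way to apply this comparison is sketched below; it is not taken from FMPhotoFinisher, and infer_offset is a hypothetical name. It assumes the FAT creation time is used and that it and creation_time both refer to the start of recording (a video's modification time is presumably written when the file is closed, which can be much later), that a 2-second tolerance absorbs the resolution differences, and that real offsets are multiples of 15 minutes.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed tolerance: ISOM times have 1 s resolution, FAT creation times 10 ms.
TOLERANCE = timedelta(seconds=2)
QUARTER_HOUR = timedelta(minutes=15)
MAX_OFFSET = timedelta(hours=14)


def infer_offset(creation_time: datetime, fat_local_time: datetime) -> Optional[timedelta]:
    """Compare a video's creation_time with its FAT timestamp (local time).

    Both arguments are naive datetimes. Returns None when creation_time already
    matches local time, otherwise the inferred offset (local time minus UTC).
    """
    diff = fat_local_time - creation_time
    if abs(diff) <= TOLERANCE:
        return None  # creation_time was written in local time, as the Fuji camera does
    # Snap the difference to the nearest quarter hour, then sanity-check it.
    offset = round(diff / QUARTER_HOUR) * QUARTER_HOUR
    if abs(diff - offset) > TOLERANCE or abs(offset) > MAX_OFFSET:
        raise ValueError("timestamps differ by something other than a timezone offset")
    return offset
```

A video stamped 16:01:00 whose FAT creation time reads 11:01:00 yields an offset of -5 hours, which can then go into the custom timezone property.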

As files are copied from removable media to local drives, then synced to the cloud, and so forth, the file system times can change. Or, if the file is copied to a file system that stores UTC and the timezone of the computer is then changed, the timestamp will also shift. So, it's important to detect the timezone early and store it in a custom metadata property before that opportunity is lost. That's among the tasks of the FMPhotoFinisher tool.

Compensating for the Precision Problem

Many of my photos come from the pre-digital era and I am slowly digitizing them. In the latter part of the 1990s, I had a camera with a data back that imprinted the date on the photo. But the resolution of that information is one day, while the EXIF DateTimeOriginal property has second-level resolution. For earlier photos, it's even more difficult to determine when they were taken. Some of my slides are stamped with the year and month they were developed. For certain photos, all I can guess at are the year and the season.

For compatibility with existing applications that use date metadata, the date should be stored in the existing DateTimeOriginal property. As with the timezone, we add a custom "datePrecision" metadata property containing the number of significant digits from the table above. An application that recognizes the datePrecision property ignores all detail below that level. When writing the value, use zeros (or 1s for month and day) for sub-precision components. So, an event in February 2015 with month-level precision would be rendered as "date: 2015-02-01T00:00:00", "datePrecision: 6".

There is no reliable way to detect precision from existing file metadata. But when users enter date information the UI should enable them to leave out levels of detail and the precision would be determined from there.
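A date-entry UI could derive the precision from whatever the user types and fill in the remaining components with the defaults described above. A minimal sketch, with a hypothetical to_full_timestamp helper that accepts ISO 8601-style input and returns the zero-filled value together with the datePrecision digit count:

```python
import re
from typing import Tuple

# Year, then optional month, day, hour, minute, second (ISO 8601 extended form).
_ENTRY = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?$"
)
# Fill values for omitted components: 1 for month and day, 0 for time fields.
_DEFAULTS = (None, "01", "01", "00", "00", "00")


def to_full_timestamp(entry: str) -> Tuple[str, int]:
    """Return (full timestamp, datePrecision) for a partial date entered by a user."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized date: {entry!r}")
    parts = match.groups()
    precision = sum(len(p) for p in parts if p is not None)
    year, month, day, hour, minute, second = (
        part if part is not None else default for part, default in zip(parts, _DEFAULTS)
    )
    return f"{year}-{month}-{day}T{hour}:{minute}:{second}", precision
```

Entering "2015-02" produces the February 2015 example above: the value 2015-02-01T00:00:00 with a precision of 6.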

Wrapup

Many metadata properties were defined with an incomplete understanding of the use cases. When defining date properties, it seems reasonable to use ISO 8601 format out to the second level rendered into UTC. However, closer examination shows that doing so loses two critical pieces of information that careful use of ISO 8601 can preserve: precision and timezone.

The same basic principles can apply to other properties. When defining data elements, always consider the full set of information being conveyed and make sure that the format doesn't lose some of that information.

Monday, September 3, 2018

Custom CSS for Blogger

Let's say you succeeded in creating a cool website and you developed your own CSS style that's simple, clean, and functional. You even made it responsive. Now you want to add a blog to that site and use the same style on your blog as on the rest of your site.

You choose Google's Blogger because it's simple to use, capable, and free. Blogger lets you customize your site and, if you include advertisements, that's your choice.

You go to the settings for your blog, select the "Theme" tab, and click "Edit HTML" because that's how you load custom CSS. "Holy Caramba!" There are over 2000 lines in the Blogger template! It's all been automatically generated with no comments. You might spend days or weeks figuring it all out!

Thankfully, it's much easier than that.

The Minimal Blogger Template

Rather than trying to modify the thousands of lines you found in Blogger, start with this minimal template.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html>
<html xmlns='http://www.w3.org/1999/xhtml'
    xmlns:b='http://www.google.com/2005/gml/b'
    xmlns:data='http://www.google.com/2005/gml/data'
    xmlns:expr='http://www.google.com/2005/gml/expr'>

  <head>

    <b:include data='blog' name='all-head-content'/>

    <title><data:blog.pageTitle/></title>

    <b:skin>
      <![CDATA[
      body {
        font: $(body.font); 
        color: $(body.text.color); 
        background: $(body.background); 
        padding: 0 $(content.shadow.spread) $(content.shadow.spread) $(content.shadow.spread); 
        $(body.background.override) margin: 0; 
        padding: 0; 
      }
      ]]>
    </b:skin>

  </head>

  <body>
    <b:section id='main' showaddelement='yes'>
      <b:widget id='Blog1' locked='true' title='Blog Posts' type='Blog'/>
    </b:section>
  </body>

</html>

We're going to assume that you're somewhat familiar with HTML and its relative, XML. If not, study a few tutorials and come back to this.

The minimal template has five important parts:

  1. Front Matter: This is all of the stuff at the beginning. It includes the xml declaration, the DOCTYPE, and the opening tag with all of the namespaces. Leave this alone. It needs to be there but you don't really need to do anything with it.
  2. <b:include... /> : This tells Blogger where to put all of its header content. Leave it alone too.
  3. <title> : This is where the page title goes. The default will fill in the title of your blog which is usually exactly what you want. So, leave this alone.
  4. <b:skin> : This is where your CSS style goes. The version in this basic template is the absolute minimum. It uses the $ tags to fill in values from the built-in Blogger theme editor. You're going to replace the style with your own. Doing so without Blogger's special $ tags will disable the Blogger theme editor, but you'll get exactly the look you want and have many more customization options.
  5. <body> : This is the template for the body of your blog. The basic template has a single-column layout and it only has one widget in place which is the blog body.

To this basic template we will make two customizations. We will update the CSS styles to improve the look of the blog and we'll change the layout by modifying the body template.

Adding Your CSS Styles

The first customization is to add a full set of CSS styles. These go within the <b:skin> CDATA section in the minimal template. Delete the existing body style and add your full set of styles. The set from my previous blog post is a good starting point. Paste the full text of the styles directly in.

To that basic style set, we will add a few more styles to support the layout of the blogger site. In this case, I'm using a two-column layout. You can modify this for a different number of columns.

/* ========= Column Layout ========= */

#columns {
    display: block;
    border: none;
    margin: auto;
}

#column1 {
    display: table-cell;
}

#column2 {
    display: table-cell;
    width: 15em;
}

@media only screen and (max-width: 30em) {
    #column1 {
        display: block;
    }

    #column2 {
        display: block;
    }
}

The first three styles here set up the wrapper for the set of columns and then the styles for the two columns themselves. Setting display to "table-cell" is a handy way to position the columns side-by-side without using an actual table.

The "@media only..." clause indicates that the next section only takes effect if the width of the viewport is 30em or less (in a media query, em is relative to the browser's default font size). In that case we change display to "block" (instead of "table-cell"), which causes the so-called columns to be laid out top to bottom. The net result is a responsive design. On narrow displays (e.g. phones) the blog will be laid out vertically with the first section above the second.

Now we'll add a few styles to customize the Blogger look. The styles below are tied to the class names of specific Blogger elements. I've inserted blank styles for others that you might want to customize. If there's some other element you want to customize that I didn't include here, you can discover the right reference by bringing up your blog in the Chrome or Edge browser, using the developer tools to examine the element you want to change, and then looking to see what class is applied to the element or a near parent of the element.

The comments help explain what each style is doing.

/* ========= Blogger-Specific Styles ========= */

/* The header before a set of posts on a particular date.
On the main page this appears before each new date.
On a post page this appears right before the post header.
*/
.date-header {
    font-size: 1.5em;
    font-weight: bold;
    color: var(--clr_accent2);
    margin-top: 0.5em;
    margin-bottom: 0;
}

.post-outer {
}
.post-outer::after {
    /* Draw a gradient line between blog posts */
    content: "";
    clear: both;
    display: table;
    background-image: var(--gradient_hr);
    height: 0.2em;
    width: 100%;
    margin-top: 1em;
}

/* The title of a particular post. Blogger uses h3 but we make the style match h1. */
.post-title {
    font-size: 2.0em;
    font-weight: normal;
    color: var(--clr_accent1);
    margin-top: 0.2em;
    margin-bottom: 0.2em;
}

.post-header {   
}

.post-body {
}

.post-footer {
}

/* Widget titles are <h2>. The second column where my controls are located is class='sidebar' */
.sidebar h2 {
    padding: 0 0.2em;
    background-image: linear-gradient(to right, #D0D0D0, #E8E8E8, #E8E8E8, #D0D0D0);
}

/* Customize the previous, next, and home links to look like buttons. */
.blog-pager-older-link, .blog-pager-newer-link, .home-link {
    display: inline-block;
    padding: 5px;
    color: white;
    background-color: var(--clr_accent2);
    border: 1px solid black;
    border-radius: 5px;
    text-decoration: none;
}
.blog-pager-older-link:hover, .blog-pager-newer-link:hover, .home-link:hover {
    color: white;
    text-decoration: none;
    background-color: var(--clr_accent2);
}
.blog-pager-older-link:visited, .blog-pager-newer-link:visited, .home-link:visited {
    color: white;
    text-decoration: none;
    background-color: var(--clr_accent2);
}

.blog-feeds {
    display: none;
}

Updating the Layout

All of the styles above belong in the <b:skin> section. The other thing we'll do is customize the Blogger layout. Previously we had just one column. Now we'll set up two columns.

The following replaces the <body> in the minimalist layout:

<body>
  <div id="main" >

    <!-- Header -->
    <header>
      
      <!-- logo -->
      <b:section id='MainSiteBlock' class='siteblock' maxwidgets='1' showaddelement='no'>
      <b:widget id='Header1' locked='true' type='Header'>
        <b:includable id='main'>
          <h1>
            <a expr:href='data:blog.homepageUrl'>
              <data:title/>
            </a>
          </h1>
          <h2>
              <data:description/>
          </h2>
        </b:includable>
      </b:widget>
      </b:section>

      <!-- main navigation -->
      <div class="menu">
        <ul>
          <li>
            <a href="http://google.com">Google</a>
          </li>
          <li>
            <a href="http://ofthat.com">Of That</a>
          </li>
        </ul>
      </div>
    </header>

    <div id='columns'>
      <b:section id='column1' class='maincolumn' showaddelement='yes'>
        <b:widget id='Blog1' locked='true' title='Blog Posts' type='Blog'/>
      </b:section>
      <b:section id='column2' class='sidebar' showaddelement='yes'>
        <b:widget id='BlogArchive1' locked='false' title='Archive' type='BlogArchive'/>
      </b:section>
    </div>
    
  </div>
</body>

This layout has a header with a logo (title) on the left and navigation on the right. That's followed by the columns wrapper with two columns in it.

Each column has one <b:section> element which is where you can place Blogger widgets. Setting "showaddelement" to "yes" lets you use the Blogger Layout settings to add, customize, and remove widgets in those sections.

Applying and Maintaining the Layout

Save these changes in the Layout section of your blog setup and you'll see a two-column layout much like the one I use on this blog itself.

Most likely, you'll want to make some additional changes, so you go back to Layout in your blog setup. Because you're using a custom template, you can no longer use the theme designer. Instead you have to use "Edit HTML". When you click on that, you find that it's back to more than 2000 lines of XML/HTML hybrid. Don't panic: Blogger simply filled in the default settings for your "Blog" widget. You can use search to find the elements that you changed, including the styles in the <b:skin> element and the "column1" and "column2" identifiers for the layout. You may want to add more comments to mark where to make future changes.

Adding and Configuring Widgets

The custom Theme described here, plus any additional customizations you may make (e.g. three columns), is fully compatible with the Layout editor on Blogger. Use the Layout editor to add and configure widgets (also called "gadgets") such as Subscribe, About, Links, and more.

Blogger Reference

The various elements in the blogger layout are documented in the following sections. This will help you get into more advanced stuff:

Monday, July 2, 2018

Simple, Clean, and Functional CSS Styles

When other images fail, bollards will do.

Let's say you're creating a website and you need a really good CSS stylesheet for it. There are a ton of attractive styles out there, many of them created by talented designers. But, from a programmer's perspective, they often have problems: they are hard to use, not accessible, or fail to adapt to mobile form factors.

That's where I was a couple of months ago. I looked around to find a good example to start from. I found lots of articles about good principles but no simple basis for my CSS. So, I'm offering the stylesheets I created. They may not be the very best in the world. But they look good, are well designed, easy to use, and license free.

Here's a summary of what makes up a good CSS Stylesheet:

  • The theme should be simple and functional.
  • The look shouldn't be so dramatic that it draws attention away from the content or interferes with communicating the message.
  • Web conventions should be retained such as underlined links and menu mouseovers.
  • Contemporary elements like appealing fonts, gradient shading, appropriate use of color and stylish borders should be used to give the site a modern look.
  • The stylesheet should use semantic HTML tags the way they are intended to be used. This includes elements like <header>, <footer>, <article>, a full set of section headers (<h1>, <h2> etc.), <blockquote>, and <code>. Semantically named classes should be provided for elements that don't exist in HTML like .title and .subtitle. Not only do these elements make the stylesheet easier to use, but they also assist in search engine optimization (SEO).
  • The style should facilitate making accessible pages. The order of elements in the HTML should follow the reading order (for screen readers). Foreground and background should have high contrast for readability. Measurements such as font sizes, margins, padding, and positions should all be in "em" units so that elements grow proportionally and text flows properly when users zoom in and out.
  • The layout should be responsive — looking good on everything from an extreme widescreen monitor to a mini tablet or phone.
  • The stylesheet should be self-contained — not relying on bitmap graphics though it can use web fonts if they are publicly available and hosted (thanks, Google).

The result is in FileMeta/WebStyles on GitHub.

The first thing I did was create a sample web page that exercises every element I needed to include in the stylesheet. I added some JavaScript so that you can switch styles on-the-fly. This enabled A/B testing so that I could quickly compare the results of a change with the original.

I declared common elements such as colors and gradients in variables so that they can be changed in one place. It was tempting to use an enhanced stylesheet format like SCSS (https://en.wikipedia.org/wiki/Sass_(stylesheet_language)), but I found that my style goals are simple enough not to need it.

I refined my work into four stylesheets named after the websites on which they are used: "FileMeta", "BrandtRedd", "EdMatrix", and "OfThat". To these I added "Block", which is an interesting variant on "FileMeta", and "BrowserDefault", which is a blank stylesheet showing you what the page would look like with the default browser styles.

All but "OfThat" look pretty similar to each other. I was deliberately seeking a common theme. That's not a limitation of this approach. You can use one of these as-is but I recommend making it your own. Change a font, use a different color scheme, add a columnar layout, whatever. The stylesheets include lots of comments to help guide you and the web is full of tutorials on how to use CSS. I'm partial to W3Schools.

I'm dedicating this to the public domain using the "unlicense". Use or adapt it however you want. There are no legal restrictions and no requirement to credit me. However, a mention of FileMeta or one of my other sites would be appreciated if you have an appropriate opportunity.

Happy styling!

Wednesday, May 30, 2018

What is FileMeta About?

FileMeta Logo

We like to take pictures. My family's collection exceeds 100,000 digital photos. Over the years, we have successfully arranged them into a hierarchy of folders organized chronologically by year, month, and event. But the process is getting more challenging as we integrate photos from multiple cameras and phones plus pictures sent to us by others.

This got me pondering, "What is the identity of a file, and what should it be?"

On existing computer systems, the identity of a file is its filename; or, more correctly, its file path including all of the folders that lead up to the file. But there are serious problems with this:

  • Filenames, and especially paths, are ephemeral. They change as we move files around.
  • Hierarchical file systems make sense to librarians and computer scientists, but regular people have an easier time understanding hashtags.
  • A filename is a property of where it's stored, not inherent in the file itself.

I propose that the identity of a file should be its metadata - the title, author, keywords, creation date, and so forth. And the metadata could be stored internally to the file so that it is preserved and carried along when the file is copied or transmitted. Thus the identity is inherent in the file, not in the place it's stored.

Conveniently, most contemporary file formats have a way to store the metadata. This includes Microsoft Office documents, PDF files, media files like MP3 and MP4 and many more. Current versions of Windows and Mac OS even retrieve and index the information.

The foundation has been laid for a metadata-centric way of managing our personal collections of pictures, documents, music, and video. What we lack are well-structured tools for tagging and retrieving. That will be the subject of this blog and the associated FileMeta Project.

I'm starting with better ways of managing our collection of photos. Instead of a single chronological dimension, metadata will let us organize the photos by subject, theme, people appearing in the photos, and location. The experimental tools I write are open source and managed on GitHub. When some are polished well enough, I'll post them in the Microsoft Store.