ECIM 2007: Texas vs California

Posted on September 7, 2007
Filed Under Events

Vasily Borisov and Chris Hughes aka Christ Huges jointly gave this talk in Haugesund at the ECIM 2007 conference on September 11th, at the workshop “E&P Information Content Management”. The session was supervised by Nina B. Knudsen, ExxonMobil.

Abstract

texas-county-map.gif

californiastatecountymap.jpg

Black, oily, industrial, complex, for professionals only: wellbores, logs and deviation surveys, seismic velocities, navigation…

Easy-going, funny, user-friendly, entertaining, creative: photos, films, music, blogs & wiki…

Somehow I can’t help seeing a gap of Grand Canyon proportions between these two states. And I’m struggling to understand why. Assuming we like what we do, what’s wrong with our content? Why does the process of finding our data hurt? How can we make discovering and using it a more pleasurable experience? How can we put a bit of Californication in our Texas?

Hello, I’m Chris Hughes, Managing Director of Kestrel IDM. We are a data management service company specialising in the physical and digital management and delivery of E&P data. We handle everything from the physical storage to the electronic delivery of legacy data items for our customers, and the processes of moving and converting the data from the physical to digital form. This brings us into regular contact with a vast array of legacy media and documents, all of which need to be better classified and made more accessible once digitised, and it’s the management of this Content and the tools we are using to make this process more efficient that we will be talking about today.

Intro

The content, or rather the lack of it, in most scanned data management projects means that the data is generally hard to manage and is therefore always on the verge of not being used at all. How can we improve the content and thereby make our data far more effective?

The first step on the road to achieving this is to realise that we like our content. And we like it because it makes sense to us - it reveals information to us.

It makes sense to us because of our knowledge and involvement in the industry.

We know the context of the data, the reasons why this data exists and has been preserved, the geo-spatial context in which it lives, the historic context by which the data came to be acquired, and the methods by which it will come to be retrieved and used. We therefore have an understanding of the data before us. It is the preservation and presentation of these contexts that we want to reveal and make available to our customers.

We look at our content and we better understand the logic of the exploration process, and can study the history of the decision process. We know our Operators, we recognise the logging and seismic acquisition Contractors, we understand their individual styles, we remember how the business has changed through the years.

We never, ever want to lose our data, and we want our every observation within the content to be preserved and to serve its purpose, both to us now and to others in the future. Thus from the discovery of data and acquired knowledge comes the joy of better business decisions.

It even makes us happy or sad - we find it satisfying when it’s high quality and we can make sense of it, but it disappoints us when links are broken or when the content is badly formatted, misleading, unreferenced, etc.

We want to share this enthusiasm for content with our customers and for them to experience the same joy of discovery!

We want our customers to not only be comfortable and fluent with their content but also as interested in it as we are because interest generates activity, which builds knowledge, and drives our business further forward.

One thing depends on the other. We noticed it as kids in activities like sport or playing music: the better you learn to do something, the more interesting and exciting it is. Similarly, the more dedication you show towards it, the better you get at it.

This all sounds good. But we’re still a long way off where we need to be. Right now we have millions of scanned images of different kinds: seismic sections, observer logs, well reports, well logs and so on. It’s pretty hard to get either fluent or excited about this mess. Frustration is more usually the feeling we experience at this point.

To solve this we at Kestrel have looked to implement a system, methodology and workflow that would enable us to index our content better and provide accurate content to our customers.

The main requirement was a system that would allow us to maintain our positive attitude towards our content and to share it with our customers. Beyond this, it had to work with our production processes in Europe and the US and to fit the exact requirements of our customers in those regions.

Apart from those highly advanced requirements we also wanted some fairly obvious features:

And here is what we got from Kadme…

A system for us

Load? There is no loading.

spoon2.jpg

We don’t need to load anything. The K-search system spiders our directories of scanned content and indexes the files. At this point, if we have meaningful metadata encoded in the directory structure, we can inject some semantics once and for all by saying: this directory name means this, and that directory name means the other thing. Those meanings are then indexed as well, but this is by no means required, since the directory path is preserved anyway.
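To make the idea of “injecting semantics” a bit more tangible, here is a minimal sketch of how directory names could be turned into metadata. It is not how K-search does it internally; the mapping `PATH_SEMANTICS`, the function `describe_path` and the example directory layout are all assumptions made up for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical rule: the position of a directory in the path decides
# what its name means. This layout is an assumption for the sketch.
PATH_SEMANTICS = ["country", "field", "well", "document_type"]


def describe_path(relative_path: str) -> dict:
    """Turn directory names into metadata while keeping the full path."""
    path = PurePosixPath(relative_path)
    record = {"path": str(path), "filename": path.name}
    # zip() stops at the shorter sequence, so shallower or deeper
    # directory trees simply yield fewer or no extra properties.
    for meaning, directory in zip(PATH_SEMANTICS, path.parent.parts):
        record[meaning] = directory
    return record


example = describe_path("norway/eldfisk/2_7-6/well_reports/final_report.tif")
print(example)
```

The full path is stored in every record, so nothing is lost if the directory rule turns out to be wrong or incomplete.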

Even if we don’t have the files physically but only have a listing or a catalogue (in Excel or CSV format, for example), K-search can be tuned to read it entry by entry and treat the entries as offline content until the files show up.
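As a rough illustration of the catalogue idea, the sketch below reads a CSV listing entry by entry and marks each entry as offline until a file with that name exists on disk. The column names, the sample catalogue and the function `read_catalogue` are invented for this example and say nothing about the actual K-search configuration.

```python
import csv
import io
import tempfile
from pathlib import Path

# Made-up catalogue; the column names are assumptions for this sketch.
CATALOGUE = """filename,title
report_001.tif,Final well report
report_002.tif,Composite log
"""


def read_catalogue(csv_text: str, root: str) -> list:
    """Read the listing entry by entry and flag each entry online or offline."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = dict(row)
        # An entry stays offline until a file with that name shows up under root.
        entry["online"] = (Path(root) / row["filename"]).is_file()
        entries.append(entry)
    return entries


with tempfile.TemporaryDirectory() as root:
    # Pretend that only the first document has been scanned so far.
    (Path(root) / "report_001.tif").write_bytes(b"")
    entries = read_catalogue(CATALOGUE, root)

for entry in entries:
    print(entry["filename"], "online" if entry["online"] else "offline")
```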

What we get in the end is a search engine that finds our files. That’s helpful, but not by much, since the files have little or no metadata. Sometimes the filename is all we have. Again, the filename might carry some semantics, but only very rarely can something consistent be extracted. So…

Go Capture!

capture-775108.jpg

We need to capture metadata to make sense of our content. Anyone whose role allows creating or editing metadata can click on a file and capture metadata about it. It is very easy.

We had a schema we used before to catalogue our content. Kadme said OK and built this schema into K-search. Apparently it is very easy. They can have many different schemas for capturing, but they also have an internal one into which they copy the semantics of the captured information. Not because they believe that their schema is better than others, but because by using it they can provide the same functionality for content described in different terms and schemas.

Capturing itself comes in different shapes. The application suggests a predefined object type (Well Report, for example) and a set of properties according to the selected schema. If none of the existing types fits, the application allows us to create a new type and fill it with the properties to be captured. Unstructured information can be captured as well, in the form of a description if it is extensive, or as tags if it is a short keyword. Tags can also be applied to a group of files simultaneously.

The goal is to make capturing as easy and handy as possible and lower the capturing barrier.

One more important thing: K-search does not allow us to mix apples with herring. We can’t call a file a well log, because it is not one. A well log is a well log. A file is a file. Instead, K-search allows us to create a logical entity based on a file, call it “Well Log” (or “A Series of Unfortunate Events” if we like), and have it refer to the originating file. The file stays a file, the well log stays a well log, but they are connected.
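The separation between a file and the logical entity built on top of it can be sketched with two small data classes. This is only an illustration of the principle; the class names `IndexedFile` and `LogicalEntity`, their fields and the sample values are hypothetical and not taken from the K-search data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexedFile:
    """A file is just a file: all we know for sure is where it lives."""
    path: str


@dataclass
class LogicalEntity:
    """A typed object (e.g. a well log) that refers back to its source file."""
    object_type: str
    source: IndexedFile
    properties: dict = field(default_factory=dict)
    tags: set = field(default_factory=set)


scan = IndexedFile("scans/0001.tif")
well_log = LogicalEntity("Well Log", scan, {"wellbore": "2/7-6"})
well_log.tags.add("composite")

# The file itself is untouched; the well log only points at it.
print(well_log.object_type, "->", well_log.source.path)
```

Keeping the two apart means the same file can later be described by another entity, or re-typed, without renaming or moving anything on disk.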

Now it is much better. Our K-search searcher starts finding us well logs, not only files. It shows all Kadme schema properties by default, or our schema properties if we want to. The interface is fully configurable.

But not too many, since firstly we are too busy to capture metadata ourselves, and secondly it is too expensive to have someone in the UK or Norway do it full time.

Outsource

software-outsourcing-cartoon-1.jpg

So we outsource the streamlined metadata capturing process to our office in India. The K-search system allows us to do so. We have secure logon capabilities, and the system is available on the net so that our workers in Delhi can log on and do the capturing.

You can look but you can’t touch

demimoore.jpg

You’ve probably noticed one flaw in the speech so far. We can capture everything, but how do we see the files? Download them? What if they are big? How secure is opening them for download? How efficient? How much network bandwidth is required? What viewers do we have on our local laptops?

Kadme provides a piece of functionality that answers all those questions, and it was one of the primary reasons we went for Kadme software. It is called K-view. It allows us to preview files from K-search without downloading them. It works over a channel of any bandwidth, and it is secure because it does not allow downloading the actual file. It does not require good local viewers on the PCs in India (which can be quite problematic for formats like TIFF, for example), so it is perfect for metadata capturing and a very good fit for our outsourcing model.

The big Q for Quality

q.jpg

Metadata quality is being more than adequately addressed.

Context 1. Business you don’t want to miss or Asset footprint

business.png

Different schemas, tags, etc., and you were talking about context? Where is the context? Well, here is the answer. The most important context for our content is probably “business”: business embodied in a set of assets. We deal mostly with raw data, so what percentage of our content relates to a well, a field, a license or a survey? Well, 100% would be a good guess. What K-search brings in that respect could be a system in its own right. It carries an index of publicly available information from places like www.npd.no, www.ukdeal.co.uk and www.geus.dk. The central part of this information is the assets: wellbores, surveys, fields, licenses… according to the standards of the country that officially maintains the data set.

Now all the content we capture gets connected to those assets, since K-search does not allow us to invent wells or mistype field names, but gives us a dictionary that helps us choose the correct asset to reference.

And now suddenly our customers become very interested, since K-search is providing them with a business context. If you want to know everything that is known about a well, you type the name of the well or select it from the dictionary and get everything that has ever been filed for that well, both within the content we maintain and within publicly maintained content. So what they get is the full information footprint of an asset. That’s worth a lot.

Ha-ha, big deal. Fire up Google, search within the site www.npd.no or www.geus.dk, and you’ll find what you need. Well, not so fast. All you’ll find is web pages that you then have to explore manually for the content. Try to find the Jurassic correlation chart for well 2/7-6 on NPD.

What you’ll get is: 265_02_NPD_Paper_No.30_Early_Tertiary-Late_Jurassic_Correlation_chart_Eldfisk__Well_2_7_6.pdf

No general purpose search engine would convert 2_7_6 into 2/7-6 to find it for you. K-search does. And not only that: it also converts 265 into 2/7-6, since it knows that 265 is this well’s NPD ID. Think about all those underscores, leading zeros, slashes and dashes that stand between you and your content and that no general purpose search engine can overcome. K-search knows them all for the North Sea area. That is helpful, not to say excellent.
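Here is a rough sketch of the kind of name normalisation described above, applied to the file name from the NPD example. It is a simple heuristic and not the K-search algorithm: the regular expression, the function `wellbores_in` and the one-entry lookup table `NPD_ID_TO_WELLBORE` are assumptions for illustration (the only fact taken from the talk is that 265 is the NPD ID of well 2/7-6).

```python
import re

# Hypothetical ID dictionary with the single entry mentioned in the talk.
NPD_ID_TO_WELLBORE = {"265": "2/7-6"}

# quadrant, block and well number, separated by "_", "/", "-" or a space,
# each possibly written with leading zeros.
WELL_PATTERN = re.compile(
    r"(?<!\d)0*(\d{1,2})[_/\- ]0*(\d{1,2})[_/\- ]0*(\d{1,3})(?!\d)"
)


def wellbores_in(filename: str) -> set:
    """Return the canonical wellbore names a file name appears to mention."""
    found = set()
    for quadrant, block, number in WELL_PATTERN.findall(filename):
        found.add(f"{quadrant}/{block}-{number}")
    # A leading number followed by "_" is treated as a possible NPD ID.
    leading_id = re.match(r"(\d+)_", filename)
    if leading_id and leading_id.group(1) in NPD_ID_TO_WELLBORE:
        found.add(NPD_ID_TO_WELLBORE[leading_id.group(1)])
    return found


name = ("265_02_NPD_Paper_No.30_Early_Tertiary-Late_Jurassic_"
        "Correlation_chart_Eldfisk__Well_2_7_6.pdf")
print(wellbores_in(name))
```

A pattern like this will produce false matches on other three-number sequences, which is exactly why a dictionary of real assets is needed to confirm each candidate.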

Context 2. Time is money. Depth is oil :)

timeismoney.jpg

Another important context is time. All date and time information is correctly preserved and recognised within the metadata. When we run our searches we can select a date and time range within which to search. That is very useful from an information management perspective. We can also export the result set in iCalendar format and import it into our favorite calendar application. Think about something like “Release date for seismic surveys”.
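For readers who have not seen iCalendar before, the sketch below shows roughly what exporting a dated result set could look like. It writes a minimal calendar with one all-day event per result. The function names, the `example.invalid` UIDs, the product identifier and the sample survey are made up, and long-line folding is left out for brevity, so treat it as an outline rather than a complete exporter.

```python
from datetime import date, datetime, timezone


def escape_text(value: str) -> str:
    """Escape the characters that are special in iCalendar text values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))


def to_icalendar(results, stamp=None) -> str:
    """Build a minimal calendar from (summary, date) pairs."""
    stamp = stamp or datetime.now(timezone.utc)
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//search-export//EN"]
    for number, (summary, day) in enumerate(results, start=1):
        lines += [
            "BEGIN:VEVENT",
            f"UID:result-{number}@example.invalid",
            f"DTSTAMP:{stamp:%Y%m%dT%H%M%SZ}",
            f"DTSTART;VALUE=DATE:{day:%Y%m%d}",  # all-day event
            f"SUMMARY:{escape_text(summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


# Made-up search result: a survey name and its release date.
ics = to_icalendar(
    [("Release date, survey A", date(2008, 1, 15))],
    stamp=datetime(2007, 9, 11, tzinfo=timezone.utc),
)
print(ics)
```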

calendar.png

Really useful. What next? It is a bit further down the road from a simple context management application, but if most of our content elements carry a depth reference, why not search within a depth interval? Set it to 2000<TD<2200 and search for “oil”. That is what we are looking for anyway :)
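A depth-constrained search of this kind boils down to combining a range filter with a keyword match. The sketch below shows the idea on a few invented records; the field names `td` and `text`, the function `search` and the documents themselves are hypothetical, and the depth unit is whatever the metadata uses.

```python
def search(documents, term, min_depth, max_depth):
    """Keep documents whose depth lies in the interval and whose text has the term."""
    term = term.lower()
    return [
        doc for doc in documents
        if min_depth < doc["td"] < max_depth
        and term in doc["text"].lower().split()
    ]


# Invented records, only to exercise the filter.
documents = [
    {"title": "Report 1", "td": 2100, "text": "Oil shows in sandstone"},
    {"title": "Report 2", "td": 2500, "text": "Oil shows at base"},
    {"title": "Report 3", "td": 2050, "text": "Dry hole"},
]

hits = search(documents, "oil", 2000, 2200)
print([doc["title"] for doc in hits])
```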

Context 3. Spatial

map.png

Another fancy bit of functionality is spatial referencing. If you know where a photo was shot, you place it on the map and it sticks there. In our world it is even easier. All the assets that K-search has indexed have shapes and can be displayed on the map. By virtue of asset referencing, all our content ends up being spatially referenced as well.

In addition to the index of public data in the North Sea, Kadme maintains a geodatabase of all the assets, such as wells, licenses, surveys and fields, and delivers it wrapped in their best-known application, K-map, which you have surely used before when ordering public data from DISKOS.

It is a very light and easy-to-use GIS browser, loosely coupled with the K-search application. Select an area based on a polygon or on the shape of a field, block or license, select all the assets crossing it, press search and get all the content related to those assets! Great.

Equally, from K-search all the search results, individually or together, can be thrown over to the map to see the extent of the assets they relate to. E.g. search for Shell, find 50 licenses and throw them on the map. You’ll get all the licenses operated by Shell. From search into map in three clicks.

Workflow management

process.jpg

Almost perfect, but we need more. We need to manage workflows. If bad content is identified, someone has to do something about it. If important data arrives, we have to prioritise metadata capturing for it. If a customer wants some offline content to become online, we need to act. If a customer wants online content to be delivered on media, we need to act.

In this case K-search allows us to do what no other search engine really allows. It lets us assemble persistent collections of search results, or “carts”, and assign an action type and a person or group of persons to each of them.

How does it work? It’s really simple. A customer goes to K-search and looks for some well-related information. They find a set of offline (not digitised) documents, collect them into a “cart” and create a “make online” request. The request goes to our engineer, who sets its priority and assigns it to the group responsible for scanning. They scan the documents from the cart and put them on disk. K-search automatically picks them up, indexes them and makes them online. The customer gets a notice, goes to K-search and downloads the contents of the cart.
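The “make online” round trip described above can be pictured as a small piece of state attached to a cart. The sketch below is only a model of that flow; the class `Cart`, its fields and method names, and the sample documents are assumptions and do not reflect the real K-search objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cart:
    """A persistent collection of search results with an action attached."""
    action: str
    offline_items: set
    online_items: set = field(default_factory=set)
    priority: Optional[int] = None
    assigned_to: Optional[str] = None
    notices: list = field(default_factory=list)

    def assign(self, priority: int, group: str) -> None:
        # The engineer sets a priority and hands the request to a group.
        self.priority = priority
        self.assigned_to = group

    def file_indexed(self, item: str) -> None:
        # Called when the crawler has picked up and indexed a scanned file.
        if item not in self.offline_items:
            return
        self.offline_items.discard(item)
        self.online_items.add(item)
        if not self.offline_items:
            # Everything in the cart is online: notify the customer.
            self.notices.append(f"'{self.action}' request completed")


cart = Cart("make online", {"report A", "report B"})
cart.assign(priority=1, group="scanning")
cart.file_indexed("report A")
cart.file_indexed("report B")
print(cart.notices)
```

The customer is only notified once the last document in the cart has come online, which matches the step where they return to download the whole cart.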

This is a powerful and useful mechanism allowing customers to interact with us and us to interact internally and manage all content related work.

Entitlements

logos.png

And last but not least: entitlements. All content within the system is entitled to one or many LDAP groups. This means that we can run not only one and the same system for ourselves and a customer, but also the same system for all customers. An authorised user can set the entitlements based on the same old “carts”, the persistent collections of objects assembled with K-search. As soon as the entitlements are set, a customer only searches through the content that is either public or entitled to them.
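In code, the rule “public or entitled to them” is a simple set intersection between the groups on a piece of content and the groups of the user. The sketch below assumes the user’s LDAP groups have already been looked up; the `PUBLIC` marker, the function `visible_results`, the group names and the sample records are all invented for illustration.

```python
PUBLIC = "public"  # assumed marker for content visible to everyone


def visible_results(results, user_groups):
    """Keep only results that are public or entitled to one of the user's groups."""
    allowed = set(user_groups) | {PUBLIC}
    return [item for item in results if allowed & set(item["groups"])]


# Invented content with made-up group names.
results = [
    {"title": "Public well summary", "groups": {PUBLIC}},
    {"title": "Customer A seismic section", "groups": {"customer-a"}},
    {"title": "Customer B well log", "groups": {"customer-b"}},
]

titles_for_a = [item["title"] for item in visible_results(results, ["customer-a"])]
print(titles_for_a)
```

Because the filter is applied to every search, one shared index can serve all customers without any of them seeing the others’ content.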

Summary

If we look at the requirements we put together in the beginning:

Agile

09_scrum.jpg

Here, on the last point, we need to say a couple of words. All the multimedia sites on the Internet that we really like are not static but dynamic systems. They are not maintained in the sense we are used to, but are developed as they go, with new releases coming at anything from monthly to daily frequency, depending on the resource.

We like that, and we see it as the best approach to the change management problem. We are a dynamically growing company: our content grows quickly, and our customer base grows even quicker. The software should allow us to address those changes dynamically instead of waiting half a year for a new release.

Kadme provides us with that capability. Using the Scrum development methodology, they are able to deliver working product increments on a monthly basis.

Something happened to our well logs, Ladies and Gentlemen. They are fun to work with. Easy to find and deliver. Our content grows quickly and gets used and reused often. Looks like we got a bit of California in the middle of our Texas. It feels good anyway.
