Photo Forum / General Photo Topics / UK Photography / August 2005
Indexing and searching huge volumes of images????
Umgall - 01 Aug 2005 14:30 GMT
I know that someone out there can help me with a problem.
I'm being asked to find some software which can index and allow searches on huge volumes of images. Most of these images will be TIFFs, and to be honest, I'm expecting there to be about 1.5 million at the end of the project. Ouch.
Basically I need to be able to store 'metadata' against each image, and to search this metadata very quickly. Ideally, the metadata would be stored in an SQL database, and would provide hyperlinks to images on the file system. I need to have a description (up to 2k of text), a date and a location.
So, I could search for "hyde park" and if this phrase occurs within the metadata fields of any of the 1.5 million images, the hits would be displayed (along with the metadata) and I could click through to the image.
Does anyone know of any system that can do what I need? Any help would be gratefully appreciated. The alternative is to develop an application, but if there is an off-the-shelf solution then this is obviously going to be better!
Umgall.
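For illustration only, here is a minimal sketch of the kind of catalogue described above, assuming SQLite through Python's built-in `sqlite3` module (nobody in the thread proposed this). The `images` table, its columns and the helper names are hypothetical. Each row holds the metadata plus the file-system path that serves as the "hyperlink" to the TIFF, and `search` looks for a phrase anywhere in the description or location.

```python
import sqlite3

# Hypothetical schema: one row per image. "path" is the file-system
# location of the TIFF, i.e. the "hyperlink" the search results point to.
SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    image_id    INTEGER PRIMARY KEY,
    path        TEXT NOT NULL,
    description TEXT,   -- free text, up to about 2k
    taken_on    TEXT,   -- ISO date string, e.g. '1952-05-01'
    location    TEXT
)
"""


def open_catalogue(db_path=":memory:"):
    """Open (or create) the catalogue database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def add_image(conn, path, description, taken_on, location):
    """Store one image's metadata; returns its new image_id."""
    cur = conn.execute(
        "INSERT INTO images (path, description, taken_on, location) "
        "VALUES (?, ?, ?, ?)",
        (path, description, taken_on, location),
    )
    conn.commit()
    return cur.lastrowid


def search(conn, phrase):
    """Rows whose description or location contains `phrase`.

    SQLite's LIKE is case-insensitive for ASCII letters, so "hyde park"
    matches "Hyde Park". Wildcard characters typed by the user are escaped.
    """
    escaped = phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    return conn.execute(
        "SELECT image_id, path, description, taken_on, location FROM images "
        "WHERE description LIKE ? ESCAPE '\\' OR location LIKE ? ESCAPE '\\' "
        "ORDER BY image_id",
        (pattern, pattern),
    ).fetchall()
```

Usage: `conn = open_catalogue()`, then `add_image(conn, "/archive/0001.tif", "Speakers' Corner, Hyde Park", "1952-05-01", "London")`; `search(conn, "hyde park")` then returns that row. Note that a `LIKE '%...%'` scan cannot use an ordinary index, so with 1.5 million rows a full-text index (SQLite's FTS5 extension, or the full-text search built into the bigger database servers) would probably be needed to keep searches fast.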
Michael Cargill - 01 Aug 2005 15:35 GMT
> I know that someone out there can help me with a problem.
>
> I'm being asked to find some software which can index and allow searches on
> huge volumes of images. Most of these images will be TIFFs, and to be
> honest, I'm expecting there to be about 1.5 million at the end of the
> project. Ouch.
I can't answer the question but you might want to also ask this on some more technical newsgroups - perhaps something like alt.comp.databases...?
Gordon Hudson - 01 Aug 2005 16:21 GMT
>> I know that someone out there can help me with a problem.
>>
[quoted text clipped - 7 lines]
> more technical newsgroups - perhaps something like alt.comp.databases...?
You need to work out what you want to get out of the database in the end. This will determine what information you need to record and how it is recorded. Then you need to look at how you are going to access the data, how many people will need access to it, and when. This will then affect which database system you end up using. You also need to check whether there is a commercially available system, as buying one may be cheaper than developing your own.
It all boils down to the use it will be put to. If it was for access from multiple locations by multiple users, I would be looking at Oracle or MSSQL. If it's just one person sitting at a computer, I would probably not risk MS Access, as I don't know how it handles such large databases. You would need to get some advice on that from people who have run big databases.
Bandicoot - 04 Aug 2005 19:03 GMT
"Gordon Hudson" <gordon@usenet3.hostroute.co.uk> wrote in message news:42ee3e09$0$38038 [SNIP]
> If its just one person sitting at a computer I would
> probably not risk MS Access as I don't know how
> it handles such large databases. You would need
> to get some advive on that from people who have
> run big databases.
Last time I had anything to do with this sort of thing, we wouldn't put a database that size in Access. Even at 250,000 items we'd prefer SQL. For big databases we mostly used SQL or a Teradata solution (sometimes with a natural-language front end).
These were not usually on PC architectures, though they would be accessed via PCs: Auspex was a good choice if not putting them on the mainframes. One DB that was used by only about four people, but which was big, went on an old SGI box attached to the office network, which worked well and was cheap.
Peter
Willy Eckerslyke - 01 Aug 2005 16:26 GMT
> I'm being asked to find some software which can index and allow searches on
> huge volumes of images. Most of these images will be TIFFs, and to be
> honest, I'm expecting there to be about 1.5 million at the end of the
> project. Ouch.
Have a look at ThumbsPlus: http://www.cerious.com/image-database.shtml
There's a mention of SQL in there somewhere.
> Basically I need to be able to store 'metadata' against each image, and to
> search this metadata very quickly. Ideally, the metadata would be stored in
> an SQL database, and would provide hyperlinks to images on the file system.
> I need to have a description (up to 2k of text), a date and a location.
I'm in the middle of setting up a database-driven website along similar lines (though I doubt it'll ever get past tens of thousands of images!) using MySQL and PHP. Basic searches are pretty straightforward, but of course I want it to do more - show categories which can be clicked on to refine the search to fewer and fewer images, similar to eBay's searching system. I should have it finished in a while...
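The click-to-refine searching described here is usually called faceted search: for the images matching the filters chosen so far, count how many fall under each category, and let each click add one more filter. A rough sketch, assuming SQLite via Python's `sqlite3` rather than the MySQL/PHP setup mentioned; the `photos` table and the `category`/`county` facets are made up for illustration.

```python
import sqlite3

# Only these (hypothetical) columns may be used as facets. Checking against
# this list is what makes it safe to put the column name into the SQL text.
FACETS = ("category", "county")


def build_catalogue(rows):
    """Toy in-memory table of (category, county) pairs, one per image."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, category TEXT, county TEXT)")
    conn.executemany("INSERT INTO photos (category, county) VALUES (?, ?)", rows)
    return conn


def facet_counts(conn, facet, filters):
    """Count images per value of `facet` among those matching `filters`.

    `filters` maps facet name -> chosen value; each click on a category
    adds one entry, narrowing the result set further.
    """
    if facet not in FACETS or any(name not in FACETS for name in filters):
        raise ValueError("unknown facet")
    where = " AND ".join(f"{name} = ?" for name in filters) or "1"
    sql = (
        f"SELECT {facet}, COUNT(*) FROM photos WHERE {where} "
        f"GROUP BY {facet} ORDER BY COUNT(*) DESC, {facet}"
    )
    return conn.execute(sql, tuple(filters.values())).fetchall()
```

For example, `facet_counts(conn, "county", {"category": "landscape"})` lists the counties that still have landscape images, with their counts. Column names can't be passed as `?` parameters, which is why they are checked against `FACETS` before going into the SQL.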
Roger Whitehead - 01 Aug 2005 16:41 GMT
> Basically I need to be able to store 'metadata' against each image
The IPTC standard was created for this very purpose - see http://www.peterkrogh.com/Pages/digital/iptc.html. What you need therefore is some software that will let you add it to an image and then some (perhaps the same) that will let you search it all.
There are several products that will do either or both, and they need not be expensive. IrfanView, for example, lets you add IPTC data - and that's free. This product (which I've not tried) does both jobs - http://peccatte.karefil.com/Kalimages/EN/Index.html . Another, possibly more robust, is here - http://www.camerabits.com/pages/PM4.html .
Perhaps other people here know of some. Most pro photographers need something like it. There's certainly no need to reinvent the wheel and start messing around with SQL.
Roger
Willy Eckerslyke - 02 Aug 2005 10:01 GMT
> Perhaps other people here know of some. Most pro photographers need
> something like it. There's certainly no need to reinvent the wheel and
> start messing around with SQL.
I agree in principle, but as the OP referred to millions of images, there's going to be a massive investment in time just inputting the data. In comparison, a few days spent messing with SQL to get something that does this specific job perfectly, and nothing else, could be time well spent. Fine if an off-the-shelf product will work with no compromises, but if that product doesn't quite fit the requirements or is a bit clunky in use - even if it just means an extra mouse click or two - any small irritation multiplied by 1.5 million is likely to end up as a major headache.
Roger Whitehead - 02 Aug 2005 10:45 GMT
> I agree in principle, but as the OP refered to millions of images,
> there's going to be a massive investment in time just inputting the
> data. In comparison, a few days spent messing with SQL to get something
> that does this specific job perfectly, and nothing else, could be time
> well spent.
How is that going to speed up data input? If the data doesn't exist in machine-readable form (and Umgall hasn't said it does), entering it into a database form is going to be no quicker than entering it into a purpose-made product, possibly the reverse.
Roger
Willy Eckerslyke - 02 Aug 2005 11:51 GMT
>>I agree in principle, but as the OP refered to millions of images,
>>there's going to be a massive investment in time just inputting the
>>data. In comparison, a few days spent messing with SQL to get something
>>that does this specific job perfectly, and nothing else, could be time
>>well spent.
> How is that going to speed data inputting? If it doesn't exist in
> machine-readble form (and Umgall hasn't said it does), entering it to a
> database form is going to be no quicker than entering to a purpose-made
> product, possibly the reverse.
You don't access the database directly, you write your own form in PHP that only asks for what you want and only shows the fields you need. So instead of a page full of text fields, you may only have one or two and a submit button. If a field only ever needs to contain one of a fixed set of text strings, you can set up your form so that you click a radio button to choose one from a list, rather than having to type it in afresh every time. If you want, you can have it pre-fill the form fields with the previous image's data for you to edit, rather than starting afresh for every image. You could also set it up to bulk-fill certain fields.
With a little thought, your input form should be _the_ most efficient way of inputting data. No purpose-made product could ever be as streamlined.
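The pre-fill, pick-from-a-list and bulk-fill ideas don't depend on PHP. A minimal Python sketch of the logic behind such a form (not Willy's code), assuming a hypothetical required `county` field limited to a made-up list of values in place of the radio buttons:

```python
# Hypothetical controlled vocabulary: a field that only ever takes one of a
# few values is picked from a list (radio buttons on a web form), not typed.
COUNTIES = ("Devon", "Kent", "Yorkshire")


def next_record(previous, image_id, **changes):
    """Start a new record from the previous image's values ("pre-fill"),
    so the operator only supplies the fields that differ.

    `county` is treated as required and must come from COUNTIES.
    """
    record = dict(previous or {})
    record["image_id"] = image_id
    record.update(changes)
    if record.get("county") not in COUNTIES:
        raise ValueError(f"county must be one of {COUNTIES}")
    return record


def bulk_fill(records, **fields):
    """Return copies of a whole batch of records with the same field values set."""
    return [{**record, **fields} for record in records]
```

Each new record starts as a copy of the last one, so typing is limited to what actually changes; `bulk_fill(batch, county="Kent")` sets one field across a whole batch without touching the originals.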
Roger Whitehead - 02 Aug 2005 13:19 GMT
> > entering it to a
> > database form is going to be no quicker than entering to a purpose-made
> > product, possibly the reverse.
>
> You don't access the database directly, you write your own form in PHP
You're splitting hairs now.
> With a little thought, your input form should be _the_ most efficient
> way of inputting data. No purpose-made product could ever be as streamlined.
Unless one has looked at all the significant products, one cannot know. A sensible buying process would be to do this first, then look into a roll-your-own answer once one has a basis for comparison.
Roger
Willy Eckerslyke - 02 Aug 2005 15:09 GMT
>>You don't access the database directly, you write your own form in PHP
> You're splitting hairs now.
Hardly. That's fundamental to the whole thing.
>>With a little thought, your input form should be _the_ most efficient
>>way of inputting data. No purpose-made product could ever be as
[quoted text clipped - 3 lines]
> sensible buying process would be to do this first, then look into a
> roll-your-own answer once one has a basis for comparison.
I have difficulty remembering so far back, but I thought that was pretty much what I suggested in the first place, hence my link to cerious.com.
Roger Whitehead - 02 Aug 2005 15:41 GMT
> > You're splitting hairs now.
>
> Hardly. That's fundamental to the whole thing.
Life's too short to nail your feet to the floor so I'll stop bothering.
> I have difficulty remembering so far back but I thought that was pretty
> much what I suggested in the first place,
Your memory clearly is failing. You suggested one product, not a survey of them.
Roger
Willy Eckerslyke - 02 Aug 2005 15:54 GMT
> Life's too short to nail your feet to the floor so I'll stop bothering.
Still have to have the last word though, eh?
>>I have difficulty remembering so far back but I thought that was pretty
>>much what I suggested in the first place,
>
> Your memory clearly is failing. You suggested one product, not a survey of
> them.
Any idea what I had for tea yesterday? I'm trying to decide whether I need to shop on the way home.
Umgall - 02 Aug 2005 11:57 GMT
>> I agree in principle, but as the OP refered to millions of images,
>> there's going to be a massive investment in time just inputting the
[quoted text clipped - 6 lines]
> database form is going to be no quicker than entering to a purpose-made
> product, possibly the reverse.
To be fair, I suppose the 'metadata' will exist in machine-readable form. It will be generated from an existing database and, if the application supports it, imported as XML. There aren't many fields, but it is vital that these can be searched: County, Date, Description, Surname, Forename, Placename and image ID.
Willy is right - due to the huge volumes, it's important to get something flexible enough to let us search quickly and return matches, then display the image with a single keypress or click. Browsing the images is important too, but fast search capabilities are vital.
Thanks for the suggestions so far!
Umgall.
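As the metadata already exists and can be exported as XML, the import could be a short script. A sketch assuming Python's standard `xml.etree.ElementTree` and `sqlite3`, and an export made of `<record>` elements whose child elements are named after the fields listed above; the real XML layout isn't given, so the element names are hypothetical. Indexes on `Surname` and `Placename` keep exact lookups on those fields from scanning the whole table.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The searchable fields listed in the post; the XML element names are
# assumed to match them exactly (the real export format is not known).
FIELDS = ("ImageID", "County", "Date", "Description", "Surname", "Forename", "Placename")

SCHEMA = """
CREATE TABLE IF NOT EXISTS records (
    ImageID     TEXT PRIMARY KEY,
    County      TEXT COLLATE NOCASE,
    Date        TEXT,               -- ISO dates sort correctly as text
    Description TEXT,
    Surname     TEXT COLLATE NOCASE,
    Forename    TEXT COLLATE NOCASE,
    Placename   TEXT COLLATE NOCASE
);
CREATE INDEX IF NOT EXISTS idx_surname   ON records (Surname);
CREATE INDEX IF NOT EXISTS idx_placename ON records (Placename);
"""


def import_xml(conn, xml_text):
    """Load every <record> element into the table; returns the row count."""
    conn.executescript(SCHEMA)
    root = ET.fromstring(xml_text)
    rows = [
        tuple((rec.findtext(name) or "").strip() for name in FIELDS)
        for rec in root.iter("record")
    ]
    placeholders = ", ".join("?" * len(FIELDS))
    with conn:  # one transaction for the whole batch
        conn.executemany(
            f"INSERT INTO records ({', '.join(FIELDS)}) VALUES ({placeholders})", rows
        )
    return len(rows)


def find_by_surname(conn, surname):
    """Exact, case-insensitive surname lookup (the column is COLLATE NOCASE)."""
    return conn.execute(
        "SELECT ImageID, Forename, Surname, Placename, Date FROM records "
        "WHERE Surname = ? ORDER BY Date",
        (surname,),
    ).fetchall()
```

Free-text searching of `Description` would still need a full-text index; the sketch only covers loading the data and exact lookups on the name fields.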
Neil Barker - 02 Aug 2005 12:37 GMT
> I suppose to be fair, the 'metadata' will exist in machine readable form.
> This will be generated from an existing database, and if the application
> supports it, will be imported in XML. There aren't many fields, but it is
> vital that these can be searched: County, Date, Description, Surname,
> Forename, Placename and image ID.
I tell you - you want Fotostation Pro - does all that straight out of the box :-)
Neil Barker
Phil Kyle - 20 Aug 2005 16:43 GMT
>> I suppose to be fair, the 'metadata' will exist in machine readable
>> form. This will be generated from an existing database, and if the
[quoted text clipped - 4 lines]
> I tell you - you want Fotostation Pro - does all that straight out of
> the box :-)
Closest you've ever been to a box.
Phil Kyle™ Uno Dos Tres Cuatro CINCO!!!!!!
"Be very aware that my willingness to continue to criticise your sig is infinite." -- Neil Barker
ah - 21 Aug 2005 01:57 GMT
>>> I suppose to be fair, the 'metadata' will exist in machine readable
>>> form. This will be generated from an existing database, and if the
[quoted text clipped - 6 lines]
>
> Closest you've ever been to a box.
Is that a euphemism?
ah fait loucher un bon oeil
Neil Barker - 01 Aug 2005 17:42 GMT
> I know that someone out there can help me with a problem.
>
[quoted text clipped - 7 lines]
> an SQL database, and would provide hyperlinks to images on the file system.
> I need to have a description (up to 2k of text), a date and a location.
Yup, no problemo.
Have a look at Fotostation Pro and Index Manager.
http://www.fotoware.com
Fotostation Pro is the front-end application, which works as a standalone image cataloguer / editor, but really comes into its own when connected to a server running Index Manager.
Essentially what happens is this:-
When an image is sent to the server from Fotostation Pro, Index Manager reads the data contained in the IPTC fields and adds it to an index, with a pointer to that image file location for later retrieval.
When using the search facility in Fotostation, rather than having to search through thousands of files, all it needs to do is to consult the master index - any matches can then be found in seconds.
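The "master index" described here is, in general terms, an inverted index: a map from each word to the files whose metadata contains it, so a search only looks up the query words instead of visiting every file. The toy Python version below illustrates the idea; it is not how FotoWare's Index Manager is actually implemented (that isn't described in the thread), and `MasterIndex` and the sample paths are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MasterIndex:
    """Toy inverted index: each word maps to the set of image locations
    whose metadata contains it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, location, *fields):
        """Index the metadata fields of one image stored at `location`."""
        for field in fields:
            for word in tokenize(field):
                self._postings[word].add(location)

    def search(self, query):
        """Locations containing every word of `query` (word order ignored).

        Only the posting sets for the query words are touched; no image
        file is opened.
        """
        words = tokenize(query)
        if not words:
            return set()
        hits = set(self._postings.get(words[0], ()))
        for word in words[1:]:
            hits &= self._postings.get(word, set())
        return hits
```

For instance, after `idx.add("//server/pics/0001.tif", "Hyde Park concert", "London")`, `idx.search("hyde park")` returns `{"//server/pics/0001.tif"}`. It matches all the words rather than the exact phrase; supporting phrase queries would mean also recording word positions.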
You'll find that many newspapers, mine included, run this system and it does work extremely well. We currently have just under 100,000 images online and searching on a keyword or phrase takes literally a few seconds. Index Manager has the capacity to search millions of images, potentially spread over several servers using something called "Cluster Commander" (which enables many servers to be treated effectively as one big one). It can also do Boolean algebra searches using AND/OR/NOT together with phonetic searches and more.
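With an inverted index, Boolean searches reduce to set operations on the lists of matching images: AND is intersection, OR is union and NOT is difference. A minimal Python sketch, assuming a plain dict from word to image-ID sets; unlike the product described, it evaluates strictly left to right, has no brackets, and does no phonetic matching.

```python
def boolean_search(postings, query):
    """Evaluate a query such as 'kent AND wedding NOT church'.

    `postings` maps a lower-case word to the set of image IDs containing it.
    Terms and operators must alternate; evaluation is strictly left to right
    with no precedence or brackets, and NOT means "and not".
    """
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(postings.get(tokens[0], ()))
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        if i + 1 >= len(tokens):
            raise ValueError(f"operator {op!r} has no term after it")
        term = postings.get(tokens[i + 1], set())
        if op == "and":
            result &= term      # intersection
        elif op == "or":
            result |= term      # union
        elif op == "not":
            result -= term      # difference
        else:
            raise ValueError(f"expected AND, OR or NOT, got {op!r}")
    return result
```

For example, with `postings = {"kent": {1, 2, 3}, "wedding": {2, 3, 4}, "church": {3}}`, the query `"kent AND wedding NOT church"` gives `{2}`.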
It can also be connected to a WWW front-end, which is a Java application enabling online viewing/ordering etc.
If you need further help with this, feel free to get in touch.
Neil Barker
Phil Kyle - 20 Aug 2005 16:43 GMT
>> I know that someone out there can help me with a problem.
>>
[quoted text clipped - 42 lines]
>
> If you need further help with this, feel free to get in touch.
He means that literally.
Phil Kyle™ Uno Dos Tres Cuatro CINCO!!!!!!
"Be very aware that my willingness to continue to criticise your sig is infinite." -- Neil Barker
ah - 21 Aug 2005 01:56 GMT
>>> I know that someone out there can help me with a problem.
>>>
[quoted text clipped - 44 lines]
>
> He means that literally.
Oooohhhh..
ah fait loucher un bon oeil
infinity - 02 Aug 2005 00:20 GMT
> I know that someone out there can help me with a problem.
>
[quoted text clipped - 6 lines]
> and to search this metadata very quickly. Ideally, the metadata would
> be stored in an SQL database,
Have a look at ThumbsPlus 7 by Cerious, which uses the Access database format, although it functions as a standalone application. You can have keywords, user-defined fields that take numeric or string values, and lengthy comments on images. The thumbnails view can show all your own fields and keywords plus EXIF data etc., and info embedded in the file can be used to generate keywords if you like, as can its name and folder path. There's a 30-day free trial available. Since the database is now in Access format, you should be able to open it directly if you need more functionality and use your own search macros. I'm not sure how well it copes with millions of images, but tens of thousands is certainly no problemo.