Computer/Internet Issues & Troubleshooting

MySQL SELECT help please...

Post #1, 11th Jun 2017, 14:41 (thread starter)

I have a large MySQL database containing keywords or phrases relating to pre-1980 aviation topics.
Searching it is quite literal and very hit-and-miss, so I'd appreciate some help in creating a suitable SELECT statement, perhaps with some regular expressions, to eliminate the more common problems (at present it is just a plain LIKE with % wildcards).
Typically the database contains designations as given in the source material. Users can't be expected to enter precise terms, and manufacturers' blurb doesn't necessarily remain consistent over time: hyphens, spaces and other marks appear and disappear regularly.

Some illustrative examples:
Rolls-Royce / Rolls Royce (hyphen omitted in the search box, so the search fails)
C130 / C 130 (space in the search box, so the search fails)
Shorts SD3-30 (searches for SD330, Shorts 330, Short SD 330 and Shorts 3 30 all fail)
Mark 2 / Mk II (Mark II, Mark 11, Mk 2, Mk 11, M.K. II etc. may not work)
Lockheed L-1011 TriStar (Lockheed L1011, L-1011, Tri Star may not work)
BOAC (B.O.A.C fails)
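The Mark/Mk row is a vocabulary problem rather than a punctuation one. A minimal sketch in Python (not MySQL), assuming a hand-maintained synonym table: `SYNONYMS`, `ROMAN` and `canonical_tokens` are hypothetical names. Both the stored text and the user's query would be mapped to the same tokens before comparing:

```python
# Hypothetical synonym table: spellings on the left become the token on the right.
SYNONYMS = {"mark": "mk", "mk.": "mk", "m.k.": "mk", "m.k": "mk"}

# Roman numerals used in mark numbers. They are only converted straight after
# "mk", so ordinary words and letters elsewhere in a phrase are left alone.
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5",
         "vi": "6", "vii": "7", "viii": "8", "ix": "9", "x": "10"}


def canonical_tokens(text: str) -> list[str]:
    """Lower-case, split on whitespace, then map synonyms and mark numerals."""
    out: list[str] = []
    for tok in text.lower().split():
        tok = SYNONYMS.get(tok, tok)
        if out and out[-1] == "mk":
            tok = ROMAN.get(tok, tok)
        out.append(tok)
    return out


print(canonical_tokens("Spitfire Mark II"))  # ['spitfire', 'mk', '2']
print(canonical_tokens("M.K. II"))           # ['mk', '2']
```

"Mark 11" is deliberately not folded into "Mk 2", since 11 may be a genuine mark number; that kind of typo is better handled by hand-added keywords.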

I'm sure you can imagine dozens more along similar lines.
In short, I think most of the problems would be resolved if the search were less touchy about spaces and punctuation marks such as full stops, commas and hyphens.
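One way to make the search ignore spaces and punctuation is to normalise both sides before comparing: lower-case everything and strip every character that is not a letter or digit. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL and registers a hypothetical `squash()` SQL function; the table and column names (`topics`, `phrase`) are assumptions. On MySQL 8.0 or later a similar expression could probably be built with `REGEXP_REPLACE(LOWER(phrase), '[^a-z0-9]', '')`, though either way every row has to be normalised at query time, so an ordinary index will not help.

```python
import re
import sqlite3


def squash(text):
    """Lower-case and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", (text or "").lower())


conn = sqlite3.connect(":memory:")
conn.create_function("squash", 1, squash)  # make squash() callable from SQL
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, phrase TEXT)")
conn.executemany(
    "INSERT INTO topics (phrase) VALUES (?)",
    [("Rolls-Royce Dart",), ("Lockheed C130 Hercules",), ("Shorts SD3-30",),
     ("Lockheed L-1011 TriStar",), ("BOAC",)],
)


def search(term):
    """Substring match on the squashed phrase (a squashed term has no % or _)."""
    sql = ("SELECT phrase FROM topics "
           "WHERE squash(phrase) LIKE '%' || ? || '%' ORDER BY id")
    return [row[0] for row in conn.execute(sql, (squash(term),))]


print(search("Rolls Royce"))  # ['Rolls-Royce Dart']
print(search("B.O.A.C"))      # ['BOAC']
```

This catches C 130, Tri Star, L1011 and SD330, but not Shorts 330, where letters (SD) are missing rather than separators; those cases need keywords or aliases.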

Thanks
windriver
Post #2, 12th Jun 2017, 01:20

windriver

This is precisely the reason that large organisations employ quite a few people to check and correct data entry errors and some experts to extract data that is known to be on the system, but cannot be retrieved.

The principle behind database entry methods is consistency. Either the data must be entered in a consistent manner by trained personnel, or it is selected from a fixed set of look-up tables. A combination of these two methods is used for databases that have to cope with variable data.
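The look-up-table approach can be sketched with a foreign key, so that the choice of spelling is fixed at entry time. This uses Python's sqlite3 as a stand-in (MySQL's InnoDB engine enforces foreign keys itself, while SQLite needs the `PRAGMA foreign_keys` switch); the `manufacturer` and `aircraft` tables and the `add_aircraft` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute("CREATE TABLE manufacturer (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE aircraft ("
    " id INTEGER PRIMARY KEY,"
    " make TEXT NOT NULL REFERENCES manufacturer(name),"
    " model TEXT NOT NULL)"
)
# The look-up table: the only spellings a data-entry form would offer.
conn.executemany("INSERT INTO manufacturer VALUES (?)", [("Lockheed",), ("Avro",)])
conn.commit()


def add_aircraft(make, model):
    """Insert a record; return False if make is not in the look-up table."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO aircraft (make, model) VALUES (?, ?)",
                         (make, model))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_aircraft("Lockheed", "L-1011 TriStar"))  # True
print(add_aircraft("Lockhead", "L-1011 TriStar"))  # False: not in the list
```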

You can make searches less specific by searching for just "Rolls" rather than "Rolls Royce" or "Rolls-Royce". Ignoring the capitalisation would also retrieve records that mention, for example, "the aircraft rolls to the right".
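The trade-off is easy to see with a stand-in for a case-insensitive `LIKE '%term%'` (MySQL's default collations are case-insensitive). The two sample rows and the `like_contains` helper are made up for illustration:

```python
records = [
    "Rolls-Royce Dart turboprop",
    "On take-off the aircraft rolls to the right",
]


def like_contains(term, rows):
    """Mimic a case-insensitive LIKE '%term%' over a list of strings."""
    t = term.lower()
    return [r for r in rows if t in r.lower()]


print(len(like_contains("Rolls", records)))   # 2: broad but noisy
print(like_contains("Rolls-Royce", records))  # exact spelling: one hit
print(like_contains("Rolls Royce", records))  # []: hyphen missing
```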

Databases have to be sensitive to punctuation and spelling; they are designed to retrieve exactly what is asked for, and only that. That is why it is important to consider consistency of data entry and error checking and detection during data entry when first designing a database.

It is a futile exercise to try to redesign the database after several thousand records have been entered in an ad hoc fashion. Each and every record will need to be revisited and edited to conform to whatever standards you decide to apply. The larger the database, the more important it becomes for the integrity of data entry to be maintained.

Clearly your database contains ad hoc information that has been entered in accordance with whatever source it originated from. That is fine. If you need to search for information in that field, variable string searches are the only way of retrieving the records.

It may be of benefit though to ensure that each data record has a consistent set of fields such as date, make, model, registration, and no doubt several others you find of importance. This will at least allow you to retrieve a limited set of records to be examined individually or searched more specifically using a string search when trying to retrieve particular details.
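A sketch of that two-stage search: filter on structured fields first, then run the free-text match only on what is left. sqlite3 stands in for MySQL, and the `record` table, its columns, the `find` helper and the sample rows are hypothetical placeholders with illustrative values only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record (id INTEGER PRIMARY KEY, year INTEGER,"
    " make TEXT, model TEXT, details TEXT)"
)
# Illustrative rows only.
conn.executemany(
    "INSERT INTO record (year, make, model, details) VALUES (?, ?, ?, ?)",
    [(1970, "Lockheed", "L-1011", "TriStar notes"),
     (1956, "Lockheed", "C-130", "Hercules notes"),
     (1960, "Avro", "748", "748 notes")],
)


def find(make, year_from, year_to, text):
    """Narrow by exact make and year range, then substring-match the details."""
    sql = ("SELECT model FROM record WHERE make = ? AND year BETWEEN ? AND ? "
           "AND details LIKE '%' || ? || '%' ORDER BY year")
    return [r[0] for r in conn.execute(sql, (make, year_from, year_to, text))]


print(find("Lockheed", 1950, 1979, "notes"))  # ['C-130', 'L-1011']
```

In MySQL, an index on (make, year) would let the first stage avoid scanning the whole table, leaving the slow string match to a small subset.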

I'm sorry to have to say this, but these design decisions need to be made before you start building a database, not once it is up and running. By then it is too late to make major changes without having to re-enter data or migrate the data across to a new system. Both of these solutions involve a lot of work.

Back in the day when Windows was a new innovation on computers, I designed a bespoke database that contained half a dozen free text entry fields along with validated data entry fields that only allowed entries from a pull down list. The system remained in use for ten years at various sites before being overtaken by newer technology and a single networked database system, so I know the pitfalls that await the unwary.

Either redesign the database with validated data fields and look up tables, or go through each record and apply an editing standard to free text records. If, for example, a name can appear spelled in one of several ways, e.g. with or without spaces, or with a hyphen or apostrophe, then adopt a standard such as adding the name at the end of the data field spelled all in capitals with no spaces.
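That editing standard (append the name in capitals with no spaces) could be applied by a small helper. `search_tag` and `with_search_tag` are hypothetical names, and the check for an existing tag is an added assumption so that re-running the clean-up does not tag a record twice:

```python
import re


def search_tag(name):
    """Capitals only, with spaces, hyphens and apostrophes removed."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


def with_search_tag(free_text, name):
    """Append the standard tag for name unless it is already present."""
    tag = search_tag(name)
    if tag in free_text.split():
        return free_text
    return f"{free_text} {tag}"


print(with_search_tag("Engines by Rolls Royce", "Rolls-Royce"))
# Engines by Rolls Royce ROLLSROYCE
```

A search box would then pass the user's query through the same `search_tag()` before matching, so "rolls royce" and "Rolls-Royce" both become ROLLSROYCE.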

Similarly, inserting markers such as capitalised double or triple letter groups to designate certain types of records can be useful, e.g. JJ, QQ, YYY.
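If such markers are used, searching for them as whole, case-sensitive tokens keeps them from matching inside ordinary words. A Python sketch with a hypothetical `has_marker` helper and made-up records:

```python
import re


def has_marker(text, marker):
    """True if marker appears as a separate, case-sensitive token."""
    return re.search(rf"\b{re.escape(marker)}\b", text) is not None


records = ["Avro 748 production notes QQ", "Squadron history", "XJJX test"]
print([r for r in records if has_marker(r, "QQ")])  # ['Avro 748 production notes QQ']
```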

Try to standardise the data, then you can use standardised searches and expect to be successful in quickly finding what you are looking for most of the time.

Database design is something of a black art. You need to consider all the possibilities, and the likelihood of needing extra data entry fields, before you even begin.

Whichever path you choose, you have a lot of work ahead. Good luck.
G0ULI
Post #3, 12th Jun 2017, 08:07

Absolutely correct - just think of Macdonald, Mac Donald, mcDonald, Mc Donald........

You can decide on a standard usage and do a search and replace on the old items, but that can cause a whole new set of problems. If it's really big it can take years...
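A quick illustration of how a search and replace can backfire, using Python's `re` module and two hypothetical rules: a broad one that rewrites every `Mc` prefix, and a narrow one limited to the name in question. The broad rule also rewrites McDonnell Douglas:

```python
import re


def broad_rule(text):
    # Rewrites every word starting "Mc" (optionally followed by a space).
    return re.sub(r"\bMc\s?", "Mac", text)


def narrow_rule(text):
    # Only the Macdonald family of spellings, any case, optional space.
    return re.sub(r"\bMa?c\s?donald\b", "Macdonald", text, flags=re.IGNORECASE)


print(broad_rule("McDonnell Douglas"))          # MacDonnell Douglas
print(narrow_rule("Mc Donald and McDonnell"))   # Macdonald and McDonnell
```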
Heathrow Harry
Post #4, 12th Jun 2017, 08:16 (thread starter)

Quote (G0ULI): Databases have to be sensitive to punctuation and spelling, they are designed to retrieve exactly what is asked for, and only that. That is why it is important to consider consistency of data entry and error checking and detection during data entry when first designing a database.

Agreed, but my data (or 95% of it) is correct as per the source material (and my data entry policy); I have no control over how users weaned on Google phrase their searches.


The more savvy users do as you suggest, and, believe it or not, some even read the notes; it's those that don't that I'm trying to capture.
windriver
Post #5, 12th Jun 2017, 14:39

TBH the Google search engine is amazing and is the default for 99% of computer users these days.

They EXPECT to be able to type in something close and get an answer. Unfortunately SQL etc. are a lot older and far less flexible.

Not much you can do, TBH: the data is right, you've got a data entry policy, and you correct mistakes when you find them.

Unless and until you can deploy some software that can carry out an intelligent trawl and recategorise the whole thing, you've done what you can.
Heathrow Harry
Post #6, 12th Jun 2017, 15:00 (thread starter)

Quote (Heathrow Harry): Unless and until you can deploy some software that can carry out an intelligent trawl and recategorise the whole thing you've done what you can

Thanks, good points... From analysing the nature of failed searches, I think I can get as close as I need to by brushing up on pattern matching and regular expressions and testing a few options on a local server (the MySQL manual is helpful here).
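A minimal sketch of the regular-expression route, assuming the aim is to let any run of non-alphanumeric characters appear (or not) between the characters the user typed. It is written in Python with `re`; `loose_pattern` and `loose_match` are hypothetical names. The pattern uses only basic syntax, so the same string could plausibly be passed to MySQL's `REGEXP` operator as a parameter, but that is an assumption worth checking on the local server.

```python
import re

# Zero or more separators: spaces, hyphens, full stops, slashes and so on.
SEP = "[^A-Za-z0-9]*"


def loose_pattern(query):
    """Keep the query's letters and digits, allowing separators between them."""
    return SEP.join(re.escape(c) for c in query if c.isalnum())


def loose_match(query, stored):
    """True if stored contains the query, ignoring separators and case."""
    return re.search(loose_pattern(query), stored, re.IGNORECASE) is not None


print(loose_pattern("C 130"))  # C[^A-Za-z0-9]*1[^A-Za-z0-9]*3[^A-Za-z0-9]*0
print(loose_match("Tri Star", "Lockheed L-1011 TriStar"))  # True
```

It still misses Shorts 330 against Shorts SD3-30, and it can over-match: C 130 also matches C1300.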


The data stays 'as is' though...
windriver
Post #7, 13th Jun 2017, 16:53

Absolutely!! Don't mess with the data!!!!!!!!!!!!!!!!!!!

I've been working on and off with some guys who are developing software that can go through a zillion scanned documents and images and come up with a flexible tagged index, but it is a very specialised field of endeavour, the tags are limited, AND it requires a lot of "training".

They're getting there, but it's a slow process.
Heathrow Harry
Post #8, 13th Jun 2017, 18:30 (thread starter)

With a little help from a friend I've improved the search as far as I intend to, whilst retaining a single search box format.


For interest: I'm developing the database offline with some custom software and uploading it to the web at intervals. This lets me make global changes easily if necessary, but more usefully I have a spare field in which to insert relevant keywords where required.
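Searching the spare keywords field alongside the main phrase could look like the sketch below. It is Python, and the `ROWS` layout, the field names and the `squash` normalisation are assumptions, not the actual schema:

```python
import re

# Hypothetical rows: the phrase as per the source, plus hand-added keywords.
ROWS = [
    {"phrase": "Shorts SD3-30", "keywords": "Shorts 330; Short 330"},
    {"phrase": "Lockheed L-1011 TriStar", "keywords": ""},
]


def squash(text):
    """Lower-case and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search(term, rows=ROWS):
    """Match the squashed term against the phrase or the keywords field."""
    key = squash(term)
    return [r["phrase"] for r in rows
            if key in squash(r["phrase"]) or key in squash(r["keywords"])]


print(search("Shorts 3 30"))  # ['Shorts SD3-30']
```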


The data was fairly straightforward to compile before 1961, when the industry consolidated, mergers and co-operations became the norm, and well-known names split into many different divisions (Avro 748, HS 748, BAe 748, etc.).
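Renamed types such as the 748 suggest an alias table, so that a search for any one name also finds records filed under the others. A Python sketch; `ALIAS_GROUPS`, `expand` and `search` are hypothetical, and the single group shown is taken from the example above:

```python
import re

# Each set holds squashed names that refer to the same type.
ALIAS_GROUPS = [{"avro748", "hs748", "bae748"}]


def squash(text):
    """Lower-case and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def expand(term):
    """Return the alias group containing term, or just the term itself."""
    key = squash(term)
    for group in ALIAS_GROUPS:
        if key in group:
            return set(group)
    return {key}


def search(term, rows):
    """Rows whose squashed text contains any alias of term."""
    keys = expand(term)
    return [r for r in rows if any(k in squash(r) for k in keys)]


print(search("BAe 748", ["Avro 748 Series 1", "Vickers Viscount"]))  # ['Avro 748 Series 1']
```

In MySQL the groups would more naturally live in their own table, joined to the main one, so new aliases can be added without touching code.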


I'll have the data set complete by the end of August and then change the main emphasis from search to a classified directory.
windriver

