Using the Haversine Formula in Drupal 7

The Haversine formula is one of the easiest to use pieces of complicated math I’ve had the pleasure to use. If you’re not familiar with it, it’s pretty simple in theory – it’s an extension of the Pythagorean formula from a grid to the surface of a sphere, which basically means that you can use it to measure the distance between two points on a sphere (1).
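
For reference, here is the formula written out. The notation is my own shorthand rather than anything Drupal-specific: φ is latitude and λ is longitude, both in radians, r is the radius of the sphere, and d is the distance along the surface. The second line is the equivalent spherical law of cosines form, which is the shape the SQL expression later in this post takes.

```latex
\[
d = 2r \arcsin\!\left(\sqrt{\sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
\]
\[
d = r \arccos\!\left(\sin\varphi_1 \sin\varphi_2
    + \cos\varphi_1 \cos\varphi_2 \cos(\lambda_2 - \lambda_1)\right)
\]
```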

“But what,” I hear you ask through the secret microphone hidden in your keyboard, “does this have to do with Drupal?”  And the answer is – search.  I’ve worked on a number of projects over the years where users wanted to search for things within some distance of a central point – a zip code, a city, or the like.  This can be done with Apache Solr these days, but sometimes Apache Solr isn’t what you need.  Either you’re not doing a keyword search and are just filtering your nodes (show me all Mexican restaurants near 12th and Vine), or you don’t think you need the extra complexity of adding an Apache Solr instance to the project.  An index of restaurants isn’t a bad idea for an example, so let’s build one.  In the tradition of Drupal demos, we’ll say we’re creating a restaurant search engine, which we will name ‘Shout’.  So, we spin up a copy of Drupal 7, set up the usual database, add a ‘Restaurant’ node type, download Fivestar* to do ratings, and set up a few quick taxonomies (cuisine, price range [low, medium, high], and maybe style [sit-down, food court, fast food]), which we add to the restaurant node type.

Step 1

Create a node type with location information.  To store the address there are two good options: the Location module, which grew out of CCK, and the Address Field module, which comes from the Commerce module.

Step 2

Add the latitude and longitude to the nodes – if you’re using the Location module you can enable storing those values in the same field, but if you’re starting with the Address Field module you need to add a separate field to store that information.  I recommend the Geofield module.

Step 3

Finally, you will need to set up geocoding – the process of assigning a latitude and longitude based on an address.  There are plenty of services which will do this for you, and if you’re using the Location module, then you can enable it there.  Alternatively, you can use the Geocoder module to store these values.

Example

Following along with our Shout example, let’s add the addressfield, geofield, and geocoder modules, which will in turn require the geoPHP and ctools modules.  Add the Address field and tell it to store a postal address, set up a geofield on the restaurant node as well and set its widget to ‘geocode from another field’, and take a look at the configuration of geocoder in admin/config/content/geocoder.  You can use the Google API in batch for free, as long as you don’t get too crazy with the number of requests per day.  This being an example site, I think we’ll be safe, but when doing a commercial site it’s always best to read the Google terms of service, sign up for an API key, and close cover before striking.

I’ve named the Address field field_address, and in a fit of originality I’ve named the geofield field_map_location.  Once I had everything set up, I entered a few local restaurants and ran cron to make sure that I was getting data in the field_data_field_map_location table – I suggest you do the same.  (Well, to be honest, at first I wasn’t getting data, but that’s why we test our examples when writing blog posts.)

Step 4

Once you’ve got locations set up, the next step is your search engine.  For this task I suggest the Search API module, which allows you to define your own indexes, and to switch search engines in the future as the need arises.  You’ll also need the Search API DB and Search API Pages modules.

Step 5

We’ll start by setting up a server – in this case, just give it an obvious name and select the Database engine.

Step 6

Then we’ll create an index – although Search API creates a Default node index when it’s enabled, we want one just for restaurant nodes.  So we’ll click on ‘Add Index’, give it a name of ‘Restaurant Index’, select that we want to index Restaurant nodes, put in a quick description to remind us of what it is, and select the server we just created.

Step 7

After that, go into the index fields for the index and select at least the title and the ‘main body text’ for indexing – I suggest including the address as well.  It’s also important to add the location field that you’re using, and to include its latitude and longitude values in the index.  If you can’t find a field, expand ‘Add Related Fields’ at the bottom and look for it there, and make sure you save your changes before leaving the page.

Finally, on the filter tab I suggest excluding unpublished nodes, as well as ignoring case and adding the HTML filter.

With all that set up, use the search_api_pages module to set up a search page for the index you’ve constructed.

With data and index set up, it’s time to add the location filtering.  Let’s add a quick block to filter with:


/**
 * Implements hook_block_info().
 */
function example_block_info() {
  return [
    'location' => [
      'info'  => t('Search Location Filter'),
      'cache' => DRUPAL_CACHE_GLOBAL,
    ],
  ];
}


/**
 * Implements hook_block_view().
 */
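function example_block_view($delta) {
  // Start with an empty block so unknown deltas don't return an undefined variable.
  $block = array();
  if ($delta == 'location') {
    $block['subject'] = t('Filter by location:');
    $block['content'] = drupal_get_form('example_location_filter_form');
  }

  return $block;
}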


/**
 * This form allows the user to restrict the search by where they are.
 */
function example_location_filter_form($form, &$form_state) {
  
  $form['center'] = array(
    '#type' => 'textfield',
    '#title' => t('From'),
    '#description' => t('Enter a zip code or city, state (e.g., "Denver, CO")'),
    '#maxlength' => 64,
    '#size' => 20,
    '#default_value' => isset($_GET['center']) ? $_GET['center'] : '',
  );
  
  $distances = [5, 10, 20, 50];
  foreach ($distances as $distance) {
    $options[$distance] = t('Within @num miles', array('@num' => $distance));
  }
  $form['radius'] = [
    '#type' => 'radios',
    '#title' => t('Distance:'),
    '#options' => $options,
    '#default_value' => isset($_GET['radius']) ? $_GET['radius'] : 5,
  ];
  
  $form['submit'] = [
    '#type' => 'submit',
    '#value' => t('Filter'),
  ];
  $parameters = drupal_get_query_parameters(NULL, ['q', 'radius', 'center']);
  $form['clear'] = [
    '#type' => 'markup',
    '#markup' => l(t('Clear'), current_path(), array('query' => $parameters)),
  ];
  return $form;
}


/**
 * Validation handler for location filter.
 */
function example_location_filter_form_validate(&$form, &$form_state) {
  if (!empty($form_state['values']['center'])) {
    $location = trim($form_state['values']['center']);
    // Look up coordinates for the postal code or city the user entered.
    $point = example_location_lookup($location);
    if (empty($point)) {
      form_set_error('center', t('%location is not a valid location - please enter either a postal code or a city, state (like "Denver, CO")', ['%location' => $location]));
    }
  }
}


/**
 * Form submit handler for location filter form.
 */
function example_location_filter_form_submit(&$form, &$form_state) {
  $parameters = drupal_get_query_parameters(NULL, ['q', 'radius', 'center']);
  if (!empty($form_state['values']['center'])) {
    $parameters['radius'] = $form_state['values']['radius'];
    $parameters['center'] = $form_state['values']['center'];
  }


  $form_state['redirect'] = [current_path(), ['query' => $parameters]];
}

In this case, example_location_lookup() looks up a latitude/longitude pair for the location entered by the user, which I’m leaving as an exercise for the reader in the hope of keeping this post short.  It should return an array with at least the keys ‘lat’ and ‘long’.  For testing, you can have it return a fixed point until you’ve got that set up, like:

function example_location_lookup($location) {
  return array('lat' => 39.7392, 'long' => -104.9903);
}
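
If you want to exercise the validation error path as well, a slightly richer stub can stand in for the fixed-point version. This is only a sketch under my own assumptions: the lookup table is hypothetical sample data (the New York coordinates are mine, not from the example site), and a real implementation would call a geocoding service instead.

```php
<?php
/**
 * Test stub: resolves a handful of hard-coded locations.
 *
 * Returns an array with 'lat' and 'long' keys, or NULL when the location
 * is unknown, so the form validation handler can report an error.
 */
function example_location_lookup($location) {
  // Hypothetical sample data for testing only.
  $known = array(
    'denver' => array('lat' => 39.7392, 'long' => -104.9903),
    'denver, co' => array('lat' => 39.7392, 'long' => -104.9903),
    'new york, ny' => array('lat' => 40.7128, 'long' => -74.0060),
  );

  // Collapse repeated whitespace, trim, and lowercase before matching.
  $key = strtolower(trim(preg_replace('/\s+/', ' ', $location)));

  return isset($known[$key]) ? $known[$key] : NULL;
}
```

With this in place, entering "Denver, CO" passes validation, while anything missing from the table triggers the form error.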

So, now we can return to the Haversine formula.  Once you’ve got the position entered and passed along, it’s time to match it against our restaurants.  Doing complex math is hard, so after a few moments of thought, we realize that anything more than the radius (in miles) north or south, or east or west, of the center point will be too far away to bother including in the search, so we’ll first filter on a range of latitude and longitude around the center, and then filter by the Haversine formula to knock out everything outside of the circle.  For implementing the Haversine formula in SQL, I’m indebted to Ollie Jones of Plum Island Media, who does a great job of demystifying the formula here.

/**
 * Implements hook_search_api_db_query_alter().
 */
function example_search_api_db_query_alter(SelectQueryInterface &$db_query, SearchApiQueryInterface $query) {
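  // Machine name of the geofield on the restaurant node; in this example
  // that is field_map_location, so use it as the default.
  $field_name = variable_get('example_location_field_name', 'field_map_location');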


  // Do we have a location?
  if (isset($_GET['center']) && isset($db_query->alterMetaData['search_api_db_fields'][$field_name . ':lat'])) {
    $location = $_GET['center'];
    $radius = isset($_GET['radius']) && is_numeric($_GET['radius']) ? $_GET['radius'] * 1 : 5;
    $point = example_location_lookup($location);
    if (!empty($point)) {
      // Basically, we make a subquery that generates the distance for each
      // restaurant, and restrict the results from that to a bounding box.
      // Then, once that subquery is done, we check each item that survives
      // the bounding box to see that its distance is less than our radius.
      $latitude_field = $db_query->alterMetaData['search_api_db_fields'][$field_name . ':lat']['column'];
      $longitude_field = $db_query->alterMetaData['search_api_db_fields'][$field_name . ':lon']['column'];
      $table = $db_query->alterMetaData['search_api_db_fields'][$field_name . ':lat']['table'];
      
      $sub_query = db_select($table, 'haversine');
      $sub_query->fields('haversine', ['item_id', $latitude_field, $longitude_field]);
      // Calculate a distance column for the query that we'll filter on later.
      $sub_query->addExpression("69.0 * DEGREES(ACOS(COS(RADIANS(:p_lat))
                                      * COS(RADIANS($latitude_field))
                                      * COS(RADIANS(:p_long - $longitude_field))
                                      + SIN(RADIANS(:p_lat))
                                      * SIN(RADIANS($latitude_field))))", 'distance', [':p_lat' => $point['lat'], ':p_long' => $point['long']]);
      // Filter out anything outside of the bounding box. A degree of longitude
      // shrinks by cos(latitude), so scale the longitude range on both sides.
      $longitude_delta = $radius / (69.0 * cos(deg2rad($point['lat'])));
      $sub_query->condition($latitude_field,  [$point['lat'] - ($radius / 69.0), $point['lat'] + ($radius / 69.0)], 'BETWEEN');
      $sub_query->condition($longitude_field, [$point['long'] - $longitude_delta, $point['long'] + $longitude_delta], 'BETWEEN');
      $db_query->join($sub_query, 'search_distance', 't.item_id = search_distance.item_id');
      $db_query->condition('search_distance.distance', $radius, '<');
    }
  }
}
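
To sanity-check the numbers outside of SQL, here is a minimal sketch of the same two steps in plain PHP. The function names are hypothetical and not part of any module. It assumes the same 69.0 miles per degree approximation as the query, and it scales the longitude half-width by the cosine of the latitude on both sides of the center.

```php
<?php
/**
 * Distance in miles between two points, using the same great-circle
 * expression as the SQL (about 69 miles per degree of arc).
 */
function example_distance_miles($lat1, $long1, $lat2, $long2) {
  $cosine = cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * cos(deg2rad($long1 - $long2))
    + sin(deg2rad($lat1)) * sin(deg2rad($lat2));
  // Clamp to [-1, 1] so rounding error cannot push acos() to NAN.
  $cosine = max(-1.0, min(1.0, $cosine));
  return 69.0 * rad2deg(acos($cosine));
}

/**
 * Latitude/longitude ranges that enclose a circle of $radius miles.
 */
function example_bounding_box($lat, $long, $radius) {
  $lat_delta = $radius / 69.0;
  // Degrees of longitude get narrower away from the equator.
  $long_delta = $radius / (69.0 * cos(deg2rad($lat)));
  return array(
    'lat' => array($lat - $lat_delta, $lat + $lat_delta),
    'long' => array($long - $long_delta, $long + $long_delta),
  );
}
```

For example, calling example_distance_miles(39.7392, -104.9903, 40.7128, -74.0060), that is, Denver to a point in New York City (coordinates I picked for illustration), gives a result well beyond a 500 mile radius, and example_bounding_box(39.7392, -104.9903, 69.0) spans exactly two degrees of latitude and somewhat more than two degrees of longitude.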

And there you go.  In my example, I set up the search page at the path ‘search’ and tested with the URL search/diner?radius=500&center=denver, and got back the Denver Diner, but not the New York Diner.

* Depending on which version of Fivestar you get, you might need to download the Entity API module as well.  Just in case you’re following along at home.

(1) We’re just going to ignore the fact that the Earth isn’t a perfect sphere for the purposes of this article – there’s a degree of error that may creep in, but honestly if you’re trying to find locations within 300 miles of a city, there’s already enough error creeping in on the ‘center’ of a city that the close approximation of the Haversine formula is a relief.