## Strange errors – calculating small distances from geographic coordinates

I’m working on a project that involves finding distances between many pairs of coordinate points (so [x1, y1] – [x2, y2] type of format), and it’s at the census block level, so we’re talking fairly small distances. At times, a single point will be evaluated against two or more other points, and the goal is to find the **minimum distance** between all of them, and to mark which of the other points the initial point is closest to. I tested three different methods, and I get annoying inconsistencies.

**First method**: calculating Euclidean distance (√((x2 – x1)^2 + (y2 – y1)^2 )) using the coordinates themselves. The math is done, and the result reported, in decimal degrees.

**Second method**: converting the x and y DD components into miles before finding the Euclidean distance. This was done by using the very rough average of 69 mi per degree of latitude, and then finding longitude with the maximum miles per degree of longitude at the equator (69.172 mi per degree) times the cosine of latitude in radians. So (√( ((69.172 * (cos(x2) – cos(x1))^2 ) + (69 * (y2 – y1)^2 ))), where x1 and x2 are in radians. The result is that 3% of the minimum distances are marked differently than with the first method.

**Third method**: calculating geodesic distance using geopy. Super simple and theoretically more accurate, but the result is that a staggering 16% of the minimum distances are marked differently compared to the first method. I also don’t know how geopy finds these distances, so I can’t check what is done differently.

All this has me wondering which method to go off of. Should I stick with geopy, since it suggests using their package for coordinates? Thanks in advance. I hope this wasn’t too confusing; it has been a head scratcher for me.

## Comments ( 7 )

Geographical distances on a sphere are calculated with the haversine formula:

`haversin(d / R) = haversin(lat2 - lat1) + cos(lat1) * cos(lat2) * haversin(lon2 - lon1)`

Note: the haversine function is `haversin(x) = sin^2(x/2)`, `d` is the distance, and `R` is the Earth's radius.

(Incidentally I had this lying around because of a self-piloting drone project)

Since you mentioned Python, you can write it yourself with:

    import math

    def haversine(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometers; inputs are decimal degrees."""
        R = 6371  # Radius of the Earth in kilometers

        # Convert degrees to radians
        lat1_rad = math.radians(lat1)
        lon1_rad = math.radians(lon1)
        lat2_rad = math.radians(lat2)
        lon2_rad = math.radians(lon2)

        # Calculate differences
        delta_lat = lat2_rad - lat1_rad
        delta_lon = lon2_rad - lon1_rad

        # Haversine formula
        a = math.sin(delta_lat / 2) ** 2 + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(delta_lon / 2) ** 2
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

        distance = R * c
        return distance

    # Example call; lat1, lon1, lat2, lon2 are decimal degrees you supply
    distance = haversine(lat1, lon1, lat2, lon2)
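
For the actual task in the question (find the minimum distance and mark which candidate it belongs to), a minimal sketch could look like the following. The names `haversine_km` and `nearest` and the sample points are made up for illustration, and the inputs are assumed to be (lat, lon) tuples in decimal degrees.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def nearest(origin, candidates):
    """Return (index, distance_km) of the candidate closest to origin.

    origin and each candidate are (lat, lon) tuples in decimal degrees.
    """
    distances = [haversine_km(*origin, *c) for c in candidates]
    best = min(range(len(candidates)), key=distances.__getitem__)
    return best, distances[best]

# Hypothetical points a few blocks apart (illustrative values only)
origin = (41.8800, -87.6300)
candidates = [(41.8850, -87.6300), (41.8800, -87.6200), (41.9000, -87.6500)]
idx, dist = nearest(origin, candidates)
print(idx, round(dist, 3))
```

Keeping the distance function separate from the arg-min step makes it easy to swap in another distance (geopy, a projected Euclidean distance) and compare which candidates get marked.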

#1 won’t work because, as you know, a degree of longitude is shorter than a degree of latitude.
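To make that concrete, here is a small sketch using the rough per-degree figures from the question (69 mi per degree of latitude, 69.172 mi per degree of longitude at the equator). The latitude and offsets are hypothetical, chosen so the two neighbors lie due north and due east.

```python
import math

MILES_PER_DEG_LAT = 69.0             # rough average, from the question
MILES_PER_DEG_LON_EQUATOR = 69.172   # at the equator, from the question

def miles_per_deg_lon(lat_deg):
    """Approximate length of one degree of longitude at a given latitude."""
    return MILES_PER_DEG_LON_EQUATOR * math.cos(math.radians(lat_deg))

lat = 45.0                 # hypothetical latitude of the origin point
north_offset_deg = 0.010   # neighbor due north, offset in degrees of latitude
east_offset_deg = 0.012    # neighbor due east, offset in degrees of longitude

north_miles = north_offset_deg * MILES_PER_DEG_LAT
east_miles = east_offset_deg * miles_per_deg_lon(lat)

print(north_offset_deg < east_offset_deg)  # True: north neighbor looks closer in raw degrees
print(north_miles < east_miles)            # False: east neighbor is closer in miles
```

So whenever two candidates are at similar distances but in different directions, raw-degree Euclidean distance can mark the wrong one as the minimum.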

I see what you’re trying to do with #2. I’ve done it myself when I was too lazy to look up the haversine formula. What you really want there is something like (x is latitude, y is longitude)

`distance = sqrt[ (x2-x1)^2 + ((y2-y1) * cos((x1+x2)/2))^2 ] * 69.172`

Basically (x2-x1) is the difference in latitude; (y2-y1) * cos((x1 + x2)/2) is the difference in longitude, but remeasured in degrees-at-the-equator. I take cos((x1+x2)/2) instead of cos(x1) or cos(x2) so that the distance formula is symmetric.
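A minimal Python sketch of this approximation, assuming inputs in decimal degrees and using the 69.172 mi per degree figure from the thread; the function name `equirect_miles` is made up here. Note that the mean latitude has to be converted to radians before it goes into `cos`, while the differences themselves stay in degrees.

```python
import math

MILES_PER_DEG = 69.172  # miles per degree at the equator, figure from the thread

def equirect_miles(lat1, lon1, lat2, lon2):
    """Approximate distance in miles between two nearby points (decimal degrees)."""
    mean_lat = math.radians((lat1 + lat2) / 2)    # radians, for cos()
    dlat = lat2 - lat1                            # degrees of latitude
    dlon = (lon2 - lon1) * math.cos(mean_lat)     # rescaled to degrees-at-the-equator
    return math.hypot(dlat, dlon) * MILES_PER_DEG

# One degree of longitude at 60° N is about half of one at the equator
print(round(equirect_miles(60.0, 0.0, 60.0, 1.0), 3))
# Swapping the two points gives the same distance
print(equirect_miles(41.88, -87.63, 41.89, -87.62) == equirect_miles(41.89, -87.62, 41.88, -87.63))
```

This is a flat-plane approximation, so it is only reasonable for short distances like the census-block scale in the question.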

You can also use pyproj to project the coordinates to a UTM system and then calculate the Euclidean distance again.

The earth is an imperfect spheroid.

You’re running into a common problem in geography/geology/etc. Coordinates on earth are defined within a human-defined coordinate system (e.g. [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System?wprov=sfti1)) because the earth is an imperfect spheroid. Different coordinate systems are related to one another through known transfer functions. Haversine by itself isn’t going to be the most accurate option. Geopy should be able to do it as it should be able to apply the correct transforms. That said, you need to:

1) know what system your coordinates are defined within and

2) tell geopy this via its API (I’m assuming this is possible because it’s a fundamental part of distance calculations, but I don’t actually know the library)

A 16% difference is a lot for short distances, e.g. within the same census block. My first intuition is that those types of errors usually come from comparing two different coordinate systems without correctly mapping between them.
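One related mix-up worth ruling out is axis order. This is an assumption, since the post does not show the geopy call: geopy takes points as (latitude, longitude), while [x, y] pairs are normally (longitude, latitude). The sketch below uses a plain haversine function rather than geopy, with hypothetical points, to show how misreading x as latitude can flip which candidate is marked as nearest.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def nearest_index(origin, candidates, swap=False):
    """Index of the nearest candidate; points are (x, y) = (lon, lat).

    With swap=True the pairs are deliberately misread as (lat, lon).
    """
    def dist(p, q):
        if swap:
            return haversine_km(p[0], p[1], q[0], q[1])  # wrong: x treated as latitude
        return haversine_km(p[1], p[0], q[1], q[0])      # right: lat = y, lon = x
    dists = [dist(origin, c) for c in candidates]
    return dists.index(min(dists))

# Hypothetical (lon, lat) points: one neighbor to the east, one to the north
origin = (-87.630, 41.880)
candidates = [(-87.620, 41.880), (-87.630, 41.889)]

print(nearest_index(origin, candidates))             # 0: the eastern neighbor is nearest
print(nearest_index(origin, candidates, swap=True))  # 1: swapped axes pick the other one
```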
