What is the difference between….all the functions that find a substring 1st position?

Before tackling the differences, let’s recap the similiarities:

  1. They’re all used to locate one string within another by identifying its first positon. 

2. Unlike their Excel counterparts, they will all return 0 if the substring under scrutiny cannot be found

To better illustrate the differences between the functions, I’ll be using the ‘TED Talks’ dataset from Kaggle, to look at the tags given to each Ted talk.

I want to identify the first position of both ‘technology’ and ‘TED, so that I could extract them in a separate string

(1) SQL Server: charindex() and patindex()

Consult the snippet below:

Whilst both are case insensitive and could easily be used interchangeably if you’re searching for a precise string, that’s not the case if you’re after a string that could have differences in spelling or vary slightly throughout the dataset

NB! Note that patindex() always requires you to encircle your string in % signs. Else, it won’t find it.  Conversely, adding % signs to charindex() will return nothing but 0s

Charindex(), however, allows you to specify the starting position of your search. That’s not the case with patindex()

(2) MySQL: instr() and locate()

Alike charindex() in SQL Server, neither instr() or locate() will work as expected if a % sign is included in the search.

Both instr() and locate() are case-insensitive

The only time when locate() can potentially return different results from instr() is when it’s been used in its 3-argument form and a starting position has been specified for the search.

By contrast, instr() always starts searching at the first position – there’s no option to define that yourself.

(3) PostgreSQL: position() and strpos()

Both position() and strpos() are case sensitive and neither endorses wild characters (i.e will work as expected if a % sign is included in the search string) 

Also, neither of them supports an optional argument that will allow you to specify the starting position of your search, so the only difference between the two is the one in syntax

To sum up:

Syntax Does it support optional arguments? Does it support wild characters? Is it case sensitive?
CHARINDEX ( expressionToFind , expressionToSearch [ , start_location ] yes no no
PATINDEX ( ‘%pattern%’ , expression ) no yes no
INSTR (str,substr) no no no
LOCATE (substr,str,[pos]) yes no no
POSITION (substring in string) no no yes
STRPOS (string, substring) no no yes

Happy SQL-ing!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.