Some SQL Server functions for cleaning data

Email:
Password:
Email:
JLION.COM
May 4, 2009 SQL Server

I recently had to import a list of names from an excel spreadsheet into a database--a faily routine task. One of the challenges of this particular import was that in the spreadsheet, the names were stored in a last-name-first format. IE: "DOE, JOHN C". In the database, however, there were separate fields for first, middle and last name.

In the past, when confronted with this sort of problem, I had written a simple VB.NET program to massage the data prior to importing it into SQL Server. This time, I decided to see if I could split the name column into its first, middle and last components using SQL string functions. It turns out that I could, and I share the resulting SQL with you here:

The INTMIN function is used to determine the smallest of two integers.

CREATE function dbo.INTMIN(@iValue1 int,@iValue2 int) returns int
as
begin
	declare @iResult int
	if @iValue1<@iValue2
		set @iResult= @iValue1
	else
		set @iResult= @iValue2

	return @iResult
end

The Capitalize function is used to capitalize a name (IE: "JOE" becomes "Joe")

create function dbo.Capitalize(@name varchar(255)) returns varchar(255)
as
begin
	declare @result varchar(50)
	set @result=upper(substring(@name,1,1)) +
				lower(substring(@name,2,datalength(@name)-1))

	return @result
end

The LastNameFromLNFName function returns the last name component of a last-name-first string. (IE: "DOE, JOHN JACOB" becomes "DOE")

create function dbo.LastNameFromLNFName(@name varchar(255)) returns varchar(255)
as
begin
	return substring(@name,1,DBO.INTMIN(charindex(' ',@name,1),	charindex(',',@name,1))-1)
end

The FirstNameFromLNFName function returns the first name component of a last-name-first string. (IE: "DOE, JOHN JACOB" becomes "JOHN")

create function dbo.FirstNameFromLNFName(@name varchar(255)) returns varchar(255)
as
begin
	return substring(@name,
		(DBO.INTMIN(charindex(' ',@name,1),	charindex(',',@name,1)))+2,
		(charindex(' ',@name+' ',DBO.INTMIN(charindex(' ',@name,1),	charindex(',',@name,1))+2)) 
			-
		((DBO.INTMIN(charindex(' ',@name,1),	charindex(',',@name,1)))+2)
	)
end

The MiddleNameFromLNFName function returns the middle name component of a last-name-first string. (IE: "DOE, JOHN JACOB" becomes "JACOB")

The function accomodates two-part middle names like "SMITH, MARY PEGGY SUE".

There is a known problem with modifiers such as "II" and "JR". Fortunately I didn't have these in my source data. If I did, I'd probably create a new function to find and return them, and modifiy the MiddleNameFromLDFName function to not return certain strings ("JR", "SR", "II", etc.)

create function dbo.MiddleNameFromLNFName(@name varchar(255)) returns varchar(255)
as
begin
	return 
		substring(
			@name,
			charindex(' ',
					  @name+' ',
					charindex(' ',
						@name,
						DBO.INTMIN(
							charindex(' ',@name,1),	
							charindex(',',@name,1)
						)+1
					)+2
			),
			datalength(@name)-
				charindex(' ',
						  @name,
						charindex(' ',
							@name,
							DBO.INTMIN(
								charindex(' ',@name,1),	
								charindex(',',@name,1)
							)+1
						)+2
				)+1
		)
end

 

Created by Joe Lynds 2002-2008. Contact Joe
http://www.jlion.com