I still vividly remember my very first lecture as an MSc student.
It was a “Business Intelligence and Data Management” class, and the teacher was explaining the major discrepancies in data formats that can emerge in large-scale organizations…
Wow! I’d actually seen every one of them while working as an analyst at that multinational company!
So, if you’re just starting out as an analyst and need to combine data from a couple of source systems, you should definitely watch out for:
(1) different keys, same data
(e.g. Cost Centers have been truncated to 10 digits in 1 system but left with their original 16 digits in another)
(2) same person, different spellings
(e.g. TENEVA ANGELINA in 1 system vs. Teneva, Angelina in another)
(3) same value, different names
(e.g. Nederland in some record entries vs. Netherlands in others; Great Britain in 1 system vs. United Kingdom in another)
(4) same data, different names
(e.g. “Cost Center” in 1 system vs. “Cost Location” in another and “Location Code” in a 3rd)
(5) different data, same name
(e.g. a field is always called “Country” but, depending on the table being used, it can actually mean “Project Country”, “Customer Country” or “Employee Country”)
(6) required fields left blank because there is no data entry validation in place
(7) dummy values like ‘999999’ entered because the user couldn’t proceed without giving a number
What can I do then?
(6) and (7) are pretty much the classic case of “garbage in, garbage out”: there isn’t much you can do about them, apart from highlighting to the relevant stakeholders that certain processes may need changing.
(4) and (5) are solvable with some business expertise. It’s best to approach the senior analysts on your team: they’ll most likely have figured them out already, or should at least be able to point you to the right contact for further questions!
(1), (2) and (3) are the easiest to overcome, but the approach will vary depending on the tools you have at your disposal!
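
If Python happens to be one of those tools, here is a minimal sketch of how (1), (2) and (3) could be tackled before joining the data. Everything in it is an assumption made for illustration: the function names and the alias table are made up, the cost center fix assumes the shorter system simply kept the first 10 characters, and the name fix assumes both systems store the surname first.

```python
import re

# Hypothetical alias table for (3): extend it with whatever variants
# actually show up in your own data.
COUNTRY_ALIASES = {
    "nederland": "Netherlands",
    "netherlands": "Netherlands",
    "great britain": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def normalize_cost_center(key: str, width: int = 10) -> str:
    """(1) Cut every key down to the shortest format in use.

    Assumes the truncated system kept the FIRST `width` characters.
    Different long keys can collapse to the same short key, so check
    for duplicates after applying this.
    """
    return key.strip()[:width]


def normalize_person_name(name: str) -> str:
    """(2) Drop punctuation, collapse whitespace and ignore case.

    Assumes the word order (surname first) is the same in both systems.
    """
    without_punctuation = re.sub(r"[^\w\s]", " ", name)
    return " ".join(without_punctuation.split()).casefold()


def normalize_country(name: str) -> str:
    """(3) Map known variants to one agreed name; leave unknown values as-is."""
    cleaned = " ".join(name.split())
    return COUNTRY_ALIASES.get(cleaned.casefold(), cleaned)


if __name__ == "__main__":
    print(normalize_cost_center("1234567890123456"))
    print(normalize_person_name("TENEVA ANGELINA"))
    print(normalize_person_name("Teneva, Angelina"))
    print(normalize_country("Nederland"))
```

The idea is to apply the same function to the matching column in both systems and join on the cleaned values, while keeping the original columns around so any suspicious match can still be traced back.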