Recently I resumed PPE allocation duties for a couple of weeks and took the opportunity to automate a checklist of the orders sorted by region to help track weekly orders by the Warehouse team. As a data person, I couldn’t help but add scope creep by adding a secondary sort by facility name. The resulting data surprised me when I noticed that the secondary sort placed facilities with names in ALL CAPS before other facilities with Title Case names.
This led me to wonder -
- Do different libraries sort character values differently?
- Is there an advantage to use ALL CAPS if there’s a chance to be sorted first in a mixed case data environment? As someone whose first and last names are at the beginning of the alphabet, I have wondered whether that’s been an advantage to me or if I’ve been lucky on the occasions where I was selected in contests and seat upgrades.
A review of the data
This data was downloaded from the Washington State Department of Social and Health Services website filtered for Nursing facilities in King County. When the search results are returned by the website, it is sorted by facility name as one would expect, where the case does not matter, though it would be nice if the names could have been consistently formatted and makes me wonder about how the data gets to the state. Does one data entry person enter names in ALL CAPS or do the providers enter their names in ALL CAPS? All other data fields across facilities appears “properly” Title Cased, even Admin names.
library(here)
library(knitr)
library(tidyverse)
data <- read_csv(here("static", "data", "NFListing.csv")) %>%
select(nf_loc_city,nf_name) %>%
rename(city = nf_loc_city, name = nf_name)
#TeamTidyverse > Take 1
In the #Tidyverse, one can arrange data by multiple fields and see if the case matters. The first set of mixed case city and names occurs for Federal Way. I am including all 51 rows of data to show all wrinkles in sorting for Redmond, Renton and Seattle facilities.
data %>%
arrange(city, name) %>%
kable()
city | name |
---|---|
Auburn | Advanced Post Acute |
Auburn | Canterbury House |
Auburn | Lea Hill Rehabilitation and Care Center |
Auburn | North Auburn Rehabilitation & Health Center |
BELLEVUE | MISSION HEALTHCARE AT BELLEVUE |
BELLEVUE | THE SPRINGS AT PACIFIC REGENT |
BURIEN | Burien Nursing and Rehabilitation Center |
DES MOINES | JUDSON PARK HEALTH CENTER |
DES MOINES | STAFFORD HEALTHCARE |
DES MOINES | WESLEY HOMES HEALTH CENTER |
Enumclaw | Enumclaw Health and Rehabilitation Center |
Federal Way | Garden Terrace Healthcare Center of Federal Way |
Federal Way | HALLMARK MANOR |
Federal Way | LIFE CARE CENTER OF FEDERAL WAY |
FEDERAL WAY | AVALON CARE CENTER - FEDERAL WAY, LLC |
Issaquah | Briarwood at Timber Ridge |
ISSAQUAH | Issaquah Nursing & Rehabilitation Center |
ISSAQUAH | PROVIDENCE MARIANWOOD |
KENT | BENSON HEIGHTS REHABILITATION CENTER |
KENT | SUNRISE HAVEN |
Kirkland | LIFE CARE CENTER OF KIRKLAND |
Mercer Island | COVENANT SHORES HEALTH CENTER |
NORTH BEND | Regency North Bend Rehabilitation and Nursing Center |
Redmond | REDMOND CARE AND REHABILITATION CENTER |
REDMOND | CORWIN CENTER AT EMERALD HEIGHTS |
Renton | Mission Healthcare at Renton |
Renton | Renton Nursing and Rehabilitation Center |
RENTON | Talbot Center for Rehabilitation and Healthcare |
Seattle | BAILEY-BOUSHAY HOUSE |
Seattle | Ballard Center |
Seattle | SAINT ANNE NURSING AND REHABILITATION CENTER |
Seattle | SEA MAR COMMUNITY CARE CENTER |
Seattle | Seattle Medical Post Acute Care |
Seattle | Transitional Care Center of Seattle |
Seattle | Washington Care Center |
SEATTLE | BAYVIEW MANOR |
SEATTLE | CAROLINE KLINE GALLAND HOME, THE |
SEATTLE | COLUMBIA LUTHERAN HOME |
SEATTLE | FOSS HOME AND VILLAGE |
SEATTLE | HEARTHSTONE, THE |
SEATTLE | KIN ON HEALTH CARE CENTER |
SEATTLE | MIRABELLA |
SEATTLE | PARK SHORE |
SEATTLE | Park West Care Center |
SEATTLE | PROVIDENCE MOUNT ST VINCENT |
SEATTLE | QUEEN ANNE HEALTHCARE |
SEATTLE | THE TERRACES AT SKYLINE |
Shoreline | FIRCREST SCHOOL PAT N |
Shoreline | Park Ridge Care Center |
Shoreline | RICHMOND BEACH REHAB |
Shoreline | SHORELINE HEALTH AND REHABILITATION |
Observation: Title Case city names precede ALL CAPS city names per AVALON CARE CENTER - FEDERAL WAY, LLC in FEDERAL WAY. Also true for REDMOND CARE AND REHABILITATION CENTER in REDMOND.
Data.Table > Take 1
My colleague who developed the allocation algorithm is on #TeamData.Table for zippy processing. One way to order data in a data.table is through setorder.
library(data.table)
dt_data <- fread(here("static", "data", "NFListing.csv"))
da_table <- dt_data[,.(city = nf_loc_city, name = nf_name)]
setorder(da_table, city, name)
da_table %>%
kable()
city | name |
---|---|
Auburn | Advanced Post Acute |
Auburn | Canterbury House |
Auburn | Lea Hill Rehabilitation and Care Center |
Auburn | North Auburn Rehabilitation & Health Center |
BELLEVUE | MISSION HEALTHCARE AT BELLEVUE |
BELLEVUE | THE SPRINGS AT PACIFIC REGENT |
BURIEN | Burien Nursing and Rehabilitation Center |
DES MOINES | JUDSON PARK HEALTH CENTER |
DES MOINES | STAFFORD HEALTHCARE |
DES MOINES | WESLEY HOMES HEALTH CENTER |
Enumclaw | Enumclaw Health and Rehabilitation Center |
FEDERAL WAY | AVALON CARE CENTER - FEDERAL WAY, LLC |
Federal Way | Garden Terrace Healthcare Center of Federal Way |
Federal Way | HALLMARK MANOR |
Federal Way | LIFE CARE CENTER OF FEDERAL WAY |
ISSAQUAH | Issaquah Nursing & Rehabilitation Center |
ISSAQUAH | PROVIDENCE MARIANWOOD |
Issaquah | Briarwood at Timber Ridge |
KENT | BENSON HEIGHTS REHABILITATION CENTER |
KENT | SUNRISE HAVEN |
Kirkland | LIFE CARE CENTER OF KIRKLAND |
Mercer Island | COVENANT SHORES HEALTH CENTER |
NORTH BEND | Regency North Bend Rehabilitation and Nursing Center |
REDMOND | CORWIN CENTER AT EMERALD HEIGHTS |
RENTON | Talbot Center for Rehabilitation and Healthcare |
Redmond | REDMOND CARE AND REHABILITATION CENTER |
Renton | Mission Healthcare at Renton |
Renton | Renton Nursing and Rehabilitation Center |
SEATTLE | BAYVIEW MANOR |
SEATTLE | CAROLINE KLINE GALLAND HOME, THE |
SEATTLE | COLUMBIA LUTHERAN HOME |
SEATTLE | FOSS HOME AND VILLAGE |
SEATTLE | HEARTHSTONE, THE |
SEATTLE | KIN ON HEALTH CARE CENTER |
SEATTLE | MIRABELLA |
SEATTLE | PARK SHORE |
SEATTLE | PROVIDENCE MOUNT ST VINCENT |
SEATTLE | Park West Care Center |
SEATTLE | QUEEN ANNE HEALTHCARE |
SEATTLE | THE TERRACES AT SKYLINE |
Seattle | BAILEY-BOUSHAY HOUSE |
Seattle | Ballard Center |
Seattle | SAINT ANNE NURSING AND REHABILITATION CENTER |
Seattle | SEA MAR COMMUNITY CARE CENTER |
Seattle | Seattle Medical Post Acute Care |
Seattle | Transitional Care Center of Seattle |
Seattle | Washington Care Center |
Shoreline | FIRCREST SCHOOL PAT N |
Shoreline | Park Ridge Care Center |
Shoreline | RICHMOND BEACH REHAB |
Shoreline | SHORELINE HEALTH AND REHABILITATION |
Observation: ALL CAPS city names precede Title Case city names as seen for Seattle facilities. Things get even weirder for Redmand and Renton where even the cities aren’t grouped in order.
What to do?
As a consultant, when discussing data quality, I would advocate for non-shouty data if source system flexibility makes that possible and doesn’t impact prior data reporting. But sometimes when you are downstream from data, you may find yourself getting inconsistently formatted data and you have to make the decision on how much you care about maddening cases and determine if you even have the time to invest in attempting to format names, which is often a tricky business.
#Tidyverse > Take 2
Rather than tackle facility names which may have initials and acronyms in ALL CAPS, formatting the city names will at least reveal how dplyr::arrange sorts data using city_formmatted
in place of city
.
library(stringr)
data %>%
mutate(city_formatted = str_to_title(city)) %>%
arrange(city_formatted, name) %>%
select(city, city_formatted, name) %>%
kable()
city | city_formatted | name |
---|---|---|
Auburn | Auburn | Advanced Post Acute |
Auburn | Auburn | Canterbury House |
Auburn | Auburn | Lea Hill Rehabilitation and Care Center |
Auburn | Auburn | North Auburn Rehabilitation & Health Center |
BELLEVUE | Bellevue | MISSION HEALTHCARE AT BELLEVUE |
BELLEVUE | Bellevue | THE SPRINGS AT PACIFIC REGENT |
BURIEN | Burien | Burien Nursing and Rehabilitation Center |
DES MOINES | Des Moines | JUDSON PARK HEALTH CENTER |
DES MOINES | Des Moines | STAFFORD HEALTHCARE |
DES MOINES | Des Moines | WESLEY HOMES HEALTH CENTER |
Enumclaw | Enumclaw | Enumclaw Health and Rehabilitation Center |
FEDERAL WAY | Federal Way | AVALON CARE CENTER - FEDERAL WAY, LLC |
Federal Way | Federal Way | Garden Terrace Healthcare Center of Federal Way |
Federal Way | Federal Way | HALLMARK MANOR |
Federal Way | Federal Way | LIFE CARE CENTER OF FEDERAL WAY |
Issaquah | Issaquah | Briarwood at Timber Ridge |
ISSAQUAH | Issaquah | Issaquah Nursing & Rehabilitation Center |
ISSAQUAH | Issaquah | PROVIDENCE MARIANWOOD |
KENT | Kent | BENSON HEIGHTS REHABILITATION CENTER |
KENT | Kent | SUNRISE HAVEN |
Kirkland | Kirkland | LIFE CARE CENTER OF KIRKLAND |
Mercer Island | Mercer Island | COVENANT SHORES HEALTH CENTER |
NORTH BEND | North Bend | Regency North Bend Rehabilitation and Nursing Center |
REDMOND | Redmond | CORWIN CENTER AT EMERALD HEIGHTS |
Redmond | Redmond | REDMOND CARE AND REHABILITATION CENTER |
Renton | Renton | Mission Healthcare at Renton |
Renton | Renton | Renton Nursing and Rehabilitation Center |
RENTON | Renton | Talbot Center for Rehabilitation and Healthcare |
Seattle | Seattle | BAILEY-BOUSHAY HOUSE |
Seattle | Seattle | Ballard Center |
SEATTLE | Seattle | BAYVIEW MANOR |
SEATTLE | Seattle | CAROLINE KLINE GALLAND HOME, THE |
SEATTLE | Seattle | COLUMBIA LUTHERAN HOME |
SEATTLE | Seattle | FOSS HOME AND VILLAGE |
SEATTLE | Seattle | HEARTHSTONE, THE |
SEATTLE | Seattle | KIN ON HEALTH CARE CENTER |
SEATTLE | Seattle | MIRABELLA |
SEATTLE | Seattle | PARK SHORE |
SEATTLE | Seattle | Park West Care Center |
SEATTLE | Seattle | PROVIDENCE MOUNT ST VINCENT |
SEATTLE | Seattle | QUEEN ANNE HEALTHCARE |
Seattle | Seattle | SAINT ANNE NURSING AND REHABILITATION CENTER |
Seattle | Seattle | SEA MAR COMMUNITY CARE CENTER |
Seattle | Seattle | Seattle Medical Post Acute Care |
SEATTLE | Seattle | THE TERRACES AT SKYLINE |
Seattle | Seattle | Transitional Care Center of Seattle |
Seattle | Seattle | Washington Care Center |
Shoreline | Shoreline | FIRCREST SCHOOL PAT N |
Shoreline | Shoreline | Park Ridge Care Center |
Shoreline | Shoreline | RICHMOND BEACH REHAB |
Shoreline | Shoreline | SHORELINE HEALTH AND REHABILITATION |
Verdict: WINNER
With my city data now formatted as Title Case, the secondary sorting of names is returned as hoped for both Federal Way, Redmond and Seattle facilities.
Data.Table > Take 2
The next test is to see how using the formatted city name impacts sort of facilities by setorder.
da_table <- dt_data[,.(city = nf_loc_city, name = nf_name)]
da_table[,city_formatted := str_to_title(city)]
setorder(da_table, city_formatted, name)
da_table[,.(city, city_formatted, name)] %>%
kable()
city | city_formatted | name |
---|---|---|
Auburn | Auburn | Advanced Post Acute |
Auburn | Auburn | Canterbury House |
Auburn | Auburn | Lea Hill Rehabilitation and Care Center |
Auburn | Auburn | North Auburn Rehabilitation & Health Center |
BELLEVUE | Bellevue | MISSION HEALTHCARE AT BELLEVUE |
BELLEVUE | Bellevue | THE SPRINGS AT PACIFIC REGENT |
BURIEN | Burien | Burien Nursing and Rehabilitation Center |
DES MOINES | Des Moines | JUDSON PARK HEALTH CENTER |
DES MOINES | Des Moines | STAFFORD HEALTHCARE |
DES MOINES | Des Moines | WESLEY HOMES HEALTH CENTER |
Enumclaw | Enumclaw | Enumclaw Health and Rehabilitation Center |
FEDERAL WAY | Federal Way | AVALON CARE CENTER - FEDERAL WAY, LLC |
Federal Way | Federal Way | Garden Terrace Healthcare Center of Federal Way |
Federal Way | Federal Way | HALLMARK MANOR |
Federal Way | Federal Way | LIFE CARE CENTER OF FEDERAL WAY |
Issaquah | Issaquah | Briarwood at Timber Ridge |
ISSAQUAH | Issaquah | Issaquah Nursing & Rehabilitation Center |
ISSAQUAH | Issaquah | PROVIDENCE MARIANWOOD |
KENT | Kent | BENSON HEIGHTS REHABILITATION CENTER |
KENT | Kent | SUNRISE HAVEN |
Kirkland | Kirkland | LIFE CARE CENTER OF KIRKLAND |
Mercer Island | Mercer Island | COVENANT SHORES HEALTH CENTER |
NORTH BEND | North Bend | Regency North Bend Rehabilitation and Nursing Center |
REDMOND | Redmond | CORWIN CENTER AT EMERALD HEIGHTS |
Redmond | Redmond | REDMOND CARE AND REHABILITATION CENTER |
Renton | Renton | Mission Healthcare at Renton |
Renton | Renton | Renton Nursing and Rehabilitation Center |
RENTON | Renton | Talbot Center for Rehabilitation and Healthcare |
Seattle | Seattle | BAILEY-BOUSHAY HOUSE |
SEATTLE | Seattle | BAYVIEW MANOR |
Seattle | Seattle | Ballard Center |
SEATTLE | Seattle | CAROLINE KLINE GALLAND HOME, THE |
SEATTLE | Seattle | COLUMBIA LUTHERAN HOME |
SEATTLE | Seattle | FOSS HOME AND VILLAGE |
SEATTLE | Seattle | HEARTHSTONE, THE |
SEATTLE | Seattle | KIN ON HEALTH CARE CENTER |
SEATTLE | Seattle | MIRABELLA |
SEATTLE | Seattle | PARK SHORE |
SEATTLE | Seattle | PROVIDENCE MOUNT ST VINCENT |
SEATTLE | Seattle | Park West Care Center |
SEATTLE | Seattle | QUEEN ANNE HEALTHCARE |
Seattle | Seattle | SAINT ANNE NURSING AND REHABILITATION CENTER |
Seattle | Seattle | SEA MAR COMMUNITY CARE CENTER |
Seattle | Seattle | Seattle Medical Post Acute Care |
SEATTLE | Seattle | THE TERRACES AT SKYLINE |
Seattle | Seattle | Transitional Care Center of Seattle |
Seattle | Seattle | Washington Care Center |
Shoreline | Shoreline | FIRCREST SCHOOL PAT N |
Shoreline | Shoreline | Park Ridge Care Center |
Shoreline | Shoreline | RICHMOND BEACH REHAB |
Shoreline | Shoreline | SHORELINE HEALTH AND REHABILITATION |
Verdict: BETTER
Secondary sort on mixed case names results in facility names starting with a given letter in ALL CAPS preceding other Title Cased names of same letter. The data looks pretty good until seeing where Ballad Center and Park West Care Center show up under Seattle facilities.
Data.Table > Take 3
When reading all of the documentation, it notes that -
data.table always reorders in “C-locale”. As a consequence, the ordering may be different to that obtained by base::order.
For final test, will try base::order
on the data.table and sort by city_formatted
and name
.
da_table <- dt_data[,.(city = nf_loc_city, name = nf_name)]
da_table[,city_formatted := str_to_title(city)]
da_table[base::order(city_formatted, name),.(city, city_formatted, name)] %>%
kable
city | city_formatted | name |
---|---|---|
Auburn | Auburn | Advanced Post Acute |
Auburn | Auburn | Canterbury House |
Auburn | Auburn | Lea Hill Rehabilitation and Care Center |
Auburn | Auburn | North Auburn Rehabilitation & Health Center |
BELLEVUE | Bellevue | MISSION HEALTHCARE AT BELLEVUE |
BELLEVUE | Bellevue | THE SPRINGS AT PACIFIC REGENT |
BURIEN | Burien | Burien Nursing and Rehabilitation Center |
DES MOINES | Des Moines | JUDSON PARK HEALTH CENTER |
DES MOINES | Des Moines | STAFFORD HEALTHCARE |
DES MOINES | Des Moines | WESLEY HOMES HEALTH CENTER |
Enumclaw | Enumclaw | Enumclaw Health and Rehabilitation Center |
FEDERAL WAY | Federal Way | AVALON CARE CENTER - FEDERAL WAY, LLC |
Federal Way | Federal Way | Garden Terrace Healthcare Center of Federal Way |
Federal Way | Federal Way | HALLMARK MANOR |
Federal Way | Federal Way | LIFE CARE CENTER OF FEDERAL WAY |
Issaquah | Issaquah | Briarwood at Timber Ridge |
ISSAQUAH | Issaquah | Issaquah Nursing & Rehabilitation Center |
ISSAQUAH | Issaquah | PROVIDENCE MARIANWOOD |
KENT | Kent | BENSON HEIGHTS REHABILITATION CENTER |
KENT | Kent | SUNRISE HAVEN |
Kirkland | Kirkland | LIFE CARE CENTER OF KIRKLAND |
Mercer Island | Mercer Island | COVENANT SHORES HEALTH CENTER |
NORTH BEND | North Bend | Regency North Bend Rehabilitation and Nursing Center |
REDMOND | Redmond | CORWIN CENTER AT EMERALD HEIGHTS |
Redmond | Redmond | REDMOND CARE AND REHABILITATION CENTER |
Renton | Renton | Mission Healthcare at Renton |
Renton | Renton | Renton Nursing and Rehabilitation Center |
RENTON | Renton | Talbot Center for Rehabilitation and Healthcare |
Seattle | Seattle | BAILEY-BOUSHAY HOUSE |
Seattle | Seattle | Ballard Center |
SEATTLE | Seattle | BAYVIEW MANOR |
SEATTLE | Seattle | CAROLINE KLINE GALLAND HOME, THE |
SEATTLE | Seattle | COLUMBIA LUTHERAN HOME |
SEATTLE | Seattle | FOSS HOME AND VILLAGE |
SEATTLE | Seattle | HEARTHSTONE, THE |
SEATTLE | Seattle | KIN ON HEALTH CARE CENTER |
SEATTLE | Seattle | MIRABELLA |
SEATTLE | Seattle | PARK SHORE |
SEATTLE | Seattle | Park West Care Center |
SEATTLE | Seattle | PROVIDENCE MOUNT ST VINCENT |
SEATTLE | Seattle | QUEEN ANNE HEALTHCARE |
Seattle | Seattle | SAINT ANNE NURSING AND REHABILITATION CENTER |
Seattle | Seattle | SEA MAR COMMUNITY CARE CENTER |
Seattle | Seattle | Seattle Medical Post Acute Care |
SEATTLE | Seattle | THE TERRACES AT SKYLINE |
Seattle | Seattle | Transitional Care Center of Seattle |
Seattle | Seattle | Washington Care Center |
Shoreline | Shoreline | FIRCREST SCHOOL PAT N |
Shoreline | Shoreline | Park Ridge Care Center |
Shoreline | Shoreline | RICHMOND BEACH REHAB |
Shoreline | Shoreline | SHORELINE HEALTH AND REHABILITATION |
Verdict:: WINNER
Specifying base::order
makes the data.table sorting experience consistent with dplyr and more importantly, sort more human-sensibly in spite of case differences across facility names.
A Sorted Affair
As often is the case, thinking that a simple task to sort data won’t take more than 5 minutes can quickly lead one down a rabbit hole when data is inconsistent. Standardizing data formats for character values would not only be easier on human eyes, but also result in more expected sorting behavior.