12 min read

A Case for ALL CAPS

Recently I resumed PPE allocation duties for a couple of weeks and took the opportunity to automate a checklist of the orders sorted by region to help track weekly orders by the Warehouse team. As a data person, I couldn’t help but add scope creep by adding a secondary sort by facility name. The resulting data surprised me when I noticed that the secondary sort placed facilities with names in ALL CAPS before other facilities with Title Case names.

This led me to wonder -

  1. Do different libraries sort character values differently?
  2. Is there an advantage to use ALL CAPS if there’s a chance to be sorted first in a mixed case data environment? As someone whose first and last names are at the beginning of the alphabet, I have wondered whether that’s been an advantage to me or if I’ve been lucky on the occasions where I was selected in contests and seat upgrades.

A review of the data

This data was downloaded from the Washington State Department of Social and Health Services website filtered for Nursing facilities in King County. When the search results are returned by the website, it is sorted by facility name as one would expect, where the case does not matter, though it would be nice if the names could have been consistently formatted and makes me wonder about how the data gets to the state. Does one data entry person enter names in ALL CAPS or do the providers enter their names in ALL CAPS? All other data fields across facilities appears “properly” Title Cased, even Admin names.

library(here)
library(knitr)
library(tidyverse)

data <- read_csv(here("static", "data", "NFListing.csv")) %>%
  select(nf_loc_city,nf_name) %>%
  rename(city = nf_loc_city, name = nf_name) 

#TeamTidyverse > Take 1

In the #Tidyverse, one can arrange data by multiple fields and see if the case matters. The first set of mixed case city and names occurs for Federal Way. I am including all 51 rows of data to show all wrinkles in sorting for Redmond, Renton and Seattle facilities.

data %>%
  arrange(city, name) %>%
  kable()
city name
Auburn Advanced Post Acute
Auburn Canterbury House
Auburn Lea Hill Rehabilitation and Care Center
Auburn North Auburn Rehabilitation & Health Center
BELLEVUE MISSION HEALTHCARE AT BELLEVUE
BELLEVUE THE SPRINGS AT PACIFIC REGENT
BURIEN Burien Nursing and Rehabilitation Center
DES MOINES JUDSON PARK HEALTH CENTER
DES MOINES STAFFORD HEALTHCARE
DES MOINES WESLEY HOMES HEALTH CENTER
Enumclaw Enumclaw Health and Rehabilitation Center
Federal Way Garden Terrace Healthcare Center of Federal Way
Federal Way HALLMARK MANOR
Federal Way LIFE CARE CENTER OF FEDERAL WAY
FEDERAL WAY AVALON CARE CENTER - FEDERAL WAY, LLC
Issaquah Briarwood at Timber Ridge
ISSAQUAH Issaquah Nursing & Rehabilitation Center
ISSAQUAH PROVIDENCE MARIANWOOD
KENT BENSON HEIGHTS REHABILITATION CENTER
KENT SUNRISE HAVEN
Kirkland LIFE CARE CENTER OF KIRKLAND
Mercer Island COVENANT SHORES HEALTH CENTER
NORTH BEND Regency North Bend Rehabilitation and Nursing Center
Redmond REDMOND CARE AND REHABILITATION CENTER
REDMOND CORWIN CENTER AT EMERALD HEIGHTS
Renton Mission Healthcare at Renton
Renton Renton Nursing and Rehabilitation Center
RENTON Talbot Center for Rehabilitation and Healthcare
Seattle BAILEY-BOUSHAY HOUSE
Seattle Ballard Center
Seattle SAINT ANNE NURSING AND REHABILITATION CENTER
Seattle SEA MAR COMMUNITY CARE CENTER
Seattle Seattle Medical Post Acute Care
Seattle Transitional Care Center of Seattle
Seattle Washington Care Center
SEATTLE BAYVIEW MANOR
SEATTLE CAROLINE KLINE GALLAND HOME, THE
SEATTLE COLUMBIA LUTHERAN HOME
SEATTLE FOSS HOME AND VILLAGE
SEATTLE HEARTHSTONE, THE
SEATTLE KIN ON HEALTH CARE CENTER
SEATTLE MIRABELLA
SEATTLE PARK SHORE
SEATTLE Park West Care Center
SEATTLE PROVIDENCE MOUNT ST VINCENT
SEATTLE QUEEN ANNE HEALTHCARE
SEATTLE THE TERRACES AT SKYLINE
Shoreline FIRCREST SCHOOL PAT N
Shoreline Park Ridge Care Center
Shoreline RICHMOND BEACH REHAB
Shoreline SHORELINE HEALTH AND REHABILITATION

Observation: Title Case city names precede ALL CAPS city names per AVALON CARE CENTER - FEDERAL WAY, LLC in FEDERAL WAY. Also true for REDMOND CARE AND REHABILITATION CENTER in REDMOND.

Data.Table > Take 1

My colleague who developed the allocation algorithm is on #TeamData.Table for zippy processing. One way to order data in a data.table is through setorder.

library(data.table)
dt_data  <- fread(here("static", "data", "NFListing.csv"))
da_table <- dt_data[,.(city = nf_loc_city, name = nf_name)]
setorder(da_table, city, name)
da_table %>%
  kable()
city name
Auburn Advanced Post Acute
Auburn Canterbury House
Auburn Lea Hill Rehabilitation and Care Center
Auburn North Auburn Rehabilitation & Health Center
BELLEVUE MISSION HEALTHCARE AT BELLEVUE
BELLEVUE THE SPRINGS AT PACIFIC REGENT
BURIEN Burien Nursing and Rehabilitation Center
DES MOINES JUDSON PARK HEALTH CENTER
DES MOINES STAFFORD HEALTHCARE
DES MOINES WESLEY HOMES HEALTH CENTER
Enumclaw Enumclaw Health and Rehabilitation Center
FEDERAL WAY AVALON CARE CENTER - FEDERAL WAY, LLC
Federal Way Garden Terrace Healthcare Center of Federal Way
Federal Way HALLMARK MANOR
Federal Way LIFE CARE CENTER OF FEDERAL WAY
ISSAQUAH Issaquah Nursing & Rehabilitation Center
ISSAQUAH PROVIDENCE MARIANWOOD
Issaquah Briarwood at Timber Ridge
KENT BENSON HEIGHTS REHABILITATION CENTER
KENT SUNRISE HAVEN
Kirkland LIFE CARE CENTER OF KIRKLAND
Mercer Island COVENANT SHORES HEALTH CENTER
NORTH BEND Regency North Bend Rehabilitation and Nursing Center
REDMOND CORWIN CENTER AT EMERALD HEIGHTS
RENTON Talbot Center for Rehabilitation and Healthcare
Redmond REDMOND CARE AND REHABILITATION CENTER
Renton Mission Healthcare at Renton
Renton Renton Nursing and Rehabilitation Center
SEATTLE BAYVIEW MANOR
SEATTLE CAROLINE KLINE GALLAND HOME, THE
SEATTLE COLUMBIA LUTHERAN HOME
SEATTLE FOSS HOME AND VILLAGE
SEATTLE HEARTHSTONE, THE
SEATTLE KIN ON HEALTH CARE CENTER
SEATTLE MIRABELLA
SEATTLE PARK SHORE
SEATTLE PROVIDENCE MOUNT ST VINCENT
SEATTLE Park West Care Center
SEATTLE QUEEN ANNE HEALTHCARE
SEATTLE THE TERRACES AT SKYLINE
Seattle BAILEY-BOUSHAY HOUSE
Seattle Ballard Center
Seattle SAINT ANNE NURSING AND REHABILITATION CENTER
Seattle SEA MAR COMMUNITY CARE CENTER
Seattle Seattle Medical Post Acute Care
Seattle Transitional Care Center of Seattle
Seattle Washington Care Center
Shoreline FIRCREST SCHOOL PAT N
Shoreline Park Ridge Care Center
Shoreline RICHMOND BEACH REHAB
Shoreline SHORELINE HEALTH AND REHABILITATION

Observation: ALL CAPS city names precede Title Case city names as seen for Seattle facilities. Things get even weirder for Redmand and Renton where even the cities aren’t grouped in order.

What to do?

As a consultant, when discussing data quality, I would advocate for non-shouty data if source system flexibility makes that possible and doesn’t impact prior data reporting. But sometimes when you are downstream from data, you may find yourself getting inconsistently formatted data and you have to make the decision on how much you care about maddening cases and determine if you even have the time to invest in attempting to format names, which is often a tricky business.

#Tidyverse > Take 2

Rather than tackle facility names which may have initials and acronyms in ALL CAPS, formatting the city names will at least reveal how dplyr::arrange sorts data using city_formmatted in place of city.

library(stringr)
data %>%
  mutate(city_formatted = str_to_title(city)) %>%
  arrange(city_formatted, name) %>%
  select(city, city_formatted, name) %>%
  kable()
city city_formatted name
Auburn Auburn Advanced Post Acute
Auburn Auburn Canterbury House
Auburn Auburn Lea Hill Rehabilitation and Care Center
Auburn Auburn North Auburn Rehabilitation & Health Center
BELLEVUE Bellevue MISSION HEALTHCARE AT BELLEVUE
BELLEVUE Bellevue THE SPRINGS AT PACIFIC REGENT
BURIEN Burien Burien Nursing and Rehabilitation Center
DES MOINES Des Moines JUDSON PARK HEALTH CENTER
DES MOINES Des Moines STAFFORD HEALTHCARE
DES MOINES Des Moines WESLEY HOMES HEALTH CENTER
Enumclaw Enumclaw Enumclaw Health and Rehabilitation Center
FEDERAL WAY Federal Way AVALON CARE CENTER - FEDERAL WAY, LLC
Federal Way Federal Way Garden Terrace Healthcare Center of Federal Way
Federal Way Federal Way HALLMARK MANOR
Federal Way Federal Way LIFE CARE CENTER OF FEDERAL WAY
Issaquah Issaquah Briarwood at Timber Ridge
ISSAQUAH Issaquah Issaquah Nursing & Rehabilitation Center
ISSAQUAH Issaquah PROVIDENCE MARIANWOOD
KENT Kent BENSON HEIGHTS REHABILITATION CENTER
KENT Kent SUNRISE HAVEN
Kirkland Kirkland LIFE CARE CENTER OF KIRKLAND
Mercer Island Mercer Island COVENANT SHORES HEALTH CENTER
NORTH BEND North Bend Regency North Bend Rehabilitation and Nursing Center
REDMOND Redmond CORWIN CENTER AT EMERALD HEIGHTS
Redmond Redmond REDMOND CARE AND REHABILITATION CENTER
Renton Renton Mission Healthcare at Renton
Renton Renton Renton Nursing and Rehabilitation Center
RENTON Renton Talbot Center for Rehabilitation and Healthcare
Seattle Seattle BAILEY-BOUSHAY HOUSE
Seattle Seattle Ballard Center
SEATTLE Seattle BAYVIEW MANOR
SEATTLE Seattle CAROLINE KLINE GALLAND HOME, THE
SEATTLE Seattle COLUMBIA LUTHERAN HOME
SEATTLE Seattle FOSS HOME AND VILLAGE
SEATTLE Seattle HEARTHSTONE, THE
SEATTLE Seattle KIN ON HEALTH CARE CENTER
SEATTLE Seattle MIRABELLA
SEATTLE Seattle PARK SHORE
SEATTLE Seattle Park West Care Center
SEATTLE Seattle PROVIDENCE MOUNT ST VINCENT
SEATTLE Seattle QUEEN ANNE HEALTHCARE
Seattle Seattle SAINT ANNE NURSING AND REHABILITATION CENTER
Seattle Seattle SEA MAR COMMUNITY CARE CENTER
Seattle Seattle Seattle Medical Post Acute Care
SEATTLE Seattle THE TERRACES AT SKYLINE
Seattle Seattle Transitional Care Center of Seattle
Seattle Seattle Washington Care Center
Shoreline Shoreline FIRCREST SCHOOL PAT N
Shoreline Shoreline Park Ridge Care Center
Shoreline Shoreline RICHMOND BEACH REHAB
Shoreline Shoreline SHORELINE HEALTH AND REHABILITATION

Verdict: WINNER

With my city data now formatted as Title Case, the secondary sorting of names is returned as hoped for both Federal Way, Redmond and Seattle facilities.

Data.Table > Take 2

The next test is to see how using the formatted city name impacts sort of facilities by setorder.

da_table <- dt_data[,.(city = nf_loc_city, name = nf_name)]
da_table[,city_formatted := str_to_title(city)]

setorder(da_table, city_formatted, name)
da_table[,.(city, city_formatted, name)] %>%
  kable()
city city_formatted name
Auburn Auburn Advanced Post Acute
Auburn Auburn Canterbury House
Auburn Auburn Lea Hill Rehabilitation and Care Center
Auburn Auburn North Auburn Rehabilitation & Health Center
BELLEVUE Bellevue MISSION HEALTHCARE AT BELLEVUE
BELLEVUE Bellevue THE SPRINGS AT PACIFIC REGENT
BURIEN Burien Burien Nursing and Rehabilitation Center
DES MOINES Des Moines JUDSON PARK HEALTH CENTER
DES MOINES Des Moines STAFFORD HEALTHCARE
DES MOINES Des Moines WESLEY HOMES HEALTH CENTER
Enumclaw Enumclaw Enumclaw Health and Rehabilitation Center
FEDERAL WAY Federal Way AVALON CARE CENTER - FEDERAL WAY, LLC
Federal Way Federal Way Garden Terrace Healthcare Center of Federal Way
Federal Way Federal Way HALLMARK MANOR
Federal Way Federal Way LIFE CARE CENTER OF FEDERAL WAY
Issaquah Issaquah Briarwood at Timber Ridge
ISSAQUAH Issaquah Issaquah Nursing & Rehabilitation Center
ISSAQUAH Issaquah PROVIDENCE MARIANWOOD
KENT Kent BENSON HEIGHTS REHABILITATION CENTER
KENT Kent SUNRISE HAVEN
Kirkland Kirkland LIFE CARE CENTER OF KIRKLAND
Mercer Island Mercer Island COVENANT SHORES HEALTH CENTER
NORTH BEND North Bend Regency North Bend Rehabilitation and Nursing Center
REDMOND Redmond CORWIN CENTER AT EMERALD HEIGHTS
Redmond Redmond REDMOND CARE AND REHABILITATION CENTER
Renton Renton Mission Healthcare at Renton
Renton Renton Renton Nursing and Rehabilitation Center
RENTON Renton Talbot Center for Rehabilitation and Healthcare
Seattle Seattle BAILEY-BOUSHAY HOUSE
SEATTLE Seattle BAYVIEW MANOR
Seattle Seattle Ballard Center
SEATTLE Seattle CAROLINE KLINE GALLAND HOME, THE
SEATTLE Seattle COLUMBIA LUTHERAN HOME
SEATTLE Seattle FOSS HOME AND VILLAGE
SEATTLE Seattle HEARTHSTONE, THE
SEATTLE Seattle KIN ON HEALTH CARE CENTER
SEATTLE Seattle MIRABELLA
SEATTLE Seattle PARK SHORE
SEATTLE Seattle PROVIDENCE MOUNT ST VINCENT
SEATTLE Seattle Park West Care Center
SEATTLE Seattle QUEEN ANNE HEALTHCARE
Seattle Seattle SAINT ANNE NURSING AND REHABILITATION CENTER
Seattle Seattle SEA MAR COMMUNITY CARE CENTER
Seattle Seattle Seattle Medical Post Acute Care
SEATTLE Seattle THE TERRACES AT SKYLINE
Seattle Seattle Transitional Care Center of Seattle
Seattle Seattle Washington Care Center
Shoreline Shoreline FIRCREST SCHOOL PAT N
Shoreline Shoreline Park Ridge Care Center
Shoreline Shoreline RICHMOND BEACH REHAB
Shoreline Shoreline SHORELINE HEALTH AND REHABILITATION

Verdict: BETTER

Secondary sort on mixed case names results in facility names starting with a given letter in ALL CAPS preceding other Title Cased names of same letter. The data looks pretty good until seeing where Ballad Center and Park West Care Center show up under Seattle facilities.

Data.Table > Take 3

When reading all of the documentation, it notes that -

data.table always reorders in “C-locale”. As a consequence, the ordering may be different to that obtained by base::order.

For final test, will try base::order on the data.table and sort by city_formatted and name.

da_table <- dt_data[,.(city = nf_loc_city, name = nf_name)]
da_table[,city_formatted := str_to_title(city)]

da_table[base::order(city_formatted, name),.(city, city_formatted, name)] %>%
  kable
city city_formatted name
Auburn Auburn Advanced Post Acute
Auburn Auburn Canterbury House
Auburn Auburn Lea Hill Rehabilitation and Care Center
Auburn Auburn North Auburn Rehabilitation & Health Center
BELLEVUE Bellevue MISSION HEALTHCARE AT BELLEVUE
BELLEVUE Bellevue THE SPRINGS AT PACIFIC REGENT
BURIEN Burien Burien Nursing and Rehabilitation Center
DES MOINES Des Moines JUDSON PARK HEALTH CENTER
DES MOINES Des Moines STAFFORD HEALTHCARE
DES MOINES Des Moines WESLEY HOMES HEALTH CENTER
Enumclaw Enumclaw Enumclaw Health and Rehabilitation Center
FEDERAL WAY Federal Way AVALON CARE CENTER - FEDERAL WAY, LLC
Federal Way Federal Way Garden Terrace Healthcare Center of Federal Way
Federal Way Federal Way HALLMARK MANOR
Federal Way Federal Way LIFE CARE CENTER OF FEDERAL WAY
Issaquah Issaquah Briarwood at Timber Ridge
ISSAQUAH Issaquah Issaquah Nursing & Rehabilitation Center
ISSAQUAH Issaquah PROVIDENCE MARIANWOOD
KENT Kent BENSON HEIGHTS REHABILITATION CENTER
KENT Kent SUNRISE HAVEN
Kirkland Kirkland LIFE CARE CENTER OF KIRKLAND
Mercer Island Mercer Island COVENANT SHORES HEALTH CENTER
NORTH BEND North Bend Regency North Bend Rehabilitation and Nursing Center
REDMOND Redmond CORWIN CENTER AT EMERALD HEIGHTS
Redmond Redmond REDMOND CARE AND REHABILITATION CENTER
Renton Renton Mission Healthcare at Renton
Renton Renton Renton Nursing and Rehabilitation Center
RENTON Renton Talbot Center for Rehabilitation and Healthcare
Seattle Seattle BAILEY-BOUSHAY HOUSE
Seattle Seattle Ballard Center
SEATTLE Seattle BAYVIEW MANOR
SEATTLE Seattle CAROLINE KLINE GALLAND HOME, THE
SEATTLE Seattle COLUMBIA LUTHERAN HOME
SEATTLE Seattle FOSS HOME AND VILLAGE
SEATTLE Seattle HEARTHSTONE, THE
SEATTLE Seattle KIN ON HEALTH CARE CENTER
SEATTLE Seattle MIRABELLA
SEATTLE Seattle PARK SHORE
SEATTLE Seattle Park West Care Center
SEATTLE Seattle PROVIDENCE MOUNT ST VINCENT
SEATTLE Seattle QUEEN ANNE HEALTHCARE
Seattle Seattle SAINT ANNE NURSING AND REHABILITATION CENTER
Seattle Seattle SEA MAR COMMUNITY CARE CENTER
Seattle Seattle Seattle Medical Post Acute Care
SEATTLE Seattle THE TERRACES AT SKYLINE
Seattle Seattle Transitional Care Center of Seattle
Seattle Seattle Washington Care Center
Shoreline Shoreline FIRCREST SCHOOL PAT N
Shoreline Shoreline Park Ridge Care Center
Shoreline Shoreline RICHMOND BEACH REHAB
Shoreline Shoreline SHORELINE HEALTH AND REHABILITATION

Verdict:: WINNER

Specifying base::order makes the data.table sorting experience consistent with dplyr and more importantly, sort more human-sensibly in spite of case differences across facility names.

A Sorted Affair

As often is the case, thinking that a simple task to sort data won’t take more than 5 minutes can quickly lead one down a rabbit hole when data is inconsistent. Standardizing data formats for character values would not only be easier on human eyes, but also result in more expected sorting behavior.