[Stable]

Groups patient records from multiple isolates under a single integer patient ID by matching on patient identifiers.

Grouping is carried out in five stages, each matching on a different combination of identifiers:

  1. NHS number & date of birth

  2. Hospital number & date of birth

  3. NHS number & hospital number

  4. Date of birth & surname, if the NHS number is unknown

  5. Sex & date of birth & fuzzy name
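A single exact-match stage can be sketched in base R as below. This is an illustration only, not the package's implementation: the helper `group_by_pair()` and the column names are hypothetical, and records missing either identifier are simply given their own id rather than being passed on to a later stage.

```r
# Hypothetical helper: records sharing both identifiers get the same integer id.
# Records missing either identifier are left ungrouped (each gets its own id).
group_by_pair <- function(x, col_a, col_b) {
  complete <- !is.na(x[[col_a]]) & !is.na(x[[col_b]])
  key <- ifelse(complete,
                paste(x[[col_a]], x[[col_b]], sep = "|"),
                paste0("row", seq_len(nrow(x))))
  # match() against unique() numbers the keys in order of first appearance
  match(key, unique(key))
}

records <- data.frame(
  nhs = c("9434765919", "9434765919", NA, "3367170666"),
  dob = as.Date(c("1988-10-06", "1988-10-06", "1988-10-06", "1988-10-06"))
)
records$id <- group_by_pair(records, "nhs", "dob")
print(records)
```

In a staged approach, records left ungrouped here would be candidates for the next combination of identifiers.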

Where an identifier is missing or invalid on a record, it is copied over from the other records in the same group.
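A minimal base R sketch of copying identifiers within a group, assuming an `id` column already exists. The helper `fill_within_group()` is hypothetical and treats only `NA` as missing; how the package selects the replacement value and handles invalid values is not shown here.

```r
# Hypothetical helper: within each id, replace NA with the first non-missing value.
fill_within_group <- function(values, id) {
  ave(values, id, FUN = function(v) {
    known <- v[!is.na(v)]
    if (length(known) > 0) v[is.na(v)] <- known[1]
    v
  })
}

records <- data.frame(
  id  = c(1L, 1L, 2L, 2L),
  nhs = c("9434765919", NA, NA, NA),
  stringsAsFactors = FALSE
)
records$nhs <- fill_within_group(records$nhs, records$id)
print(records)
```

Group 2 has no known NHS number, so its values stay `NA`.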

uk_patient_id(
  x,
  nhs_number,
  hospital_number,
  date_of_birth,
  sex_mfu,
  forename = "NONAME",
  surname = "NONAME",
  .sortOrder,
  .keepValidNHS = FALSE,
  .forceCopy = FALSE,
  .experimental = FALSE
)

Arguments

x

a data.frame or data.table containing the cleaned line list

nhs_number

a column, given as a character string, containing the patient NHS numbers

hospital_number

a column, given as a character string, containing the patient hospital numbers

date_of_birth

a column containing the patient date of birth, stored as a Date

sex_mfu

a column, given as a character string, containing the patient sex. NOTE: this only works if sex is coded as character versions of Male/Female/Unknown; additional options are not currently supported (flagged for a future update)

forename

a column, given as a character string, containing the patient forename; leave as NONAME if unavailable

surname

a column, given as a character string, containing the patient surname; leave as NONAME if unavailable

.sortOrder

optional; a column, given as a character string, used to sort the records before ids are generated

.keepValidNHS

optional, default FALSE; set to TRUE to retain the column holding the NHS checksum result, stored as a logical
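For reference, the NHS number check digit uses the Modulus 11 algorithm. The sketch below shows that algorithm in base R; the function `nhs_checksum_ok()` is hypothetical and is not the package's own code, which may apply further rules.

```r
# Hypothetical helper implementing the Modulus 11 check for a 10-digit NHS number.
nhs_checksum_ok <- function(nhs) {
  nhs <- as.character(nhs)
  vapply(nhs, function(n) {
    # Anything that is not exactly ten digits fails outright
    if (is.na(n) || !grepl("^[0-9]{10}$", n)) return(FALSE)
    digits <- as.integer(strsplit(n, "")[[1]])
    # Weight the first nine digits 10, 9, ..., 2 and sum
    total <- sum(digits[1:9] * 10:2)
    check <- 11 - (total %% 11)
    if (check == 11) check <- 0
    # A computed check digit of 10 means the number is invalid
    check != 10 && check == digits[10]
  }, logical(1), USE.NAMES = FALSE)
}

result <- nhs_checksum_ok(c("9434765919", "3367170666", "9434765918", NA))
print(result)
# TRUE TRUE FALSE FALSE
```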

.forceCopy

optional, default FALSE; TRUE will force data.table to take a copy instead of editing the data by reference
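The difference between editing by reference and working on a copy can be seen with data.table directly. The helper `add_flag()` and its `force_copy` argument are hypothetical stand-ins used only to illustrate the behaviour; they are not part of this package.

```r
library(data.table)

# Hypothetical helper that adds a column, optionally working on a copy
add_flag <- function(dt, force_copy = FALSE) {
  if (force_copy) dt <- copy(dt)
  dt[, flag := TRUE]
  dt
}

original <- data.table(a = 1:3)

# With a forced copy, the caller's table is left alone
copied <- add_flag(original, force_copy = TRUE)
untouched <- !("flag" %in% names(original))

# Without it, := edits the caller's table by reference
modified <- add_flag(original)
touched <- "flag" %in% names(original)

print(c(untouched = untouched, touched = touched))
```

Taking a copy costs memory and time on large line lists, which is presumably why it is off by default.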

.experimental

optional, default FALSE; TRUE will enable the experimental features for recoding NA values based on the mode
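One way recoding NA values to the mode could work is sketched below in base R. Both helpers are hypothetical and the tie-breaking rule (first value seen wins) is an assumption; the experimental feature itself may behave differently.

```r
# Hypothetical helper: most frequent non-missing value (first seen wins ties)
mode_value <- function(v) {
  known <- v[!is.na(v)]
  if (length(known) == 0) return(NA)
  distinct <- unique(known)
  distinct[which.max(tabulate(match(known, distinct)))]
}

# Hypothetical helper: within each id, replace NA with that group's mode
recode_na_to_mode <- function(values, id) {
  ave(values, id, FUN = function(v) {
    v[is.na(v)] <- mode_value(v)
    v
  })
}

sex <- c("F", "F", NA, "M", NA, NA)
id  <- c(1L, 1L, 1L, 1L, 2L, 2L)
recoded <- recode_na_to_mode(sex, id)
print(recoded)
```

The missing value in group 1 becomes "F"; group 2 has no known values, so it stays `NA`.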

Value

The input data with one new variable, and optionally a second:

id

a unique patient id

valid_nhs

if retained using argument .keepValidNHS=TRUE, a BOOLEAN containing the result of the NHS checksum validation

Examples

id_test <- data.frame(
  nhs_n = c(
    9434765919,9434765919,9434765919,NA,NA,
    3367170666,5185293519,5185293519,5185293519,8082318562,NA,NA,NA
  ),
  hosp_n = c(
    '13','13','13','UNKNOWN','13','13','13','31','31','96','96',NA,'96'),
  sex = c(rep('F',6),rep('Male',4), 'U', 'U', 'M'),
  dateofbirth = as.Date(
    c(
      '1988-10-06','1988-10-06','1900-01-01','1988-10-06','1988-10-06',
      '1988-10-06','1988-10-06','1988-10-06','1988-10-06','2020-01-28',
      '2020-01-28','2020-01-28','2020-01-28'
    )
  ),
  firstname = c(
    'Danger','Danger','Denger','Danger','Danger','DANGER','Danger',
    'Danger','Danger','Crazy','Crazy','Krazy','C'
  ),
  lastname = c(
    'Mouse','Mause','Mouse','Moose','Moose','Mouse','MOUSe',
    'Mouse','Mouse','Frog','FROG','Frug','Frog'
  ),
  testdate = sample(seq.Date(Sys.Date()-21,Sys.Date(),"day"),13,replace = TRUE)
)
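# Column names are passed as character strings;
# the trailing [] makes sure the returned data.table is printed
uk_patient_id(x = id_test,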
              nhs_number = 'nhs_n',
              hospital_number = 'hosp_n',
              forename = 'firstname',
              surname = 'lastname',
              sex_mfu = 'sex',
              date_of_birth = 'dateofbirth',
              .sortOrder = 'testdate')[]
#>     id      nhs_n  hosp_n  sex dateofbirth firstname lastname   testdate
#>  1:  1         NA      13    F  1988-10-06    DANGER    MOOSE 2022-06-22
#>  2:  1 5185293519      13    M  1988-10-06    DANGER    MOUSE 2022-06-24
#>  3:  1 9434765919      13    F  1988-10-06    DANGER    MAUSE 2022-06-28
#>  4:  1         NA UNKNOWN    F  1988-10-06    DANGER    MOOSE 2022-06-30
#>  5:  1 3367170666      13    F  1988-10-06    DANGER    MOUSE 2022-06-30
#>  6:  1 9434765919      13    F  1900-01-01    DENGER    MOUSE 2022-07-03
#>  7:  1 9434765919      13    F  1988-10-06    DANGER    MOUSE 2022-07-11
#>  8:  2         NA    <NA> <NA>  2020-01-28     KRAZY     FRUG 2022-06-22
#>  9:  3 5185293519      31    M  1988-10-06    DANGER    MOUSE 2022-07-02
#> 10:  3 5185293519      31    M  1988-10-06    DANGER    MOUSE 2022-07-08
#> 11:  9         NA      96 <NA>  2020-01-28     CRAZY     FROG 2022-07-05
#> 12:  9         NA      96    M  2020-01-28         C     FROG 2022-07-05
#> 13:  9 8082318562      96    M  2020-01-28     CRAZY     FROG 2022-07-10