
Assigns a single integer patient ID to patient records from multiple isolates by grouping on shared patient identifiers.

Grouping is based on five stages:

  1. NHS number & Date of Birth

  2. Hospital number & Date of Birth

  3. NHS number & Hospital number

  4. Date of Birth & Surname, if the NHS number is unknown

  5. Sex & Date of Birth & Fuzzy Name
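
The staged approach above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper merge_on() and the columns of the toy records table are hypothetical, only the first two stages are shown, and a single pass per stage does not guarantee full transitive linkage across longer chains of records.

```r
# Toy records; column names are illustrative only.
records <- data.frame(
  nhs  = c("9434765919", "9434765919", NA, "5185293519"),
  hosp = c("13", "13", "13", "31"),
  dob  = as.Date(rep("1988-10-06", 4)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: records sharing a complete key are merged into the
# group with the lowest id, and whole existing groups are relabelled with it.
merge_on <- function(id, ...) {
  keys <- list(...)
  # A record takes part in a stage only if every key field is present.
  complete <- Reduce(`&`, lapply(keys, function(k) !is.na(k)))
  key <- do.call(paste, c(keys, sep = "|"))
  target <- id
  target[complete] <- ave(id[complete], key[complete], FUN = min)
  # Carry the lowest id found for any member back to the whole old group.
  lowest <- tapply(target, id, min)
  as.integer(lowest[as.character(id)])
}

id <- seq_len(nrow(records))                    # every record starts alone
id <- merge_on(id, records$nhs, records$dob)    # stage 1: NHS number & DOB
id <- merge_on(id, records$hosp, records$dob)   # stage 2: hospital number & DOB
records$id <- match(id, unique(id))             # renumber groups 1, 2, ...
```

In this toy data the third record has no NHS number, so stage 1 leaves it alone and stage 2 links it to the first two records through the shared hospital number and date of birth.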

Where identifiers are missing or invalid, they are copied across to the grouped records.
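
As a rough illustration of copying identifiers within a group, the base R sketch below fills missing hospital numbers from other records with the same id. The helper fill_missing() and the toy table are hypothetical, and taking the first non-missing value is an assumption; the rule the package uses to choose a value is not described here.

```r
# Toy grouped records; column names are illustrative only.
grouped <- data.frame(
  id   = c(1L, 1L, 1L, 2L),
  hosp = c("13", NA, "13", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill NAs with the first non-missing value, if any.
fill_missing <- function(v) {
  known <- v[!is.na(v)]
  if (length(known) > 0) v[is.na(v)] <- known[1]
  v
}

# Apply the fill separately within each patient id.
grouped$hosp <- ave(grouped$hosp, grouped$id, FUN = fill_missing)
```

Group 2 has no known hospital number, so its value stays missing.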

  forename = "NONAME",
  surname = "NONAME",
  .keepValidNHS = FALSE,
  .forceCopy = FALSE,
  .experimental = FALSE



x: a data.frame or data.table containing the cleaned line list


nhs_number: the name, as a character string, of the column containing the patient NHS numbers


hospital_number: the name, as a character string, of the column containing the patient Hospital numbers


date_of_birth: the name, as a character string, of the Date column containing the patient date of birth


sex_mfu: the name, as a character string, of the column containing the patient sex. NOTE: only works if sex is coded as character versions of Male/Female/Unknown; additional options are not currently supported (planned for a future update)
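
Because only Male/Female/Unknown codings are supported, it may help to recode the sex column before calling the function. The base R sketch below is one way to do that; recode_sex() is a hypothetical helper, not part of the package, and the single-letter target codes are an assumption (adjust them if full words are expected).

```r
# Hypothetical helper: map free-text sex values onto M / F / U.
recode_sex <- function(sex) {
  # Keep only the upper-cased first letter of each value.
  first <- toupper(substr(trimws(as.character(sex)), 1, 1))
  # Anything that is not M or F, including NA, becomes U.
  ifelse(first %in% c("M", "F"), first, "U")
}

recode_sex(c("F", "Male", "female", "U", NA, "Other"))
```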


forename: the name, as a character string, of the column containing the patient forename; leave as NONAME if unavailable


surname: the name, as a character string, of the column containing the patient surname; leave as NONAME if unavailable


.sortOrder: optional; the name, as a character string, of a column used to set the sort order for id generation


.keepValidNHS: optional, default FALSE; set TRUE to retain the column storing the NHS checksum result as a BOOLEAN
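
For context, the NHS number checksum is the standard Modulus 11 check: the first nine digits are weighted 10 down to 2, and the tenth digit must equal 11 minus the weighted sum modulo 11 (with 11 treated as 0 and 10 as invalid). The base R sketch below illustrates that check; valid_nhs() is a hypothetical helper, and the package's own validation may apply further rules.

```r
# Hypothetical helper: TRUE if a 10-digit NHS number passes the Modulus 11 check.
valid_nhs <- function(nhs) {
  nhs <- as.character(nhs)
  vapply(nhs, function(n) {
    # Missing values and anything that is not exactly 10 digits are invalid.
    if (is.na(n) || !grepl("^[0-9]{10}$", n)) return(FALSE)
    digits <- as.integer(strsplit(n, "")[[1]])
    # Weighted sum of the first nine digits, weights 10 down to 2.
    check <- 11 - sum(digits[1:9] * 10:2) %% 11
    if (check == 11) check <- 0
    # A check value of 10 can never match a single digit.
    check != 10 && check == digits[10]
  }, logical(1), USE.NAMES = FALSE)
}

valid_nhs(c("9434765919", "9434765918", "UNKNOWN", NA))
```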


.forceCopy: optional, default FALSE; TRUE will force data.table to take a copy instead of editing the data by reference


.experimental: optional, default FALSE; TRUE will enable the experimental features for recoding NA values based on the mode


A data.frame with one new variable, or two if .keepValidNHS = TRUE:


a unique patient id


if retained using argument .keepValidNHS=TRUE, a BOOLEAN containing the result of the NHS checksum validation


id_test <- data.frame(
  nhs_n = c(
  hosp_n = c(
  sex = c(rep('F',6),rep('Male',4), 'U', 'U', 'M'),
  dateofbirth = as.Date(
  firstname = c(
  lastname = c(
  testdate = sample(seq.Date(Sys.Date()-21,Sys.Date(),"day"),13,replace = TRUE)
uk_patient_id(x = id_test,
              nhs_number = 'nhs_n',
              hospital_number = 'hosp_n',
              forename = 'firstname',
              surname = 'lastname',
              sex_mfu = 'sex',
              date_of_birth = 'dateofbirth',
              .sortOrder = 'testdate')[]
#>     id      nhs_n  hosp_n  sex dateofbirth firstname lastname   testdate
#>  1:  1         NA      13    F  1988-10-06    DANGER    MOOSE 2022-06-22
#>  2:  1 5185293519      13    M  1988-10-06    DANGER    MOUSE 2022-06-24
#>  3:  1 9434765919      13    F  1988-10-06    DANGER    MAUSE 2022-06-28
#>  4:  1         NA UNKNOWN    F  1988-10-06    DANGER    MOOSE 2022-06-30
#>  5:  1 3367170666      13    F  1988-10-06    DANGER    MOUSE 2022-06-30
#>  6:  1 9434765919      13    F  1900-01-01    DENGER    MOUSE 2022-07-03
#>  7:  1 9434765919      13    F  1988-10-06    DANGER    MOUSE 2022-07-11
#>  8:  2         NA    <NA> <NA>  2020-01-28     KRAZY     FRUG 2022-06-22
#>  9:  3 5185293519      31    M  1988-10-06    DANGER    MOUSE 2022-07-02
#> 10:  3 5185293519      31    M  1988-10-06    DANGER    MOUSE 2022-07-08
#> 11:  9         NA      96 <NA>  2020-01-28     CRAZY     FROG 2022-07-05
#> 12:  9         NA      96    M  2020-01-28         C     FROG 2022-07-05
#> 13:  9 8082318562      96    M  2020-01-28     CRAZY     FROG 2022-07-10