Stata: some string functions

MAKE CATEGORICAL 1 or 0 fields from a list of variables:

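  • encode party, gen(party_n)

-> encodes a string variable as numeric codes 1, 2, 3, etc., one per category (assigned in alphabetical order), keeping the original strings as value labels. Note this gives one code per category, not a 1 or 0 field.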

  • encode mv, gen(mv_no)
  • gen x = party =="HDU"

-> x is 1 if party exactly equals the quoted string and 0 otherwise. The comparison is case sensitive: use upper() if needed (strupper() in newer versions), e.g. gen x = upper(party) == "HDU"
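To see the difference between the two approaches, here is a minimal sketch on made-up data. The party values and the names is_hdu and party_d are assumptions for illustration only. The last command, tabulate with generate(), is one way to get a 1 or 0 field for every category at once.

```stata
clear
input str3 party
"HDU"
"SDP"
"hdu"
"HNS"
end

* encode: one numeric code per category, strings kept as value labels
encode party, gen(party_n)

* single 0/1 indicator; upper() makes the match ignore case
gen byte is_hdu = upper(party) == "HDU"

* one 0/1 indicator per category: party_d1, party_d2, etc.
tabulate party, generate(party_d)

* nolabel shows the underlying numbers instead of the value labels
list, nolabel
```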

TO FIND TEXT WITHIN A STRING VARIABLE

  • gen x = strpos(diagnosis,"RSV") >0

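-> codes 1 if the string "RSV" appears anywhere in the field (at least once), 0 otherwise. strpos() returns the position of the first match, or 0 if there is none.

  • gen x = regexm(var1,"RSV")

-> codes 1 if var1 matches the regular expression "RSV", i.e. the text "RSV" appears anywhere in the var1 string, 0 otherwise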
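A small sketch comparing the two, again on invented data (the diagnosis strings and the variable names rsv_exact, rsv_any and rsv_re are assumptions). Both functions are case sensitive, so the example also shows two ways to catch lower-case "rsv".

```stata
clear
input str30 diagnosis
"RSV bronchiolitis"
"Influenza A"
"rsv, suspected"
"Pneumonia; RSV positive"
end

* case-sensitive match: misses "rsv, suspected"
gen byte rsv_exact = strpos(diagnosis, "RSV") > 0

* case-insensitive match: convert to upper case before searching
gen byte rsv_any = strpos(upper(diagnosis), "RSV") > 0

* same idea with a regular expression using character classes
gen byte rsv_re = regexm(diagnosis, "[Rr][Ss][Vv]")

list
```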

STRING FUNCTIONS

  • substr(name,1,comma-1) -> extracts from the start of name up to the first comma, where comma holds the position of that comma (e.g. from strpos(name,","))
  • substr("abcdef",2,3) = "bcd"
  • substr("abcdef",-3,2) = "de"
  • substr("abcdef",2,.) = "bcdef"
  • substr("abcdef",-3,.) = "def"
  • substr("abcdef",2,0) = ""
  • substr("abcdef",15,2) = ""
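The "extract up to the first comma" idea can be sketched as follows. The names and the variables comma, surname and first are made up for illustration, and the sketch assumes that a name without a comma should be kept whole.

```stata
clear
input str30 name
"Smith, John"
"Garcia, Maria"
"Cher"
end

* position of the first comma (0 if there is none)
gen comma = strpos(name, ",")

* text before the comma; keep the whole name when there is no comma
gen surname = cond(comma > 0, substr(name, 1, comma - 1), name)

* text after the comma, with surrounding blanks removed
gen first = trim(substr(name, comma + 1, .)) if comma > 0

list
```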

SPLITTING WORDS (use egenmore functions)

Word count

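  • egen x = wordof(var1), word(1)

-> picks a word from a string by its position (needs egenmore: ssc install egenmore)

  • egen xx = wordof(dx1), word(2)

-> the second word; use word(-1) for the last word (needs egenmore)

  • split x, parse(@)

-> splits x at each "@" into new variables x1, x2, etc. (split is built into Stata)

  • egen Grade = ston(grade), to(1/5) from(Poor Fair Good "Very good" Excellent)

-> maps each string to a number, Poor = 1 through Excellent = 5 (needs egenmore)

  • egen yy = dayofyear(date_ad), m(1)

-> day of the year counted from 1 January; m(5) would start the count in May (needs egenmore)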
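If egenmore is not installed, Stata's built-in word(), wordcount() and split cover the word-picking and word-count cases. This is only a sketch on invented data; dx1 holds made-up diagnoses and first_w, last_w, n_words and the stub w are hypothetical names.

```stata
clear
input str40 dx1
"acute viral bronchiolitis"
"asthma"
end

* first and last word; a negative position counts from the end
gen first_w = word(dx1, 1)
gen last_w  = word(dx1, -1)

* number of words in the string
gen n_words = wordcount(dx1)

* one new variable per word: w1, w2, w3
split dx1, parse(" ") generate(w)

list
```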
