Weight-of-Evidence for Forensic DNA Profiles


David J Balding

Wiley 2005


Corrections and comments

and

Glossary of acronyms

 

1.  Corrections and comments (last updated 15/7/07)

 

I will continue to put here any errors or clarifications that have been pointed out or suggested by readers.  Thanks to John Buckleton and Jenny Conran for contributing some of these: further comments from other readers are welcome.  See below for glossary of acronyms.

 

Preface: Here and throughout the book references to Buckleton et al. (2004) should read Buckleton et al. (2005).

 

Section 4.1.1, p 47, Null alleles and allelic dropout.  It would have been clearer to use the term “silent” rather than “null” alleles, reserving the latter term for the scenario in which the entire STR locus has been deleted.  The statement: “Null alleles cause no problem for DNA profile interpretation” applies only to the identification setting.  Null/silent alleles can be highly problematic for relatedness testing since, for example, a parent and child can appear to be homozygous for different alleles; see T. M. Clayton, S. M. Hill, L. A. Denton, S. K. Watson and A. J. Urquhart, Primer binding site mutations affecting the typing of STR loci contained within the AMPFlSTR® SGM Plus™ kit, Forensic Science International 139 (2-3): 255-259, 2004.

 

Section 5.1.1, p 72, Pearson’s test.  The final statement about implementation in R needs clarification.  Use of the chisq.test command to test for HWE requires dividing the heterozygote count by two and thereby completing a 2 by 2 contingency table.  For example, if the genotype counts are 10, 5 and 5 (homozygote, heterozygote, homozygote), we divide the heterozygote count of 5 into two lots of 2.5 and use:

 

> chisq.test(matrix(c(10,2.5,2.5,5),2,2),corr=F)

        Pearson's Chi-squared test

data:  matrix(c(10, 2.5, 2.5, 5), 2, 2)
X-squared = 4.3556, df = 1, p-value = 0.03689

Warning message:
Chi-squared approximation may be incorrect in: chisq.test(matrix(c(10, 2.5, 2.5, 5), 2, 2), corr = F)
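
The same calculation can be wrapped in a small helper.  This is only a sketch: the name hwe_chisq and its argument names are hypothetical (they are not part of R or of the book), and the function does nothing more than the split-heterozygote construction described above, with the argument corr=F written out in full as correct = FALSE.

```r
# Pearson test for HWE from diallelic genotype counts.
# hwe_chisq is a hypothetical helper name used for illustration only.
hwe_chisq <- function(n_aa, n_ab, n_bb) {
  # Split the heterozygote count in two to complete the 2 x 2 table.
  tab <- matrix(c(n_aa, n_ab / 2, n_ab / 2, n_bb), nrow = 2, ncol = 2)
  # Small expected counts trigger the "approximation may be incorrect"
  # warning seen above; it is suppressed here, not resolved.
  suppressWarnings(chisq.test(tab, correct = FALSE))
}

res <- hwe_chisq(10, 5, 5)
round(unname(res$statistic), 4)  # X-squared reported above: 4.3556
round(res$p.value, 5)            # p-value reported above: 0.03689
```

The warning still applies to the underlying test: with counts this small the chi-squared approximation should be treated with caution.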

 

p 73, Fisher’s test.  The final statement is literally true but not helpful: the fisher.test function of R can readily be used to test for linkage equilibrium from diallelic haplotype data, but not for HWE because of the single heterozygote count (i.e. BG is not distinguished from GB).  An approximate test can be constructed for example via

 

hwe <- function(a,bc,d) fisher.test(matrix(c(a,round((bc+0.5)/2),round((bc-0.5)/2),d),2,2))
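
As an illustration of how the rounding splits the heterozygote count, the sketch below repeats the definition so that it runs on its own, checks that the two rounded halves always add back to the heterozygote count, and applies the function to the genotype counts 10, 5 and 5 from the Pearson example.  The names bc, halves and res are introduced here for illustration only.

```r
# Definition repeated from the text so that this block is self-contained.
hwe <- function(a, bc, d)
  fisher.test(matrix(c(a, round((bc + 0.5) / 2), round((bc - 0.5) / 2), d), 2, 2))

# The two rounded halves always sum to the heterozygote count:
# an odd count such as 5 is split into 3 and 2, an even count into equal parts.
bc <- 0:20
halves <- cbind(round((bc + 0.5) / 2), round((bc - 0.5) / 2))
all(rowSums(halves) == bc)

# Genotype counts 10, 5, 5 are tested via the integer table 10, 3, 2, 5.
res <- hwe(10, 5, 5)
res$p.value
```

Because an odd heterozygote count cannot be halved exactly, the result is an approximation rather than an exact test for HWE.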

 

Section 5.8 Population genetics exercises.  In Q3 p=0.75 should read p=0.25 (the latter is consistent with the hint and the solutions on p 165).

 

Section 6.2.1, NRC, p 86.  The acronym NRC, used here and again on p 155, has not been defined.  It stands for National Research Council (of the US).  See full list of acronyms below.

 

Section 6.2.5, Confidence Limits, p 93.  The term “spurious” near the end of the central paragraph is intended in the sense of “appears to be useful but is not”.  Spurious can also mean “false” or “illegitimate”, but I hope it is clear from context that this sense is not intended here.  It would have been better to say “unnecessary”.

 

Section 6.5.2, p 93, L 13.  Typo: “i ≡ AC” should read “i ≡ BC”.

 

Section 6.6 p 110 In Q4, the reference to page 105 should read page 107.

 

Section 7.1.3 p 116 The reference to Balding and Nichols (1995) should have mentioned that there is an error in Table 2 of the cited article: in case Mother = AA, Alleged Father = AB, and Child = AB the 2 in the numerator should be deleted, so that the correct LR is 2(F+(1-F)pB)/(1+3F).
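
For readers who want to evaluate the corrected expression numerically, here is a minimal sketch.  The function name lr_paternity and the argument names p_b and f are hypothetical, and the input values are illustrative only; the formula itself is the corrected LR quoted above.

```r
# Corrected LR for Mother = AA, Alleged Father = AB, Child = AB,
# as quoted in the correction: 2(F + (1-F) pB) / (1 + 3F).
# lr_paternity, p_b and f are hypothetical names; f stands for F
# (F is avoided as a variable name because R treats it as FALSE).
lr_paternity <- function(p_b, f) {
  2 * (f + (1 - f) * p_b) / (1 + 3 * f)
}

lr_paternity(p_b = 0.1, f = 0)     # reduces to 2 * p_b = 0.2 when F = 0
lr_paternity(p_b = 0.1, f = 0.03)  # illustrative non-zero F
```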

 

Section 7.1.8, Mutation, p 122.  I should have included a separate discussion of PCR primer mutations, see comment on Section 4.1.1 above.

 

Section 7.2.3, LR, p 128.  The acronym LR, used here and again on p 170, has not been defined.  It stands for Likelihood Ratio.

 

Chapter 8, Other approaches to weight of evidence, p 135.  The introductory discussion assumes the identification setting, which is perhaps a little confusing as it immediately follows the chapter on relatedness.  In my original book plan Chapter 8 was going to come after Chapter 3.

 

Chapter 10 Solutions to Exercises:

 

p 165 For Q2(b) the answer should be 0.0077 not 0.0083.  Also for Q3(b), in the table, 146 should be 143 and 165 should be 150.

 

p 167 Q1(a) The final 0.05 in each of the computations of R1h and R2h should be 0.5 (but the final answers are correct).  However, the product Rh = R1h × R2h should be 0.0040, not 0.0036.  This error propagates to the final answer, which should be 0.976, not 0.977, and to the final answer for (b), which should be 0.973, not 0.975.

 

p 171 Q4(a) The formula for R should have 0.5 / Rpi instead of 0.5 × Rpi.  Also, the formulas taken from Table 7.2 should each have (1+2θ) in the denominator, not (1+θ).  Accordingly, the two instances of 1.02 in the calculation should both be 1.04, and the final answer should be 6.16, not 5.86.

 

p 173 In Q3(a), the reference to page 105 should read page 107.

 

Suggested addition:  allelic drop-out

The book is intended to be introductory in nature, and it avoids complex topics such as the statistical analysis of low copy number (LCN) profiles, which are difficult because of problems such as allelic dropout.  However, this topic is important and possibly should have been given an introductory treatment, as was done for mixtures.  Here is a very brief introduction to the issues, via a simple, single-locus example.

Suppose that the crime scene profile is A and the suspect s is AB.  This would constitute an exclusion under normal circumstances, but if the crime stain had extremely little DNA, or was degraded, then it is possible that s is the donor of the crime scene sample and that the B allele was not observed because of “drop out”, or strong preferential amplification of one allele.  Similarly, the true source of the crime stain could have any genotype that includes an A allele.  Let D_x denote the probability that allele x would be subject to drop-out under the conditions to which the crime stain was exposed, and let p_x denote the population proportion of allele x.  Ignoring coancestry, the likelihood ratio (LR) comparing the hypothesis that s is the source of the crime scene DNA with the hypothesis that an unrelated individual i is the source, with the latter in the numerator, takes the form:

 

LR = P(only A observed | i is the source) / P(only A observed | s is the source)

   = [ p_A^2 (1 - D_A^2) + 2 p_A (1 - D_A) Σ_{x≠A} p_x D_x ] / [ (1 - D_A) D_B ]

   = [ p_A^2 (1 + D_A) + 2 p_A Σ_{x≠A} p_x D_x ] / D_B

   = [ p_A^2 (1 - D_A) + 2 p_A Σ_x p_x D_x ] / D_B

 

in which I have assumed that drop-out occurs independently for each allele, both for heterozygotes and homozygotes.  In the final expression, the summation is over all alleles, and a compensating factor is subtracted from the first term to cancel the effect of including allele A in the sum.  The dropout probability may increase with allele length, in which case the LR decreases (stronger evidence against s) as the length of allele B increases.

If the dropout probability is assumed to be equal to D for all alleles at the locus, then the LR simplifies to

 

LR = p_A^2 (1-D)/D + 2 p_A

 

As expected, the LR approaches ∞ in the limit as D → 0, corresponding to stronger evidence in favour of the innocence of s as dropout becomes more unlikely.  The approximation

 

LR = 2 p_A

 

is accurate only if p_A is small and D is not small, so that the omitted term p_A^2 (1-D)/D is negligible.  However, still assuming that D is constant across alleles, this approximation is always non-conservative, because the omitted term is positive and so the approximation understates the LR (unless D=1, which is implausible if some alleles are observed).  To avoid a non-conservative approximation requires an assessment of the dropout probability, D.  This is inevitable since the less likely is dropout, the weaker the case against s.
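
A rough numerical sketch may help to show how strongly the LR depends on D.  The general function below is a reconstruction from the assumptions stated above (independent dropout of each allele copy, no coancestry, and the LR taken with "i is the source" in the numerator); it is checked here only against the equal-D formula quoted in the text.  The function names, argument names and allele proportions are hypothetical and chosen for illustration.

```r
# General single-locus LR under independent dropout (reconstruction; see lead-in).
# p: named vector of allele proportions; d: named vector of dropout probabilities.
lr_dropout <- function(p, d, observed = "A", dropped = "B") {
  others <- setdiff(names(p), observed)
  # P(only `observed` seen | unrelated individual i is the source):
  # homozygote with at least one copy seen, or heterozygote whose other allele dropped.
  num <- p[[observed]]^2 * (1 - d[[observed]]^2) +
    2 * p[[observed]] * (1 - d[[observed]]) * sum(p[others] * d[others])
  # P(only `observed` seen | s, an observed/dropped heterozygote, is the source).
  den <- (1 - d[[observed]]) * d[[dropped]]
  num / den
}

# Equal-D simplification quoted in the text.
lr_equal_d <- function(p_a, d) p_a^2 * (1 - d) / d + 2 * p_a

p <- c(A = 0.1, B = 0.2, C = 0.7)  # illustrative proportions, not from the book
for (d_val in c(0.05, 0.5, 0.9)) {
  d_vec <- c(A = d_val, B = d_val, C = d_val)
  cat(sprintf("D = %.2f  general = %.4f  simplified = %.4f  2*pA = %.4f\n",
              d_val, lr_dropout(p, d_vec), lr_equal_d(p[["A"]], d_val),
              2 * p[["A"]]))
}
```

With these illustrative inputs the general and simplified values agree for each D, and both exceed 2 p_A, with the gap widening as D becomes small.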

The above analysis assumes a simple yes/no result as to whether the allele is observed.  In practice there may be an observed signal in the EPG at the B allele, but one that is weak and does not reach the usual criteria for an allele to be confidently “called”.  Ideally we would wish to undertake an analysis that took the full EPG into account: we would need to compare how likely is the observed EPG if (i) s is the source of the crime stain, and (ii) i is the source of the crime stain.  Thus, the LR would be much smaller if an observed signal just failed to meet the established criteria than if no signal were observed at all.  However, the details of such an analysis are beyond the scope of the book, and indeed I don’t think that anyone has yet satisfactorily implemented this approach.

 

 

2.  Glossary of acronyms

 

 

BN                   Bayesian Network (page 129)

CODIS            Combined DNA Index System (page 4)

DNA                DeoxyriboNucleic Acid

EPG                ElectroPheroGram (page 44)

FBI                  Federal Bureau of Investigation (of the US)

HWE               Hardy-Weinberg Equilibrium (page 69)

ibd                   identical by descent (page 91)

LCN                Low Copy Number (page 50)

LD                   Linkage Disequilibrium (page 75)

LR                   Likelihood Ratio (page 24)

MSE               Mean Square Error (page 62)

mtDNA           mitochondrial DNA (page 50)

NRC                National Research Council (of the US) (page 154)

PCR                Polymerase Chain Reaction (page 44)

PPV                Positive Predictive Value (page 9)

R                     statistical software package, see www.r-project.org

SMM               Stepwise Mutation Model (page 59)

SNP                Single-Nucleotide Polymorphism (page 53)

STR                Short Tandem Repeat (page 3)

UK                   United Kingdom of Great Britain and Northern Ireland

US                   United States of America

 

 

 

3. Notes to self: minor typos

 

p 27.  L-6  delete “,” after “Apparently”

 

p 28.  “essentially just replicates” -> “one is a replicate of the other”

 

p 43.  “In this chapter” starts sentences 2 and 3.

 

p 46.  “partial repeats are often rare”

 

p 50.  ((LCN)

 

p 61. “for example” twice under frequency dependent

 

p 83.  reference to R & V 95 is duplicated

 

p 85.  define C=s=D notation

 

p 98. footnote: evidence of evidence

 

p 133.  There should be a line: “Solutions start on page 170”.

 

p 136.  Typo: the “?” on line 10 should be a “.”.