[17a] Semiparametric Fitting
Abstract: We describe a new method for fitting distributions to data which only requires knowledge of the parametric form of either the signal or the background but not both. The unknown distribution is fit using a non-parametric kernel density estimator. The method returns parameter estimates as well as limits on those estimates. Simulation studies show that these estimates are unbiased and that the limits on the estimates are correct.
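The idea can be sketched in a few lines (a hypothetical toy example, not the published code; it also simplifies matters by estimating the background KDE from a separate pure-background sample): the signal component is parametric, the background density comes from a kernel density estimator, and the signal fraction is fit by maximum likelihood.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
# Toy data: 200 background events (exponential) plus 50 signal events (Gaussian)
bkg = rng.exponential(scale=2.0, size=200)
sig = rng.normal(loc=3.0, scale=0.2, size=50)
data = np.concatenate([bkg, sig])

# Background shape estimated non-parametrically from a pure-background sample
f_bkg = gaussian_kde(rng.exponential(scale=2.0, size=5000))

def negloglike(p):
    # Mixture density: p * parametric signal + (1 - p) * KDE background
    dens = p * norm.pdf(data, loc=3.0, scale=0.2) + (1 - p) * f_bkg(data)
    return -np.sum(np.log(dens))

res = minimize_scalar(negloglike, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(f"estimated signal fraction: {res.x:.3f}")   # true value is 50/250 = 0.2
```

In the paper the unknown component is estimated by the KDE inside the fit itself; the separate background sample here is purely to keep the sketch short.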
Published in Nuclear Instruments and Methods in Physics Research A (2012), Volume 685, pp. 16-21, arXiv:1112.2299
[17b] C++ Code for Semiparametric Fitting
Abstract: C++ code for the semiparametric fitting discussed in [17a].
 "Solution to Banff 2 Challenge Based on Likelihood Ratio Test"
Abstract: We describe our solution to the Banff 2 challenge problems as well as the outcomes.
 A Test for Equality of Distributions in High Dimensions, with A. Lopez
Abstract: We present a method which tests whether or not two datasets (one of which could be Monte Carlo generated) might come from the same distribution. Our method works in arbitrarily high dimensions.
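The paper's own statistic is not reproduced here, but the general shape of such a test can be illustrated with a permutation test built on the energy statistic, which is well defined in any dimension (an illustrative sketch with hypothetical names, not the paper's method):

```python
import numpy as np

def energy_stat(x, y):
    # Energy statistic: 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||,
    # estimated from pairwise Euclidean distances; valid in any dimension
    def mean_dist(a, b):
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).mean()
    return 2 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)

def perm_test(x, y, B=200, seed=1):
    # Permutation p-value: reshuffle group labels B times
    rng = np.random.default_rng(seed)
    obs = energy_stat(x, y)
    pooled = np.vstack([x, y])
    n, count = len(x), 0
    for _ in range(B):
        idx = rng.permutation(len(pooled))
        if energy_stat(pooled[idx[:n]], pooled[idx[n:]]) >= obs:
            count += 1
    return (count + 1) / (B + 1)

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 5))
y = rng.normal(size=(60, 5)) + 0.8   # shifted mean: test should reject
z = rng.normal(size=(60, 5))         # same distribution: test should not
print(perm_test(x, y), perm_test(x, z))
```

Because the null distribution is generated by permutation, the test needs no distributional assumptions, only exchangeability under the null.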
 A Shared Spatial Cache Model for Mobile Environments, with Fernando J. Maymi (West Point Military Academy) and Manuel Rodriguez-Martinez (UPRM), published in the Proceedings of MobiDE'2010, Ninth International ACM Workshop on Data Engineering for Wireless and Mobile Access, June 6th, 2010, Indianapolis, Indiana, USA (in conjunction with SIGMOD/PODS 2010)
Abstract: In many scenarios, particularly in military and emergency response operations, mobile nodes that are in close proximity to each other exhibit a high degree of data affinity. For example, all soldiers in the same region, regardless of their specialty, will want to know all nearby threats, as well as all friendly assets. Since relaying queries to a distant server is costly in terms of bandwidth and battery power, it would be ideal to use local resources that are only a hop away. In this paper we propose a shared spatial cache that can be thought of as residing in a region rather than in any given node. Each node that participates in the cache holds an expendable part of the data, so that the loss of any node or small group of nodes can be tolerated with little or no degradation of service. We describe the analytical models that verify our claims and show the results of extensive simulations that validate our models under simulated but realistic conditions.
 Limits, discovery and cut optimization for a Poisson process with uncertainty in background and signal efficiency: TRolke 2.0, with J. Lundberg, J. Conrad, and A. Lopez, Jul 2009, 18pp., arXiv:0907.3450
Abstract: A C++ class was written for the calculation of frequentist confidence intervals using the profile likelihood method. Seven combinations of Gaussian, Poissonian and Binomial uncertainties are implemented. The package provides routines for the calculation of upper and lower limits, sensitivity and related properties. It also supports hypothesis tests which take uncertainties into account. It can be used in compiled C++ code, in Python or interactively via the ROOT analysis framework.
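The TRolke class itself ships with ROOT; as an illustration of the underlying profile likelihood construction, here is a hedged sketch for the simplest of the models (Poisson signal region with a Poisson sideband; edge cases such as m = 0 are not handled, and this is not the TRolke code):

```python
import numpy as np
from scipy.optimize import minimize_scalar, brentq
from scipy.special import xlogy
from scipy.stats import chi2

def nll(s, n, m, tau):
    # Model: n ~ Pois(s + b) in the signal region, m ~ Pois(tau * b) in a
    # sideband.  The nuisance parameter b is profiled out analytically: the
    # stationarity condition is the quadratic
    #   (1 + tau) b^2 + ((1 + tau) s - n - m) b - m s = 0
    B = (1 + tau) * s - n - m
    b = (-B + np.sqrt(B * B + 4 * (1 + tau) * m * s)) / (2 * (1 + tau))
    return -(xlogy(n, s + b) - (s + b) + xlogy(m, tau * b) - tau * b)

def interval(n, m, tau, cl=0.90):
    # Profile-likelihood interval: all s whose profile log-likelihood lies
    # within 0.5 * chi2_{1}(cl) of the maximum (Wilks approximation)
    fit = minimize_scalar(nll, bounds=(0, n + 10 * np.sqrt(n) + 10),
                          method="bounded", args=(n, m, tau))
    thresh = fit.fun + 0.5 * chi2.ppf(cl, df=1)
    g = lambda s: nll(s, n, m, tau) - thresh
    hi = brentq(g, fit.x, fit.x + 20 * np.sqrt(n + 1) + 20)
    lo = brentq(g, 0.0, fit.x) if g(0.0) > 0 else 0.0
    return lo, hi

# 10 events observed, 5 sideband events, sideband/signal exposure ratio 1
print(interval(10, 5, 1.0))   # lower limit 0: no significant excess at 90% CL
```

The TRolke package implements this construction for seven uncertainty models and with the numerical care the sketch omits.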
 A Test for the Presence of a Signal, with A. Lopez
Abstract: We describe a statistical hypothesis test for the presence of a signal based on the likelihood ratio statistic. We derive the test for several cases of interest and also show that for those cases the test works very well, even far out in the tails of the distribution. We also study extensions of the test to cases where there are multiple channels.
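For the simplest single-channel case the test can be sketched as follows (asymptotic approximation only; the paper's point is precisely to study how well such approximations hold far out in the tails):

```python
import numpy as np
from scipy.stats import chi2
from scipy.special import xlogy

def signal_pvalue(n, b):
    # Likelihood ratio test of H0: s = 0 vs H1: s > 0 for n ~ Pois(s + b)
    # with known background b.  Because s = 0 sits on the boundary of the
    # parameter space, the asymptotic null distribution of the LRT statistic
    # is a 50:50 mixture of 0 and chi-square(1), so the p-value is half the
    # chi-square tail probability.
    s_hat = max(0.0, n - b)
    lrt = 2.0 * (xlogy(n, (s_hat + b) / b) - s_hat)
    return 0.5 * chi2.sf(lrt, df=1) if lrt > 0 else 1.0

print(signal_pvalue(25, 10))   # clear excess over background
print(signal_pvalue(11, 10))   # consistent with background
```

The multi-channel extensions studied in the paper combine the per-channel likelihoods before forming the ratio; that is not shown here.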
 Limits and Confidence Intervals in the Presence of Nuisance Parameters, with A. Lopez and J. Conrad, Nuclear Instruments and Methods in Physics Research A, 551/2-3, 2005, pp. 493-503, physics/0403059
Abstract: We study the frequentist properties of confidence intervals computed by the method known to statisticians as the Profile Likelihood. It is seen that the coverage of these intervals is surprisingly good over a wide range of possible parameter values for important classes of problems, in particular whenever there are additional nuisance parameters with statistical or systematic errors.
The routines to carry out the calculations are available here.
 How to Claim a Discovery, with A. Lopez, Proceedings of PHYSTAT2003: Statistical Problems in Particle Physics, Astrophysics and Cosmology, SLAC, p41-44.
Abstract: We describe a statistical hypothesis test for the presence of a signal. The test allows the researcher to fix the signal location and/or width a priori, or perform a search to find the signal region that maximizes the signal. The background rate and/or distribution can be known or might be estimated from the data. Cuts can be used to bring out the signal.
 Search for Rare and Forbidden 3-body Di-muon Decays of the Charmed Mesons D+ and Ds+, hep-ex/0306049, the FOCUS collaboration.
A high-energy physics paper using the analysis tools developed in the papers listed above.
 Calibration for Simultaneity: (Re) Sampling Methods for Simultaneous Inference with Applications to Function Estimation and Functional Data, with Andreas Buja, in preparation for resubmission to JASA.
Abstract: We describe and illustrate a simple Monte Carlo technique for carrying out simultaneous inference with arbitrarily many statistics. Special cases of the technique have appeared in the literature, but there is widespread unawareness of the simplicity and broad applicability of this solution to simultaneous inference. Simultaneous inference for multiple statistics looks like an ill-posed search problem because it is not clear how to choose among the many possible simultaneous coverage regions. The problem can, however, be simplified by restricting the search to a one-parameter family of nested regions and selecting the region whose estimated coverage probability equals the desired value. Natural one-parameter families are readily available.
The technique applies whenever inference is based on a single distribution. A non-exhaustive list of examples of such distributions is: 1) fixed distributions such as standard normals when diagnosing distributional assumptions, 2) conditional null distributions in exact tests with Neyman structure, in particular permutation tests, 3) bootstrap distributions for bootstrap confidence regions, 4) Bayesian posterior distributions for high-dimensional posterior probability regions, or 5) predictive distributions for multiple prediction intervals.
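A minimal sketch of the calibration idea (illustrative only, with independent standard normal draws standing in for the Monte Carlo sample of statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
B, K = 5000, 20
# Monte Carlo draws of K statistics under the reference distribution
# (independent standard normals here, purely for illustration)
R = rng.normal(size=(B, K))

def simultaneous_coverage(R, alpha):
    # Pointwise (1 - alpha) interval for each statistic, then the fraction
    # of draws that fall inside *all* K intervals at once
    lo = np.quantile(R, alpha / 2, axis=0)
    hi = np.quantile(R, 1 - alpha / 2, axis=0)
    return np.mean(np.all((R >= lo) & (R <= hi), axis=1))

# The one-parameter nested family: vary the pointwise level alpha and pick
# the member whose estimated simultaneous coverage is closest to 95%
alphas = np.linspace(1e-4, 0.05, 200)
cov = np.array([simultaneous_coverage(R, a) for a in alphas])
a_star = alphas[np.argmin(np.abs(cov - 0.95))]
print(a_star)   # near 1 - 0.95**(1/20) for independent coordinates
```

The same search works unchanged when the draws are correlated or come from a bootstrap or posterior sample, which is the point of the paper.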
 An Extension of the Normal Probability Plot, in preparation for submission to TAS. This version of the paper is a few years old; a new and substantially different one will come later this year.
Abstract: This is a follow-up paper to the Sampling/Resampling paper above, discussing the details and performance of one application of the general idea described there: we discuss an extension of the standard normal probability plot that combines the graphical nature of the plot with a formal hypothesis test for normality and thereby helps in assessing the severity of the departure from the normal distribution. We perform a simulation study which shows that the performance of this method is comparable to other tests for normality.
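A sketch of the plot's formal-test component (not the paper's exact construction): simulate standardized normal samples, and calibrate a nested family of order-statistic envelopes for simultaneous coverage, as in the calibration paper above.

```python
import numpy as np

def normal_envelope_test(x, B=2000, level=0.95, seed=0):
    # Simultaneous Monte Carlo envelope for the normal probability plot:
    # simulate B standardized normal samples, sort each, and widen the
    # per-order-statistic bands (one-parameter family indexed by k) until
    # a fraction 'level' of the simulated samples lies entirely inside.
    rng = np.random.default_rng(seed)
    n = len(x)
    S = rng.normal(size=(B, n))
    S = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, ddof=1, keepdims=True)
    S = np.sort(S, axis=1)            # order statistics of each sample
    colsort = np.sort(S, axis=0)      # distribution of each order statistic
    for k in range(int(0.025 * B), -1, -1):   # smaller k = wider band
        lo, hi = colsort[k], colsort[B - 1 - k]
        if np.mean(np.all((S >= lo) & (S <= hi), axis=1)) >= level:
            break
    z = np.sort((x - x.mean()) / x.std(ddof=1))
    return bool(np.all((z >= lo) & (z <= hi)))   # False -> reject normality

rng = np.random.default_rng(1)
print(normal_envelope_test(rng.normal(size=100)))       # normal data: usually True
print(normal_envelope_test(rng.exponential(size=100)))  # skewed data: envelope exceeded
```

Plotting the data against the envelope (lo, hi) gives the graphical version of the same test.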
The corresponding Splus routine for drawing the normal probability plot with the envelope will be available here soon.
 A Glossary of Selected Statistical Terms, with Harrison Prosper and Jim Linneman, Proceedings Of The Conference On: Advanced Statistical Techniques in Particle Physics, Institute for Particle Physics Phenomenology, University of Durham, UK (2002), 314-330
Abstract: This glossary brings together some statistical concepts that physicists may happen upon in the course of their work. The aim is not absolute mathematical precision---few physicists would tolerate such a burden. Instead, (one hopes) there is just enough precision to be clear. We begin with an introduction and a list of notations. We hope this will make the glossary, which is in alphabetical order, somewhat easier to read.
 Bias-Corrected Confidence Intervals for Rare Searches, with A. Lopez, Proceedings Of The Conference On: Advanced Statistical Techniques in Particle Physics, Institute for Particle Physics Phenomenology, University of Durham, UK (2002), 44-48
Abstract: A short version of the corresponding full-length paper.
 Statistical Analysis of the SELEX Double Charm Signals with A. Lopez
Abstract: A discussion of the statistical significance of some discoveries claimed by the SELEX collaboration.
 Correcting the Minimization Bias in Searches for Small Signals, with A. Lopez, Nuclear Instruments and Methods in Physics Research A, vol 503/3, 2003, pp. 617-624, hep-ph/0206139
Abstract: We discuss a method for correcting the bias in the limits for small signals if those limits were found based on cuts that were chosen by minimizing a criterion such as sensitivity. This type of bias is commonly present when a "minimization" and an "evaluation" are done at the same time. We propose to use a variant of the statistical bootstrap to adjust the limits. A Monte Carlo study shows that these new limits have correct coverage.
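The bootstrap idea can be illustrated on a stripped-down analogue (a generic selection-bias toy, not the paper's limit-setting procedure): choosing the cut that minimizes a criterion, and then reporting the minimized value, biases the report; bootstrapping the entire select-the-minimum procedure estimates that bias.

```python
import numpy as np

rng = np.random.default_rng(0)
# K candidate "cuts"; each yields an estimate (here: a sample mean).
# All true means are 0, yet the minimum over cuts is biased downward.
K, n, B = 10, 50, 500
data = rng.normal(size=(K, n))
naive = data.mean(axis=1).min()        # biased low by the minimization

# Bootstrap the whole select-the-minimum procedure
boot_mins = np.empty(B)
for i in range(B):
    idx = rng.integers(0, n, size=(K, n))
    boot_mins[i] = np.take_along_axis(data, idx, axis=1).mean(axis=1).min()
bias = boot_mins.mean() - naive        # bootstrap estimate of the bias
corrected = naive - bias
print(naive, corrected)                # corrected is pulled back toward 0
```

In the paper the quantity being selected and corrected is a confidence limit rather than a mean, and coverage of the adjusted limits is verified by Monte Carlo.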
 Setting Limits for Poisson Rates in the Presence of Noise, Proceedings of SIDIM 2000
Abstract: A short version of the following paper.
 Confidence Intervals and Upper Bounds for Small Signals in the Presence of Background Noise, with A. Lopez, Nuclear Instruments and Methods in Physics Research A, V.458, 2001, 745-758, hep-ph/0005187
Abstract: We discuss a new method for setting limits on small signals in the presence of background noise. The method is based on a combination of a two dimensional confidence region and the large sample approximation to the likelihood ratio test statistic. It automatically quotes upper limits for small signals and two-sided confidence intervals for larger samples. We show that this method gives the correct coverage and also has good power.
Routines for the methods discussed in the papers above. There is only one file, which includes everything. The routines are in Fortran, and the file routines.pdf has all the necessary explanations.
Password protected pages: here