Research Seminar Series on Foundations of Statistics
We discuss a wide range of topics related to the foundations of statistics, such as reasoning and decision making under uncertainty, or theories and applications of imprecise probability.
We meet on Mondays at 18:45 in room 245 (Alte Bibliothek) of the Department of Statistics (Ludwigstraße 33, 80539 München). Anyone interested is welcome to attend. Contact Julia Plaß or Paul Fink for more information.
20 April 2015
Short introductions into research topics
27 April 2015
Aziz Omar (LMU München): Small Area Estimation in Case of Informative Nonresponse
4 May 2015
Continuing: Short introductions into research topics
Program (Wintersemester 2014/15, Mondays at 18:45):
13 October 2014
Informal discussion on literature and scheduling of next meetings
Suggestion: Presentations and Talks of SIPTA School 2014
27 October 2014
3 November 2014
17 November 2014
24 November 2014
1 December 2014
Jour Fixe about ongoing research
8 December 2014
Zahra Amini Farsani (Iran University of Science & Technology, Teheran): Reconstruction of the Univariate & Bivariate Probability Distributions via Maximum Entropy Method
15 December 2014
Jour Fixe about ongoing research
12 January 2015
Discussion on (work in progress) ISIPTA papers
2 February 2015
Program (Sommersemester 2014, Mondays at 18:30):
7 April 2014
Barbara Felderer (IAB Nürnberg): Income Measurement in Surveys: The Mechanisms of Nonresponse and Measurement Error
Income reports in surveys consistently show a high number of missing and implausible values. This talk addresses the question of whether the missingness and the precision of the income report are associated with the true income. Only if no such association exists, i.e. if the income variable is "missing at random", can imputation and/or correction procedures be applied.
A telephone survey and an online survey conducted by the IAB serve to answer this question. For part of the respondents, the survey data can be compared with validation data, namely the employers' social security notifications. These validation data are considered highly reliable and are used as a gold standard.
28 April 2014
Christian Rink (DLR Oberpfaffenhofen): Typical Problems in Mobile Robotics and 3D Modelling
The Institute of Robotics and Mechatronics of the German Aerospace Center (DLR) conducts research and development in all areas relevant to robotics, ranging from hardware development and hardware-oriented programming through control engineering to the development of software and algorithms for complex autonomous systems. The institute's Perception and Cognition department deals with all questions of perception for robotic systems, in particular the evaluation and further processing of data from cameras and distance sensors such as laser scanners.
The talk gives a brief insight into selected problems from mobile robotics and 3D modelling. From mobile robotics, the self-localization of mobile robots in known environments by means of particle filters is presented, and the autonomous exploration and mapping of unknown environments is outlined. From autonomous 3D modelling, the problems of object pose estimation and the planning of next best view positions are addressed, and parallels to the problems in mobile robotics are pointed out. Finally, starting points for the use of interval probabilities and decision theory are discussed and an outlook on future work is given.
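As a rough illustration of the self-localization idea mentioned above, here is a minimal one-dimensional particle filter sketch; all numbers, noise levels and the range-sensor model are made up for illustration and are not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimal 1-D particle filter for self-localization: a robot moves
# along a line with known motion commands and measures its position
# with Gaussian noise. Purely illustrative.
n_particles = 1000
true_pos = 2.0
particles = rng.uniform(0.0, 10.0, n_particles)  # initial belief

for step in range(20):
    true_pos += 0.5                                       # motion command
    particles += 0.5 + rng.normal(0, 0.05, n_particles)   # motion + noise
    z = true_pos + rng.normal(0, 0.2)                     # noisy reading
    w = np.exp(-0.5 * ((z - particles) / 0.2) ** 2)       # measurement model
    w /= w.sum()
    idx = rng.choice(n_particles, n_particles, p=w)       # resampling
    particles = particles[idx]

print(particles.mean(), true_pos)  # the estimate tracks the true position
```

After a few update cycles the particle cloud collapses around the true position; in the robotics setting the same predict-weight-resample loop runs over laser scans against a known map instead of a single range value.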
5 May 2014
Georg Schollmeyer (LMU München): Utilizing Support Functions and Monotone Location Estimators for the Estimation of Partially Identified Regression Models
The computation of the Minkowski mean of a collection of compact and convex sets in a linear space is easily representable via the (pointwise) mean of the corresponding support functions of these sets. Replacing the mean by other location estimators leads to generalizations of the Minkowski mean to e.g. the Minkowski median. The function obtained by applying the median pointwise to the collection of all support functions is then generally no longer a support function of some set. To overcome this issue one can simply project the obtained function onto the space of support functions to get a support function that could serve as a set-valued generalization of the Minkowski mean.
This general idea can be applied to any kind of location estimator, and if the location estimator used is monotone (which is the case, e.g., for the median), the corresponding generalization of the Minkowski mean can be used to construct reasonable regression estimators, e.g. for linear regression with interval-valued dependent and independent variables:
In a first step, one divides the data into disjoint minimal subsets that would point-identify the statistical model if the data were precise. For these minimal subsets it is then often easy to calculate the range of associated parameters (e.g. the intercept and the slope) as the hidden precise data vary within the interval-valued observations, yielding a set of possible parameters for each minimal subset.
Aggregating the collection of all these parameter sets via the abovementioned generalizations of the Minkowski mean then leads to a set-valued estimator of the regression model. Depending on the location estimator used, different properties, such as some kind of robustness of the estimator, can be attained without losing too much efficiency, at least in some situations.
In this talk I would like to explicate the aforementioned constructions in more detail and illustrate their applicability along the lines of some examples.
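For intervals on the real line the construction becomes very concrete, since the support function of a compact interval is determined by the two directions +1 and -1. The following sketch (a hypothetical helper, one-dimensional case only) contrasts the Minkowski mean with the more robust Minkowski median:

```python
import numpy as np

def minkowski_location(intervals, loc=np.mean):
    """Apply a location estimator pointwise to the support functions
    of a collection of real intervals [a_i, b_i].

    In one dimension the support function of [a, b] is h(1) = b and
    h(-1) = -a, so the pointwise estimate is determined by the two
    directions +1 and -1 alone."""
    a = np.array([iv[0] for iv in intervals])
    b = np.array([iv[1] for iv in intervals])
    lower, upper = loc(a), loc(b)
    # For a monotone estimator (mean, median) lower <= upper holds
    # automatically here; in higher dimensions the pointwise result
    # need not be a support function, and the projection step from
    # the abstract is required.
    return (lower, upper)

ivs = [(0.0, 1.0), (0.2, 0.8), (10.0, 12.0)]  # one gross outlier
print(minkowski_location(ivs, np.mean))       # Minkowski mean
print(minkowski_location(ivs, np.median))     # Minkowski median: robust
```

The outlying interval drags the Minkowski mean far to the right, while the median-based aggregate stays near the bulk of the data, illustrating the robustness trade-off mentioned above.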
19 May 2014
Informal presentation and discussion of ongoing research work
26 May 2014
Almond Stöcker (LMU München): Poem Interpretation and Model Building: Gadamer's Philosophical Hermeneutics Applied to Statistics
In his major work Truth and Method, Hans-Georg Gadamer develops a comprehensive philosophical hermeneutics, that is, a universal theory of understanding and interpretation, independent of whether it is applied in, say, the historical, philological or juridical domain. The presentation attempts to apply this philosophical hermeneutics to statistics as well, and thus to let text-based and data-based science appear in the same light for once.
In this spirit, the hermeneutic circle and the "fundamental hermeneutic problem" are discussed, and, by way of illustration, Es ist alles eitel by Andreas Gryphius and the Munich rent index data set of 2003 are interpreted in parallel.
The focus is thus on the question of how far philosophical hermeneutics can actually describe the goals and procedures of statistical analysis.
2 June 2014
Gero Walter (Durham University): Density ratio class models and imprecision
In generalized Bayesian inference, a set of priors (instead of a single prior as in usual Bayesian inference) is taken to model partial and vague prior information. For growing sample sizes, we would expect that imprecision in posterior inferences generally decreases, mirroring the accumulation of information. Another useful modeling objective is prior-data conflict sensitivity: posterior inferences should reflect serious conflicts between prior assumptions and observed data.
The question is thus how models based on sets of priors can be defined such that (a) posterior inferences based on them are tractable and (b) the above modeling objectives are met. We study how three models of the "density ratio class" form (also called "interval of measures") score on the trade-off between tractability and modeling objectives. We see that density ratio class models are generally easily tractable, but fall seriously short with respect to imprecision decrease or prior-data conflict sensitivity. We thus try to develop some intuitions about model behaviour and suggest a new model that combines ideas from all three previous models.
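To get a feel for such sets of priors, the following sketch computes lower and upper posterior probabilities of an event for an interval-of-measures class of priors on a grid. The bound functions, the Bernoulli example and the grid resolution are illustrative choices, not the models from the talk:

```python
import numpy as np

def posterior_prob_bounds(l, u, lik, A_mask):
    """Lower/upper posterior probability of an event A for the set of
    priors pi whose density lies (up to normalization) between the
    bound functions l and u (an interval of measures).

    The extreme posteriors put the upper bound on one side of the
    event and the lower bound on the other; everything is evaluated
    on a grid, so integrals become sums."""
    upper = np.sum(u * lik * A_mask) / (
        np.sum(u * lik * A_mask) + np.sum(l * lik * ~A_mask))
    lower = np.sum(l * lik * A_mask) / (
        np.sum(l * lik * A_mask) + np.sum(u * lik * ~A_mask))
    return lower, upper

# Bernoulli example: 7 successes in 10 trials, event A = {theta > 0.5}
theta = np.linspace(0.001, 0.999, 2000)
lik = theta**7 * (1 - theta)**3
l = np.ones_like(theta)        # lower bound: uniform shape
u = 2.0 * np.ones_like(theta)  # upper bound: twice the lower one
lo, up = posterior_prob_bounds(l, u, lik, theta > 0.5)
print(lo, up)  # the precise Beta(8, 4) answer lies between the bounds
```

The gap between `lo` and `up` is the posterior imprecision; one of the points of the talk is that for density ratio class models this gap does not shrink as favourably with growing samples as one would hope.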
Andrea Wiencierz (LMU München): Nonparametric regression and imprecise data
We will consider nonparametric regression methods and the Likelihood-based Imprecise Regression (LIR) approach to regression with imprecise data. Possibilities for the generalization of smoothing methods within the LIR framework will be discussed.
18 June 2014
Julia Plaß (LMU München): Likelihood based analyses concerning coarsened categorical data
Although coarse data can arise at several points within a questionnaire, it is still unclear how to deal with data of this kind.
Focusing on a coarse categorical response variable and constant coarsening, two likelihoods will be proposed: one imposing the assumption of iid variables and one incorporating covariates. While investigations based on the first likelihood will show that one generally obtains a set of estimators characterized by a special condition, the second likelihood seems to lead to nearly unbiased and identified estimators. Apart from analyses of the impact of correctly or wrongly invoking the assumption of CAR, the transition from non-identifiability in the case without covariates to identifiability in the case with covariates will be discussed.
23 June 2014
Andrea Wiencierz (LMU München): Working hours mismatch and well-being: comparative evidence from Australian and German panel data
This talk is about a study where we use subjective measures of well-being to analyze how workers perceive working hours mismatch. Our particular interest is in the question of whether workers perceive hours of underemployment differently from hours of overemployment. Previous evidence on this issue is ambiguous. Using data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey and the German Socio-Economic Panel (SOEP), this study estimates the relationship between working hours mismatch and well-being as bivariate smooth functions of desired hours and mismatch hours by tensor product p-splines. The results indicate that well-being is generally highest in the absence of hours mismatch and that underemployment may be more detrimental for well-being than overemployment.
26 June 2014 (Thursday) at 10:15
Validation is a crucial step in the evaluation of new prediction rules. Within the framework of high-dimensional molecular (omics) data, the assessment of the added predictive value is of particular importance. To verify whether the performance of a prediction rule can be improved by including molecular data in addition to the standard clinical predictors, several approaches are already available. However, a special challenge arises if there is no independent validation data set at hand on which the added predictive value of the omics score can be measured.
Comparing the omics score to the standard clinical predictors on the same data set that was used to generate the score might lead to strongly biased results. Overfitting mechanisms might make the score seem more important than it actually is. To elude this problem, Tibshirani and Efron (2002) suggest using their pre-validation approach. It mimics the situation in which both training and validation data are at hand by embedding the score construction into a kind of cross-validation loop.
In this presentation, based on my master thesis, the pre-validation approach is extended to the use of the least absolute shrinkage and selection operator and a supervised principal component analysis for score generation. The investigation of the added predictive value on the basis of simulation studies and a real breast cancer data set allows a comparison of molecular scores obtained with or without pre-validation. The main goal is to determine whether the pre-validated omics score can overcome overfitting issues compared to its non-pre-validated counterpart.
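The pre-validation loop can be sketched as follows; the synthetic data, the lasso penalty and the fold count are arbitrary illustrative choices, not those of the thesis:

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 200, 500                       # many omics features, few samples
X_omics = rng.normal(size=(n, p))
X_clin = rng.normal(size=(n, 2))      # standard clinical predictors
y = (X_clin[:, 0] + X_omics[:, 0] + rng.normal(size=n) > 0).astype(int)

# Pre-validation: the omics score for each fold is built on the
# remaining folds only, mimicking an independent validation set.
score = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True,
                         random_state=0).split(X_omics):
    lasso = Lasso(alpha=0.05).fit(X_omics[train], y[train])
    score[test] = lasso.predict(X_omics[test])

# Added predictive value: fit clinical predictors plus the
# pre-validated score and inspect the score's coefficient.
Z = np.column_stack([X_clin, score])
model = LogisticRegression().fit(Z, y)
print(model.coef_)
```

Because every score entry is predicted by a model that never saw that observation, the score competes with the clinical predictors on fair terms, which is exactly the bias the naive in-sample comparison lacks.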
30 June 2014
Frank Coolen (Durham University): Semi-parametric predictive inference for bivariate data using copulas
Many real-world problems of statistical inference involve bivariate data. In this talk, a new semi-parametric method is presented for prediction of a future bivariate observation, by combining nonparametric predictive inference for the marginal with a parametric copula to model and estimate the dependence structure between two random quantities. The performance of the method is investigated via simulations, with particular attention to robustness with regard to the assumed copula in case of small data sets.
7 July 2014
Marco Cattaneo (LMU München): Maxitive Integral of Real-Valued Functions
An integral with respect to nonadditive measures is said to be maxitive if the integral of the (pointwise) maximum of two functions is the maximum of their integrals. Maxitive integrals find application in particular in decision theory and related fields. However, the definition of a maxitive integral for all real-valued functions is problematic. This definition is not determined by maxitivity alone: additional requirements on the integral are necessary. In particular, the consequences of additional requirements of convexity and invariance with respect to affine transformations of the real line are studied in more detail.
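As a concrete reference point (not necessarily the definitions used in the talk), maxitivity and the classical Shilkret integral of a nonnegative function can be written as:

```latex
% Maxitivity of an integral: for all functions f, g,
\int (f \vee g)\, d\mu
  \;=\; \Big(\int f\, d\mu\Big) \vee \Big(\int g\, d\mu\Big).
% The Shilkret integral of a nonnegative function f,
\int f\, d\mu \;=\; \sup_{t > 0}\; t \,\mu(\{f \ge t\}),
% is maxitive whenever \mu itself is maxitive, since
% \{f \vee g \ge t\} = \{f \ge t\} \cup \{g \ge t\}.
% The difficulty addressed in the talk is that such definitions do
% not extend uniquely to functions taking negative values.
```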
11 July 2014 (Friday) at 18:00
Matthias Speidel (IAB Nürnberg): Discussing Effects of Different MAR-Settings
Data users and providers who decide to impute missing data should be aware of the effects their imputation model can have on the post-imputation analysis. Imputing hierarchical data (e.g. students in schools) requires specific models. Our aim is to illustrate (by simulation) and explain (by formula) biases in the analysis if the hierarchical imputation model implemented in SAS, SPSS and Stata is adopted. The explanation of the bias depends on an as yet unexplained side result: within a Missing At Random setting for the missingness-generating function, the shape of the function strongly affects the biases. In this talk I will elucidate and discuss the occurrence of this finding.
Program (Wintersemester 2013/14, Mondays at 18:30):
24 October 2013 (Thursday) at 14:00
Frank Coolen (Durham University): Nonparametric predictive inference for reproducibility of basic nonparametric tests
Reproducibility of tests is an important characteristic of the practical relevance of test outcomes. Recently, there has been substantial interest in the reproducibility probability (RP), where not only its estimation but also its actual definition and interpretation are not uniquely determined in the classical frequentist statistics framework. Nonparametric predictive inference (NPI) is a frequentist statistics approach that makes few assumptions, enabled by the use of lower and upper probabilities to quantify uncertainty, and which explicitly focuses on future observations. The explicitly predictive nature of NPI provides a natural formulation for inferences on RP. We introduce the NPI approach to RP for some basic nonparametric tests, where exact results are achievable for relatively small sample sizes. We also discuss implementation of the approach in case of larger data sets and other tests, which is possible via an NPI-based bootstrap approach.
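The predictive character of NPI can be illustrated by its lower and upper probabilities for the next observation, based on Hill's assumption A(n); the threshold event below is an illustrative example, not a reproducibility calculation from the talk:

```python
import numpy as np

def npi_event_probs(data, t):
    """NPI lower and upper probability that the next observation
    exceeds a threshold t, based on Hill's assumption A(n): the next
    observation falls into each of the n + 1 intervals created by the
    ordered data with probability 1 / (n + 1).

    Assumes t does not coincide with a data point."""
    n = len(data)
    k = np.sum(np.asarray(data) > t)   # intervals fully above t
    # One further interval straddles t, hence the +1 in the upper bound.
    return k / (n + 1), (k + 1) / (n + 1)

data = [2.1, 3.4, 5.0, 7.2, 8.8]
print(npi_event_probs(data, 6.0))  # (2/6, 3/6)
```

The gap of 1/(n+1) between the bounds reflects the interval containing the threshold, whose probability mass cannot be assigned to either side; it shrinks as more observations accrue, mirroring the reduction of imprecision with sample size.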
11 November 2013
Duygu İçen (Hacettepe Üniversitesi Ankara): On using different error measures for fuzzy linear regression analysis
Fuzzy regression analysis is a tool to determine the mathematical relation between two or more quantitative variables when there is vagueness in the relationship between the dependent and independent variables. Many different regression models are currently being worked on. This study covers the use of different error measures for estimating fuzzy linear regression models by the Monte Carlo method. Fuzzy linear regression models are presented for two cases: in the first, the input data are crisp and the output data are fuzzy; in the second, input and output data are both fuzzy. We utilize a general definition of the absolute value of a triangular fuzzy number in our Monte Carlo method. In addition, we find that the two error measures used in former studies alone are not enough for estimating the parameters of fuzzy linear regression models.
18 November 2013
Georg Schollmeyer (LMU München): Quantiles for complete lattices
The concept of quantiles is commonly restricted to the simple setting of totally ordered sets (e.g. the reals). In this talk I want to analyze to what extent the concept of quantiles can be generalized to arbitrary posets in a satisfying way.
The motivation for this generalization stems from the analysis of set-valued estimators for partially identified parameters in a linear space and a certain attempt to establish something like confidence regions for such estimators. This attempt makes use of the additive structure of the linear space and the lattice structure of the power set. The concepts used concerning the power set structure are very general, and it turns out that essentially the structure of a complete lattice combined with a certain order-Lindelöf property seems to be enough to obtain an acceptable generalization of the quantile concept.
Furthermore, the achieved generalized notion of quantiles is reasonably extendable from classical probability measures and their associated belief functions to supermodular functions with a Möbius inverse that is continuous from above.
After introducing the basics of posets, lattices and the needed Lindelöf property, I will derive some (partly already known) results that make the simple idea of generalized quantiles well defined in a broad class of settings.
25 November 2013
Jennifer Sinibaldi (IAB Nürnberg): Improving Response Propensity models with Interviewer Ratings of Response Likelihood
Discrete time hazard models predicting cooperation at the next call/contact/day are increasingly being used during survey data collection to inform fieldwork decisions. This analysis investigates the usefulness of a new interviewer observation to improve the predictive power of these propensity models. The interviewer observation is the interviewer's assessment of the likelihood that a given respondent will participate in the survey, collected at each contact of a telephone study. I evaluate whether these ratings significantly improve the fit and discrimination of the "classic" response propensity models. I also mimic the process of daily modeling, similar to what would be done during a live data collection, to determine if the ratings significantly improve the accuracy of the prediction compared to the "classic" propensity models. If the interviewer ratings contribute significantly to the predictive power of the models, collection of such observations could be routinely implemented to achieve better estimates of response propensity.
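A minimal sketch of such a discrete-time hazard model, with a synthetic interviewer rating as the extra covariate (all data, variable names and effect sizes are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 2000                                  # call attempts (person-periods)
call_no = rng.integers(1, 8, size=n)      # number of the call attempt
rating = rng.normal(size=n)               # interviewer's likelihood rating
logit = -1.5 - 0.1 * call_no + 1.0 * rating
coop = rng.random(n) < 1 / (1 + np.exp(-logit))  # cooperation at the call

# "Classic" discrete-time hazard model vs. the model with the rating:
# cooperation at each call is modelled by logistic regression on the
# person-period records.
X_base = call_no.reshape(-1, 1)
X_full = np.column_stack([call_no, rating])
auc_base = roc_auc_score(coop, LogisticRegression().fit(X_base, coop)
                         .predict_proba(X_base)[:, 1])
auc_full = roc_auc_score(coop, LogisticRegression().fit(X_full, coop)
                         .predict_proba(X_full)[:, 1])
print(auc_base, auc_full)  # an informative rating improves discrimination
```

Comparing the two AUC values mirrors the question of the talk: whether the interviewer observation adds discrimination beyond the covariates already in the classic propensity model.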
9 December 2013
Bernhard Haller (TU München): Estimating cause-specific and subdistribution hazards from a mixture model in a competing risks setting
In medical research the time to an event of interest may be obscured by another, so-called "competing", event. Analysis of event time data in the presence of competing risks is often performed focusing on either the cause-specific or the subdistribution hazards, mostly considering proportional hazards models. An alternative approach for the analysis of competing risks data is fitting a mixture model factorizing the joint distribution of event type and event time into the product of the marginal event type distribution and the conditional event time distribution given the type of event.
Lau et al. (2011) presented how cause-specific and subdistribution hazards can be derived from a mixture model. They proposed assuming that the conditional hazard rates in the mixture model follow a flexible parametric survival distribution such as the generalized gamma distribution, in order to detect time-dependence of hazard rates and ratios. They applied the approach to a real data example, but the properties of the approach were not investigated.
As we detected numerical problems in parameter estimation for the generalized gamma mixture model, an alternative approach using penalized B-splines to model the conditional hazard rates is proposed. The spline approach is compared to the parametric mixture model in different scenarios with predefined cause-specific or subdistribution hazards, respectively. The new approach was also applied to data from a cohort study investigating stratification for risk of cardiac death, with death from a non-cardiac cause as competing event.
12 December 2013 (Thursday) at 17:15 in room 144
Lev Utkin (FTU Saint Petersburg): Some models of genomic selection
Several models for solving the well-known problem of genomic selection are proposed for the situation where the number of DNA markers (SNPs) is typically 10-100 times the number of individuals in the sample and the SNPs may be correlated. The models use the Steptoe x Morex barley mapping population as an example. The first group of models exploits the Bahadur representation for computing the probability mass functions of genotypes. The second group uses an obvious modification of the Lasso method. Imprecise statistical models are also considered as a tool for simplifying computationally hard algorithms.
16 December 2013
Paul Fink (LMU München): Classification trees with missing data - An application of Imprecise Probability Methods?
Classification trees have been studied by applying imputation methods for missing data in a first step, so that the complete framework of classification trees for non-missing data becomes available. From a different perspective, the question remains whether one can account for the missing data in a more explicit way, without the need for imputation. A naive approach would be to grow classification trees for every possible global configuration when assigning missing data and to average over the trees. Another way would be to account for the missing data in the local, node-wise models.
An approach for the case where missing data occur in the classification variable is proposed, and ideas on how to deal with missing data in the feature variables are presented.
13 January 2014
Informal presentation and discussion of ongoing research work
27 January 2014
Marco Cattaneo (LMU München): The likelihood approach to statistical decision problems
In both classical and Bayesian approaches, statistical inference is unified and generalized by the corresponding decision theory. This is not the case for the likelihood approach to statistical inference, in spite of the manifest success of the likelihood methods in statistics. The goal of the present work is to fill this gap, by extending the likelihood approach in order to cover decision making as well. The resulting likelihood decision functions generalize the usual likelihood methods (such as ML estimators and LR tests), while maintaining some of their key properties, and thus providing a theoretical foundation for established and new likelihood methods.
31 March 2014
Duygu İçen (Hacettepe Üniversitesi Ankara): On using Different Distance Measures for Fuzzy Numbers in Fuzzy Linear Regression Models
In fuzzy linear regression models (FLRMs) estimated and observed variables can be obtained as fuzzy numbers that have membership functions. The success of the regression model depends on the difference of the membership values between the observed and estimated fuzzy numbers.
In the literature there are many studies that define distance measures for fuzzy numbers differently. Kaufmann and Gupta (1991) considered a distance measure between two fuzzy numbers built from the intervals of the alpha-cuts of the fuzzy numbers. Three different definitions of the distance between two fuzzy numbers were introduced by Heilpern (1997): the first depends on the expected values of the fuzzy numbers, the second combines a Minkowski distance with the h-level closed intervals of the fuzzy numbers, and the last one, called the geometrical distance method, is based on geometrical operations. Chen and Hsieh (1998, 2000) introduced a distance of two generalized fuzzy numbers by using the graded mean integration representation.
The difference of two fuzzy numbers also plays a key role in estimating the parameters of fuzzy linear regression by the Monte Carlo method. Abdalla and Buckley (2007, 2008) used the error measure defined by Kim and Bishu (1998) in the Monte Carlo method for FLRMs. In that error measure, the difference of membership values between two membership functions has to be calculated.
In this study, we apply the different definitions of distance measures introduced by Kaufmann and Gupta (1991), Heilpern (1997) and Chen and Hsieh (1998, 2000) in the error measure considered by Kim and Bishu (1998). We compare these different distance measures for fuzzy numbers in FLRMs.
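As one illustrative example of an alpha-cut-based distance in the spirit of Kaufmann and Gupta (the exact definition in the cited work may differ), for triangular fuzzy numbers:

```python
import numpy as np

def alpha_cut(tri, alpha):
    """Alpha-cut [l, r] of a triangular fuzzy number tri = (a, b, c)."""
    a, b, c = tri
    return a + alpha * (b - a), c - alpha * (c - b)

def fuzzy_distance(t1, t2, m=1001):
    """An alpha-cut-based distance between two triangular fuzzy
    numbers: the average over alpha of the summed endpoint distances
    of the alpha-cut intervals (numerical approximation on a grid)."""
    alphas = np.linspace(0.0, 1.0, m)
    l1, r1 = alpha_cut(t1, alphas)
    l2, r2 = alpha_cut(t2, alphas)
    return np.mean(np.abs(l1 - l2) + np.abs(r1 - r2))

A = (1.0, 2.0, 3.0)
B = (2.0, 3.0, 4.0)
print(fuzzy_distance(A, B))  # both endpoints shifted by 1 -> distance 2
```

Each of the cited definitions replaces the endpoint term above by a different functional of the cuts (expected values, Minkowski distances, graded mean integration), which is precisely the choice the study compares.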
Program (Sommersemester 2013, Mondays at 18:30):
22 April 2013
Barbara Felderer (IAB Nürnberg): Can we buy good answers: The influence of respondent incentives on item nonresponse and measurement error in a web survey
Even though a sampled person may agree to participate in a survey, she may not provide answers to all of the questions asked or might not answer questions correctly. This may lead to seriously biased estimates.
It is well known that incentives can effectively be used to decrease unit nonresponse. The question we are analyzing here is whether incentives are able to decrease item nonresponse and measurement error as well.
To study the effect of incentives on item nonresponse and measurement error, an experiment was conducted with participants of a web survey. In addition to an incentive for participation, an extra prepaid incentive ranging from 0.50 Euro to 4.50 Euro was given to some respondents towards the end of the questionnaire in the form of an Amazon-voucher. At the same time, respondents were requested to think hard about the answers to the next questions and be as precise as possible. In this experiment there are two reference groups: one group received the request but no incentive and the other did not receive any request or incentive.
The questions within the incentive experiment contain knowledge questions, recall questions referring to different time periods, and questions about subjective expectations.
Comparisons across the different incentive groups will allow for an assessment of the effectiveness of incentives on item nonresponse and measurement error.
29 April 2013
Fabian Spanhel (LMU München): Model-free regression
The ultimate objective of regression analysis is the estimation of the conditional distribution of a variable given a number of explanatory variables. To overcome the curse of dimensionality and allow for mathematical tractability, model-based regression approaches are widely used. These model-based approaches are inherently related to the assumption of an additive predictor, i.e., the total effect of the explanatory variables on the conditional distribution is obtained by summing up the individual effects.
While these model-based regression approaches may be adequate in several situations, they are rarely guided by reference to the actual data at hand. Rather, the assumptions of the models are checked by employing specification tests after the model has been fitted to the data. Moreover, none of these methods actually models the conditional distribution, but only some of its features.
In this talk, we introduce a quite different approach to regression analysis which is more data-driven and — if the joint distribution satisfies some dependence relations — model-free. Our approach is based on the generalization of partial correlation to partial dependence and extensions of the Rosenblatt transform. As a result, we obtain flexible regression functions which represent the complete conditional distribution and are no longer restricted to additive predictors.
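The Rosenblatt transform that the approach builds on can be sketched for a bivariate normal, where the conditional distribution is known in closed form (the example and parameter values are illustrative):

```python
import numpy as np
from scipy.stats import norm

def rosenblatt_bivariate_normal(x1, x2, rho):
    """Rosenblatt transform for a standard bivariate normal with
    correlation rho: maps (X1, X2) to (U1, U2) = (F(x1), F(x2 | x1)).
    Under the model, U1 and U2 are independent Uniform(0, 1)."""
    u1 = norm.cdf(x1)
    # X2 | X1 = x1 is N(rho * x1, 1 - rho^2) for the standard
    # bivariate normal, so the conditional cdf is available exactly.
    u2 = norm.cdf((x2 - rho * x1) / np.sqrt(1 - rho**2))
    return u1, u2

rng = np.random.default_rng(2)
rho = 0.7
z = rng.normal(size=(10000, 2))
x1 = z[:, 0]
x2 = rho * x1 + np.sqrt(1 - rho**2) * z[:, 1]
u1, u2 = rosenblatt_bivariate_normal(x1, x2, rho)
print(np.corrcoef(u1, u2)[0, 1])  # approximately 0: dependence removed
```

Inverting the second coordinate for fixed `u1` recovers the conditional distribution of X2 given X1, which is how such transforms let a regression represent the complete conditional distribution rather than a single feature of it.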
6 May 2013
Andrea Wiencierz (LMU München): Support Vector Regression with interval data
This contribution to the Research Seminar will be a discussion of ongoing research work rather than a presentation of ready-to-publish results. I would like to discuss an approach to regression with interval data which was first suggested in Utkin and Coolen (2011) and further generalized during my research stay with Lev Utkin at the beginning of this year. This approach uses the framework of Support Vector Regression, where the regression problem is formulated as a convex optimization problem. Based on this formulation, Utkin and Coolen (2011) suggest accounting for the imprecision of the response variable by introducing additional constraints to the optimization problem, leading to a precise regression estimate. This idea can be generalized to obtain a set of regression functions as an imprecise result of the regression with an interval-valued response. However, the statistical meaning of both the precise regression estimate and the set of functions is not obvious and shall be investigated.
27 May 2013
Bernhard Haller (TU München): Analysis of competing risks data and simulation of data following predefined subdistribution hazards
In many clinical studies the time to one certain type of event, e.g. the time from treatment initiation to cancer-specific or cardiac death, is of interest. These events can be obscured by the occurrence of other, so-called competing, events (e.g. death from other causes). The most popular methods for the analysis of competing risks data will be presented: methods based on the cause-specific hazards, methods based on the subdistribution hazards, and the estimation of a mixture model.
Models using the subdistribution hazard became popular in medical research over the last couple of years leading to a variety of newly developed methods. In order to evaluate these methods in simulation studies, competing risks data following prespecified subdistribution hazards or hazard ratios, respectively, have to be generated. Problems and pitfalls are discussed and an algorithm for simulation of competing risks data following prespecified subdistribution hazards based on work by Beyersmann et al. (2009) and Sylvestre & Abrahamowicz (2008) will be presented.
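For the simpler cause-specific route (the subdistribution algorithm discussed in the talk is more involved), simulation in the sense of Beyersmann et al. (2009) can be sketched with constant hazards:

```python
import numpy as np

def simulate_competing_risks(n, h1, h2, rng):
    """Simulate competing risks data with constant cause-specific
    hazards h1 and h2: draw the event time from the all-cause hazard
    h1 + h2, then assign the cause with probability proportional to
    its hazard (the cause-specific simulation scheme of Beyersmann
    et al., 2009, specialised to constant hazards)."""
    t = rng.exponential(1.0 / (h1 + h2), size=n)
    cause = np.where(rng.random(n) < h1 / (h1 + h2), 1, 2)
    return t, cause

rng = np.random.default_rng(3)
t, cause = simulate_competing_risks(100000, h1=0.2, h2=0.1, rng=rng)
print(np.mean(cause == 1))  # approx 2/3, the share of cause-1 events
```

Generating data that follow *prespecified subdistribution* hazards, as the talk requires for evaluating Fine-Gray-type methods, needs an extra inversion step on the subdistribution function, which is exactly where the pitfalls discussed above arise.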
Finally, ongoing work on the estimation of cause-specific and subdistribution hazards and hazard ratios from a flexible mixture model, which was originally introduced by Lau et al. (2011), will be presented.
3 June 2013
Jennifer Sinibaldi (IAB Nürnberg): Which is the better investment for nonresponse adjustment: Purchasing commercial auxiliary data or collecting interviewer observations?
Survey methodologists are searching for better covariates to use in nonresponse adjustment models, ultimately hoping to find variables that are highly correlated with both the outcome of interest and the propensity to respond. These covariates can come from auxiliary data which will provide information on both respondents and nonrespondents. Two such types of auxiliary data are interviewer observations (a form of paradata) and commercially available data on small areas or households. While interviewer observations can be designed to match the outcome variables of interest, they are limited to those outcomes. Commercial data, on the other hand, provide a broad set of small area, economic descriptors which may be correlated with multiple outcomes. This analysis examines these two data sources to determine which is more predictive of the outcomes of interest for a particular survey, thereby fulfilling one of the criteria for a good adjustment variable. The survey outcomes are self-reports of household income and receipt of unemployment benefits from a survey of labor market participation.
The findings suggest that interviewer observations are better at predicting the outcomes of interest, compared to commercial data, particularly in the subpopulation that the survey targets. Therefore, the observations share more (accurate) information with the true value, making them better for adjustment, on this dimension. The results will assist researchers wishing to improve their nonresponse adjustments, as well as survey managers by providing guidance as to which type of data, interviewer observations or commercial data, may be a better use of the survey budget.
|10 June 2013
Informal presentation and discussion on the history of the Department of Statistics
|17 June 2013||
Julia Plaß (LMU München): Coarse categorical data under epistemic and ontologic uncertainty
Coarse data arise for two different reasons: epistemic and ontologic uncertainty. In the former case, precise data are coarsened, for instance to preserve the respondents' anonymity; in the latter case, the data are coarse by nature, e.g. for reasons of indecision. Although coarse data are thus widely present, how to deal with this kind of data is still an open question.
Therefore, in this presentation, which is based on my master's thesis, some methods that until now have partly been used exclusively in other areas are investigated and applied in this context, restricted to categorical coarse data. The concepts of coarsening at random, partial identification, and sensitivity analysis can be helpful in the analysis of epistemic uncertainty, while the theory of random sets and Dempster-Shafer theory serve as the basis for dealing with ontologic uncertainty. Finally, it is considered how a categorical dependent variable that is coarse because of either epistemic or ontologic uncertainty can be incorporated into a multinomial logit model.
|24 June 2013||
Paul Fink (LMU München): Entropy Based Decision Trees
One method for building classification trees is to choose split variables by maximising expected entropy. This can be extended via imprecise probability by replacing the expected entropy with the maximum possible expected entropy over a credal set of probability distributions. Such methods may not take full advantage of the opportunities offered by imprecise probability theory. In this talk, the focus is shifted from the maximum possible expected entropy to the full range of expected entropy. An entropy minimisation algorithm using the nonparametric predictive inference approach to multinomial data is presented, together with an interval comparison method based on two user-chosen parameters which includes previously presented splitting criteria (maximum entropy and entropy interval dominance) as special cases. The method is applied to 13 datasets, and the possible values of the two user-chosen parameters are compared with each other and with the entropy maximisation criterion which the approach generalises.
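For the imprecise Dirichlet model mentioned in related work, the entropy-maximising element of the credal set has a simple "water-filling" form: the hyperparameter mass s is spread over the smallest counts until the distribution is as uniform as possible. A minimal sketch, assuming this water-filling characterisation (the function name and bisection approach are my own choices, not the talk's implementation):

```python
from math import log

def max_entropy_idm(counts, s=1.0):
    """Water-filling sketch: spread the extra mass s over the smallest
    counts so the resulting distribution is as uniform as possible,
    which yields the entropy-maximising element of the IDM credal set
    for multinomial counts."""
    total = sum(counts) + s
    lo, hi = min(counts), max(counts) + s
    # bisect for the water level L with sum(max(0, L - c)) == s
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(max(0.0, mid - c) for c in counts) < s:
            lo = mid
        else:
            hi = mid
    p = [max(c, lo) / total for c in counts]
    return p, -sum(q * log(q) for q in p if q > 0)
```

For counts [4, 1, 1] with s = 1, the two small counts are raised to a common level of 1.5 before normalising, leaving the large count untouched.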
Georg Schollmeyer (LMU München): On Sharp Identification Regions for Regression Under Interval Data
The reliable analysis of interval data (coarsened data) is one of the most promising applications of imprecise probabilities in statistics. If one refrains from making untestable, and often materially unjustified, strong assumptions on the coarsening process, then the empirical distribution of the data is imprecise, and statistical models are, in Manski's terms, partially identified. We first elaborate some subtle differences between two natural ways of handling interval data in the dependent variable of regression models, distinguishing between two different types of identification regions, called Sharp Marrow Region (SMR) and Sharp Collection Region (SCR) here. Focusing on the case of linear regression analysis, we briefly present some fundamental geometrical properties of SMR and SCR, allowing a comparison of the regions and providing some guidelines for their canonical construction. Relying on the algebraic framework of adjunctions of two mappings between partially ordered sets, we then characterize SMR as a right adjoint and as the monotone kernel of a criterion-function-based mapping, while SCR is interpretable as the corresponding monotone hull. Finally, we sketch some ideas on a compromise between SMR and SCR based on a set-domained loss function.
Marco Cattaneo (LMU München): On the Robustness of Imprecise Probability Methods
Imprecise probability methods are often claimed to be robust, or more robust than conventional methods. In particular, the higher robustness of the resulting methods seems to be the principal argument supporting the imprecise probability approach to statistics over the Bayesian one. The goal of the present work is to investigate the robustness of imprecise probability methods, and in particular to clarify the terminology used to describe this fundamental issue of the imprecise probability approach.
|12 August 2013
Informal presentation and discussion on the history of the Department of Statistics
Program (Wintersemester 2012/13, Mondays at 18:30):
|29 October 2012||
Michael Seitz (LMU München): Estimation of partially identified parameters in generalized linear models with interval data
In the context of generalized linear regression, uncertainty in continuous variables can be represented as intervals. Classical estimation of the parameters is then in general no longer possible; without further assumptions, the parameters are only partially identified. Nevertheless, the set of admissible parameters can be described mathematically. Determining the extrema of this set in practice requires solving a non-linear optimization problem. Several approaches to this are presented and examined in simple simulation examples.
|5 November 2012||
Marco Cattaneo (LMU München): Profile likelihood inference
Profile likelihood inference is an elegant, general, and integrated approach to the problems of estimating a quantity and evaluating the uncertainty of the resulting estimate. A new method for calculating the profile likelihood function in some nonparametric estimation problems is derived. The new method is illustrated by applications to the fundamental problem of practical statistics and to the problem of quantifying Bayesian network classifiers.
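The abstract does not spell out the new calculation method, but the basic object, a relative profile likelihood with the nuisance parameter maximised out, can be illustrated for the textbook case of a normal mean with unknown variance (this example and the function name are mine, not the talk's method):

```python
def relative_profile_likelihood(data, mu):
    """Relative profile likelihood R(mu) of a normal mean, profiling
    out the variance: maximising the normal likelihood over sigma^2
    with the mean fixed at mu gives sigma^2(mu) = mean((x - mu)^2),
    and R(mu) = (sigma2_hat / sigma2(mu)) ** (n / 2)."""
    n = len(data)
    xbar = sum(data) / n
    s2_hat = sum((x - xbar) ** 2 for x in data) / n  # unrestricted ML variance
    s2_mu = sum((x - mu) ** 2 for x in data) / n     # variance with mean fixed at mu
    return (s2_hat / s2_mu) ** (n / 2)
```

R equals 1 at the sample mean and decreases monotonically as mu moves away from it; a likelihood region is then obtained as the set of mu with R above a chosen threshold.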
|26 November 2012||
Andrea Wiencierz (LMU München): Imprecise regression: A comparative study
Imprecise regression methods are regression methods that do not aim at a single estimated regression function but allow for a set-valued result. Sets of regression functions can be obtained from regression methods based on imprecise probability models, like the recently introduced Likelihood-based Imprecise Regression, but they can also be derived from classical regression methods, e.g. in the form of confidence bands for least squares regression. Hence, the question arises which approach leads to better results.
To investigate this question, we will compare a selection of imprecise methods in the setting of simple linear regression. On the one hand the comparison will be based on general properties of the regression methods, like the coverage probability of the result or the robustness of the method. On the other hand, we will compare the performance in a practical setting, where we will consider the case of precise data as well as the case where the variables are only imprecisely observed.
Marco Cattaneo (LMU München): Imprecise probability for statistical problems: is it worth the candle?
In recent years, theories of imprecise probability have been suggested as alternative approaches to statistical problems. These approaches will be compared with the conventional ones (based on precise probability theory), both from a theoretical perspective, and from the pragmatical perspective of their application to the so-called "fundamental problem of practical statistics".
|17 December 2012||
Paul Fink (LMU München): Minimum Entropy Algorithm for NPI-generated probability intervals
As preliminary work towards an entropy-range-based splitting criterion for imprecise classification trees, those entropy ranges need to be calculated. For an underlying imprecise Dirichlet model or an ordinal Nonparametric Predictive Inference (O-NPI) model, algorithms for both the maximal and the minimal achievable entropy have already been developed. For the multinomial NPI (M-NPI) there are algorithms to calculate the maximum entropy, yet none for the minimum entropy. In this short talk the minimum entropy algorithm of the O-NPI is ported to the M-NPI case and some of its properties are discussed.
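As a rough illustration of entropy minimisation over a set of probability intervals, the usual greedy idea is to start from the lower bounds and pour the remaining mass into the currently largest category. This is only a sketch of that strategy under an interval representation, not the O-NPI/M-NPI algorithm of the talk:

```python
from math import log

def min_entropy_greedy(lower, upper, eps=1e-12):
    """Greedy sketch: start from the lower bounds and repeatedly give
    the remaining probability mass to the category with the largest
    current probability (entropy minimisation favours concentration)."""
    p = list(lower)
    rem = 1.0 - sum(lower)
    while rem > eps:
        free = [i for i in range(len(p)) if upper[i] - p[i] > eps]
        # largest current probability; break ties by the larger upper bound
        i = max(free, key=lambda j: (p[j], upper[j]))
        add = min(upper[i] - p[i], rem)
        p[i] += add
        rem -= add
    return p, -sum(q * log(q) for q in p if q > 0)
```

For lower bounds [0.1, 0.1, 0.1] and upper bounds [0.8, 0.5, 0.5], the mass is concentrated on the first category, giving [0.8, 0.1, 0.1].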
|28 January 2013||
Georg Schollmeyer (LMU München): A note on sharp identification regions
For many statistical models, there exists a close link between the problem of parameter estimation and the problem of prediction. For example, in the case of a simple linear model, the "best linear predictor" and the "best linear estimator" are nearly the same, with one main difference: the linear predictor is also reasonable in a misspecified model, whereas the estimated parameters are only meaningful if the model is correctly specified.
In this talk we try to analyze this link between prediction and estimation in the case of partially identified linear models. In a first step we assume that we know the complete distribution of all observable variables and only have to "deduce" all non-refutable parameter values or all in some sense reasonable predictions. This leads to different regions, which we call the sharp estimation region and the sharp prediction region here. These regions will be our entities to estimate and, viewed as mappings, they have very different algebraic properties. Furthermore, they can be generated canonically and relatively independently of each other. Later on, we introduce some constructions with set-valued mappings and show that both regions appear as the monotone kernel and the monotone hull of a criterion-function-based mapping.
Finally, we give some ideas on possible ways to estimate these regions.
|4 February 2013
Program (Sommersemester 2012, Wednesdays at 18:30):
|2 May 2012||
Julia Kopf (LMU München): Heterogeneity in IRT models
The aim of many empirical educational studies is the measurement of latent traits such as reading competence. For this purpose, statistical models from item response theory (IRT), such as the Rasch model, are employed. A central assumption of these models is the invariance of the item parameters. If this assumption is violated, so-called differential item functioning (DIF) is present: even at equal ability levels, groups of respondents show different solution probabilities for individual tasks (items). When comparing groups of persons (e.g. by gender or native language), this can lead to serious erroneous conclusions. The talk presents challenges in the analysis of DIF in the Rasch model and first approaches to resolving them.
|16 May 2012||
Roland Pöllinger (LMU München): Newcomb's Paradox: Organizing and Inferring Knowledge in Hybrid Networks
Referring back to the physicist William Newcomb, Robert Nozick (1969) presents what he calls Newcomb's Problem, a decision-theoretic dilemma in which two principles of rational judgement appear to conflict, at least in much of the relevant literature across statistics and philosophy: the Principle of Dominance and the Principle of Maximum Expected Utility recommend diverging strategies in the game situation of the thought experiment. While proponents of Evidential Decision Theory (EDT) seem divided on the strategy to apply and on the basic interpretation of the two principles, the Causal Decision Theory (CDT) literature mostly tends towards the solution also recommended by Dominance ("two-boxing").
In this talk I discuss the modelling of the paradox in Bayesian causal models, as defined by Pearl (1995 and 2000/2009) and employed by Wolfgang Spohn ("Reversing 30 Years of Discussion: Why Causal Decision Theorists Should One-Box") and by Meek & Glymour (1994) for the analysis of Newcomb's Problem. In response to these approaches, in the second part of my discussion I present my proposed solution in causal knowledge patterns (an extension of the Bayes net framework with intensional information links), finally arriving, closer to intuition and to the original formulation of Nozick's story, at the "one-boxing" solution.
Keywords: (causal/evidential) decision theory, causal reasoning, epistemic causation, formal epistemology, Bayes nets, interventionist account of causation
|13 June 2012||Formal Informal: The Markov Assumption|
|27 June 2012||
Georg Schollmeyer (LMU München): Linear models and partial identification
In several areas of research, such as economics, engineering, or geodesy, handling interval-valued observations to reflect some kind of non-stochastic uncertainty is receiving attention. In the special case of a linear model with interval-valued dependent variables and precise independent variables, one can use the linear structure of the least squares estimator to develop an appropriate set-valued estimator, which has been explicated seemingly independently in several papers (Beresteanu and Molinari, 2008; Schön and Kutterer, 2005; Cerny, Antoch, and Hladik, 2011).
The geometric structure of the resulting estimate is that of a zonotope, an object widely studied in computational geometry. In this talk I introduce the above-mentioned estimators, some of their properties, and two different ways to construct confidence regions for them: one is to view these estimators as set-valued point estimators and to utilize random set theory; the other is to see them as collections of point estimators, for which one has to find appropriate collections of confidence ellipsoids. Finally, I give a short outlook on an idea for making these zonotope-type estimators somewhat more robust.
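For simple linear regression, the projection of this zonotope onto the slope coordinate is easy to compute: the least squares slope is linear in y, so its extremes over the box of interval-valued responses are attained at box vertices chosen according to the sign of each coefficient. A minimal sketch (the function name is mine):

```python
def slope_range(x, y_lo, y_hi):
    """Range of the least-squares slope when each response lies in an
    interval [y_lo[i], y_hi[i]].  The slope is sum(m_i * y_i) with
    m_i = (x_i - xbar) / Sxx, so the extremes over the box are reached
    by picking each y_i at an endpoint according to the sign of m_i."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    m = [(xi - xbar) / sxx for xi in x]
    lo = sum(mi * (b if mi < 0 else a) for mi, a, b in zip(m, y_lo, y_hi))
    hi = sum(mi * (b if mi > 0 else a) for mi, a, b in zip(m, y_lo, y_hi))
    return lo, hi
```

With x = [0, 1, 2] and unit-width response intervals around the line y = x, the slope interval is [0.5, 1.5], an edge-to-edge cut through the zonotope.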
|5 July 2012 (Thursday)||
Robert Hable (Universität Bayreuth): Robustness versus consistency in ill-posed statistical problems
There are a number of properties which should be fulfilled by a statistical procedure. First of all, it should be consistent, i.e., it should converge in probability to the true value for increasing sample sizes. Another crucial property is robustness, i.e., small model violations (particularly those caused by small errors in the data) should not change the results too much. It is well known from parametric statistics that there can be a goal conflict between efficiency and robustness. However, in many nonparametric statistical problems there is even a goal conflict between consistency and robustness. That is, a statistical procedure which is (in a certain sense) robust cannot always converge to the true value. This is the case for so-called ill-posed problems. It is well known in machine learning theory that many nonparametric statistical problems are ill-posed, but the consequences for robustness have not received much attention so far. Here, we bring together notions and facts which are common in different fields, namely robust statistics and machine learning. As an example, we consider nonparametric classification and regression using regularized kernel methods such as support vector machines (SVM).
|12 July 2012 (Thursday)|
|7 September 2012
(Friday) at 14:30
Samira Sedighin (Isfahan University of Technology): Detection and Deliberation of Outliers in Fuzzy Regression Models
In the fuzzy linear regression introduced by Tanaka in 1982, some of the strict assumptions of the statistical model are relaxed. In the general fuzzy regression model, the input data (explanatory variables x) and the output data (dependent variable y) are fuzzy, the relationship between input and output is given by a fuzzy function, and the distribution of the data is possibilistic.
The data need not have statistical properties. Fuzzy regression analysis can therefore be applied to many real-life problems in which the strict assumptions of classical statistical regression analysis cannot be met. Possibilistic regression was first introduced by Tanaka and his co-workers. In this thesis we explain this regression approach and the estimated models for the case where the coefficients are fuzzy and the observed outputs are fuzzy or non-fuzzy.
Outliers sometimes occur because of large errors during the collection, recording, or transfer of data. Sometimes they are correct observations that reveal an inadequacy of the model. When an outlier is detected, it should be investigated, not automatically omitted while the analysis continues. If outliers are genuine observations, they indicate that the model is inadequate and usually provide valuable clues for building a better one. It is important for the analyst to detect outliers and to investigate their effect on different aspects of the analysis.
One drawback of Tanaka's model is that it is sensitive to outliers. This sensitivity causes the predicted intervals to become undesirably wide, so several methods have been proposed to address the problem.
One method is to introduce a new variable, construct a fuzzy linear programming problem with fuzzy intervals, and obtain reasonable estimated intervals. In this case the estimates are not affected by the outliers, whose effect is removed; the estimated interval then reflects the influence of all the data, not just the outliers.
Another method is to add further constraints to those of the main problem, detect the outliers, and modify the constraints corresponding to them, thereby removing the effect of the outliers. These methods have drawbacks of their own, however; for example, values for some parameters must be fixed in advance.
To overcome these drawbacks, an omission approach is used that investigates how the value of the objective function changes when each observation is omitted: every observation is eliminated in turn, its effect on the objective function of the linear programming problem is examined, and the outlier is detected. In addition, a box plot is used to define the cutoffs: a diagnostic measure quantifies the effect of each observation on the objective function, attention then focuses on the largest measure, and the box plot determines whether it corresponds to an outlier.
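The leave-one-out diagnostic with a box-plot cutoff can be sketched as follows, using an ordinary least squares objective as a stand-in for the fuzzy linear programming objective of the thesis (the function names and the OLS stand-in are my assumptions):

```python
from statistics import quantiles

def sse(x, y):
    """Sum of squared errors of the least-squares line through (x, y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

def omission_outliers(x, y):
    """Omit each observation in turn; the decrease in the objective is
    the diagnostic measure, with a box-plot rule (Q3 + 1.5 IQR) as
    the cutoff for flagging outliers."""
    base = sse(x, y)
    drops = [base - sse(x[:i] + x[i+1:], y[:i] + y[i+1:])
             for i in range(len(x))]
    q1, _, q3 = quantiles(drops, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return [i for i, d in enumerate(drops) if d > cutoff], drops
```

An observation whose removal makes the objective collapse (for instance a single point far off an otherwise perfect line) produces a drop far beyond the box-plot cutoff and is flagged.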
A further problem with Tanaka's model arises when the trend of the centers conflicts with the trend of the spreads: Tanaka's FLR method then gives inappropriate results, because the spread trend is not correctly estimated. The reason lies in the sign of the spreads, which is assumed to be positive. A new approach called UFLR, which eliminates the sign constraint on the spreads in the linear programming problem, resolves this drawback.
Program (Wintersemester 2011/12, Thursdays at 18:15):
|20 October 2011||
Gero Walter (LMU München): On shapes of parameter sets defining sets of conjugate priors in generalized Bayesian inference
Imprecise Bayesian inference aims to generalize and robustify standard Bayesian inference by considering sets of priors instead of a single prior distribution to model (possibly vague) prior information.
Due to tractability requirements, one often resorts to conjugate priors, where the posterior is from the same distribution family as the prior, and so the update step from prior to posterior distribution can be characterized by the change of parameter values.
A central benefit of the imprecise Bayesian approach is that it is able to mirror the quality, or precision, of prior information by the magnitude of the set of priors. Perfect probabilistic knowledge yields a precise prior, whereas vague knowledge can be expressed by a large set of priors. This carries forward to the posterior set, which is reduced in magnitude when more and more data points are used for updating. This generally desirable behaviour should, however, hold only if prior and data information are in accordance (or the prior is overridden by vast amounts of data). Whenever there is a situation of "prior-data conflict" instead, this should inflate the posterior set as compared to the non-conflicting case, thus signalling conflict by giving more cautious posterior inferences.
In imprecise Bayesian inference with conjugate priors as presented in Walley (1991, § 5.4.3) for Bernoulli data, and in the generalization to data from exponential family distributions in Walter and Augustin (2009), the set of priors is characterized by an interval for the pseudocounts parameter n and an interval for the main interest parameter y (for y one-dimensional) of the conjugate prior. This makes the description of the prior set very easy, and leads to simple updating rules with respect to prior-data conflict. Such rectangular prior sets may, however, for several reasons, not be a good representation of prior beliefs, as this set shape poses considerable constraints on the set of priors. I would like to point to several issues arising with rectangular prior sets and explore some ideas about more flexible descriptions of parameter sets.
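For Bernoulli data, the posterior set induced by such a rectangular prior set can be traced by evaluating the conjugate update at the rectangle's corners, since the posterior mean (n*y + s)/(n + N) is monotone in each parameter separately. A small sketch (the function name is mine):

```python
from itertools import product

def posterior_mean_range(n_box, y_box, s, N):
    """Prior set parametrised by pseudocount n in n_box and prior mean
    y in y_box; after observing s successes in N trials the posterior
    mean is (n*y + s) / (n + N).  It is monotone in y, and monotone in
    n for a fixed sign of y - s/N, so its extremes over the rectangle
    are attained at the four corners."""
    vals = [(n * y + s) / (n + N) for n, y in product(n_box, y_box)]
    return min(vals), max(vals)
```

In this example, evaluating a conflicting sample (s = N = 10 against a prior mean interval around 0.5) yields a wider posterior interval than a compatible sample (s = 8), illustrating the prior-data conflict sensitivity described above.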
|17 November 2011||
Marco Cattaneo (LMU München): Robust regression with imprecise data
We consider the problem of regression analysis with imprecise data (meaning imprecise observations of precise quantities in the form of sets of values). Without distributional assumptions, a likelihood-based approach to this problem leads to a very robust regression method (which can be interpreted as a generalization of the method of least median of squares). We compare this method with other approaches to regression with imprecise data, and apply it to data from a social survey.
Technical report: Robust regression with imprecise data
|24 November 2011||
Bernhard Haller (TU München): Regression models for failure time data in the presence of competing risks
In the analysis of failure time data, the time to one particular out of many possible events may be of interest, e.g. in clinical research or engineering. In the presence of so-called competing risks, the joint distribution of the times to the different types of event cannot be estimated from the observed data without making unverifiable assumptions, since only the time to the first event can be observed for each subject. Furthermore, standard failure time methods such as the "naïve" Kaplan-Meier estimator, which treats competing events as censored observations, lead to biased results, and relationships between hazard rates and event probabilities known from classical survival analysis no longer hold.
Over the last three decades a variety of measures and methods for the description and analysis of competing risks data have been introduced. In my talk I will present the most common measures used in the analysis of competing risks data and show pitfalls and problems in the analysis of failure time data with mutually exclusive types of event. I will focus on different regression modelling approaches proposed in the statistical literature. In the presence of competing risks, the assessment of covariate effects is not straightforward: on the one hand, regression approaches using different versions of hazard rates as dependent variables have been introduced (Prentice et al., 1978; Fine and Gray, 1999); on the other hand, approaches based on factorizations of the joint distribution of event times and types of event have been proposed (Larson and Dinse, 1985; Nicolaie et al., 2010). I will discuss the different approaches with a focus on assumptions, applicability, and interpretation of the results. All measures and models will be illustrated using data from clinical practice.
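The bias of the "naïve" Kaplan-Meier approach is easy to demonstrate against the standard cumulative incidence (Aalen-Johansen) estimator. A minimal sketch (the function name is mine):

```python
def cumulative_incidence(times, events, cause):
    """Aalen-Johansen estimate of the cumulative incidence function
    for one cause; events: 0 = censored, otherwise the cause label.
    At each event time, the overall survival so far is multiplied by
    the cause-specific hazard increment."""
    F, S = 0.0, 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, e in zip(times, events) if u == t and e == cause)
        d_any = sum(1 for u, e in zip(times, events) if u == t and e != 0)
        F += S * d_cause / at_risk          # probability of cause at t
        S *= 1.0 - d_any / at_risk          # overall survival past t
    return F
```

Recoding the competing cause to 0 (censored) reproduces the naïve 1 - Kaplan-Meier estimate, which overestimates the cumulative incidence whenever competing events occur.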
|1 December 2011||
Petra Wolf (TU München): Predictive accuracy in survival analysis: The ROC curve and related measures
Receiver Operating Characteristic (ROC) curves are widely used to evaluate and compare diagnostic tests in case control studies. To evaluate a prognostic marker in survival analysis the concept of ROC curves has to be extended. Heagerty (2000) and Heagerty and Zheng (2005) introduced a definition of time dependent sensitivity and specificity. Following this approach, I will present an enhanced method to calculate ROC curves in the setting of censored data.
Furthermore, I will compare the ROC methodology with other measures of predictive accuracy in survival analysis: besides the classical concept of using the area under the ROC curve (AUC) as a measure of predictive accuracy, there are related methods such as the C-index and a newer proposal, the integrated discrimination index (IDI), for comparing prognostic markers. In my talk I will show some similarities as well as differences between these concepts.
|8 December 2011||
Manuel Eugster (LMU München): Reproducible Research: Why? And How?
Reproducibility in science means the repeatability of experiments, analyses, and results. Reproducibility also plays a major role in the computational sciences. Yet despite reproducibility that "feels easy to achieve" (just provide the data and source code), only very few publications are fully traceable in all their steps.
In this talk I discuss this state of affairs. I outline the benefits of reproducibility and present examples for improving research on a small scale (one's own research) and on a large scale (the research enterprise in general). I discuss problems with and objections to the demand for reproducibility, and finally present my own approach to making my research reproducible.
|15 December 2011
Thomas Augustin (LMU München): Imprecise measurement error models and partial identification: towards a unified approach for non-idealized data
Some first steps towards a generalized, unified handling of deficient, nay non-idealized, data are considered. The ideas are based on a more general understanding of measurement error models, relying on possibly imprecise error and sampling models. This modelling comprises common deficient data models, including classical and non-classical measurement error, coarsened and missing data, as well as neighbourhood models used in robust statistics. Estimation is based on an eclectic combination of concepts from Manski's theory of partial identification and from the theory of imprecise probabilities. Firstly, measurement error modelling with precise probabilities is discussed, with an emphasis on Nakamura's method of corrected score functions and some extensions. Secondly, error models based on imprecise probabilities are considered, relaxing the rather rigorous assumptions underlying all the common measurement error models. The concept of partial identification is generalized to estimating equations by considering sets of potentially unbiased estimating functions. Some properties of the corresponding set-valued parameter estimators are discussed, including their consistency (in an appropriately generalized sense). Finally, the relation to previous work in the literature on partial identification in linear models is made explicit.
Marco Cattaneo (LMU München): On the implementation of Likelihood-based Imprecise Regression
Likelihood-based Imprecise Regression (LIR) is a new approach to regression allowing the direct consideration of any kind of coarse data (including e.g. interval data, precise data, and missing data). LIR uses likelihood-based decision theory to obtain the regression estimates, which are in general imprecise, reflecting the uncertainty in the coarse data. Here, we address in particular the implementation of LIR, focusing on some important regression problems. From the computational point of view, the possible non-convexity of the estimated set of regression functions poses a considerable challenge.
Gero Walter (LMU München): Generalised Bayesian inference with conjugate priors, and a link to g-priors for Bayesian model selection
In generalised Bayesian inference, sets of priors are considered instead of a single prior, allowing for partial probability specifications and a systematic analysis of sensitivity to the prior. Especially when substantial information is used to elicit the prior, prior-data conflict can occur, i.e., data that are very unlikely from the standpoint of the prior may be observed. This conflict should show up in posterior inferences, alerting the analyst and, e.g., lead to a revision of prior specifications. However, when conjugate priors are used, a reasonable reaction is not guaranteed. Mostly, prior-data conflict is just averaged out, and in Bayesian regression, conflict in one regressor leads only to a non-specific reaction across all regressors. Generalised Bayesian inference can amend this behaviour by encoding the precision of inferences via the magnitude of the posterior set. The simplified natural conjugate prior most suited for generalised Bayesian regression has a link to the so-called g-prior, which is used for model selection in classical Bayesian regression.
|12 January 2012||
Georg Schollmeyer (LMU München): Necessity-measures and their Möbius inverses in the framework of generalized coherent previsions
In this talk we investigate necessity-measures as special coherent previsions. The motivation for focusing on necessity-measures is a closedness property under a general construction of hierarchical models. To describe this effectively, we introduce generalized coherent lower previsions as a framework. Later on, we use the Möbius inversion to derive an effective algorithm for calculating the extreme points of the core of a necessity-measure, as well as an exact expression for the number of extreme points. The algorithm can be used to calculate unknown total and conditional previsions of the hierarchical model.
Finally, again using the Möbius inversion, we show that there are no non-trivial infimum-preserving coherent lower previsions. Thus, in general only the trivial necessity-measures are closed under the above-mentioned construction. This may call into question either the conventional generalization of classical necessity-logic to necessity-theory, or the appropriateness of this construction or parts of it, like the focus on coherence and natural extension.
|26 January 2012||
Informal presentation and discussion of ongoing research work
Program (Sommersemester 2011, Thursdays at 18:15):
|14 April 2011||
Christian Seiler (ifo Institut München): Micro data Imputation and Macro data Implications - Evidence from the Ifo Business Survey
Surveys are commonly affected by nonresponding units, which can produce biases if the missing values cannot be regarded as missing at random (MAR). While many papers have examined the effect of nonresponse in individual or household surveys, much less has been done for business surveys. This paper analyses the missing data in the Ifo Business Survey, whose most prominent result is the Ifo Business Climate Index, a leading indicator of business cycle development in Germany. The missing values are imputed using various imputation approaches for longitudinal data which reflect the underlying latent data-generating process. The imputed data are then aggregated and compared with the original indices to evaluate the implications at the macro level.
|28 April 2011||
Andreas Ströhle (LMU München): On the notion of "chance" from an ontological perspective
In everyday life as well as in contemporary science, it is taken for granted that chance events occur in our world. In science, the experimental results of quantum physics brought about a paradigm shift at the beginning of the 20th century; in the centuries before, determinism was common sense among scientists and natural philosophers. Today, science usually distinguishes between "relative" and "absolute" chance, but it is rarely questioned whether the concept of absolute chance even makes sense from an ontological perspective; rather, its ontically real occurrence is taken for granted.
In my talk I show that the concept of chance in its absolute form is burdened with serious ontological problems, which make it advisable to regard the notion of "chance" exclusively as an epistemic concept.
|5 May 2011||
Roland Pöllinger (LMU München): Structures of Causal Knowledge
The question of the ontology or the description of causal relations keeps circling back to the relationship between determinism and (ontological or descriptive) indeterminism within a theory of causation. Some prominent approaches base causal analysis solely on correlations or on suitably restricted requirements on statistical dependencies. Numerous pitfalls and counterintuitive counterexamples, however, thwart this project and force further refinement of the methods. Judea PEARL (2000/2009) bases his analytical approach on the representation of statistical dependencies in so-called Bayesian networks and formulates his deterministic notion of causation by means of systematic, structural manipulations of such networks.
As work in progress, the talk will discuss the Bayesian network format, present automatic procedures for network generation, and connect this to a deterministic understanding of causality that is, at its core, (necessarily) epistemic in orientation.
Recommended reading: Pearl, Judea. Causal diagrams for empirical research. Biometrika, 1995, 82, 669-688.
|12 May 2011||
Martin Gümbel (München): About the probability-field-intersections of Weichselberger and a simple conclusion from least favorable pairs
In the framework of Weichselberger's theory of probability, there are probability fields and operations on probability fields. We look at the probability-field intersection and present a simple conclusion for this operation in case a least favorable pair of probabilities exists.
|26 May 2011||
Ulrich Pötter (LMU München): Sampling Extended Household and Family Networks and the Use of Simplicial Complexes
One possible solution to the problem of computing inclusion probabilities for families from surveys of individuals is based on counting formulae derived from the theory of simplicial complexes. Here simplicial complexes are seen as unions of sets of subsets subject to certain constraints. After a sketch of this solution I would like to discuss some further areas of statistical applications of these techniques.
|9 June 2011||
Informal presentation and discussion of ongoing research work
|16 June 2011||
Manfred Schramm (München): Reasoning with Probabilities and Maximum Entropy: Theory, Implementation, Application
Statements about the frequencies of certain phenomena are a simple, easily understood, and very widespread kind of information. If we try to make decisions on the basis of frequency information alone, however, we are surprised by the multitude of possibilities our information still leaves open. A "logic" based on frequencies will therefore only rarely be able to support conclusions or decisions. Which additional principles can remedy this? The talk shows, in theory and with practical examples, how the principles of indifference, independence, and maximum entropy support one another and, together with the frequency information, combine into a powerful knowledge-based system. The practical use of such a system is illustrated with a medical application.
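The interplay of frequency information and maximum entropy can be illustrated with a minimal sketch (not the system from the talk): under only a prescribed mean, the maximum-entropy distribution is an exponential family, and its Lagrange multiplier can be found by bisection.

```python
from math import exp

def maxent_with_mean(values, target_mean, tol=1e-10):
    """Maximum-entropy distribution over `values` subject only to a
    prescribed mean: p_i proportional to exp(lam * x_i), with the
    Lagrange multiplier lam found by bisection (a standard
    exponential-family result).  Assumes target_mean lies strictly
    between min(values) and max(values)."""
    def mean_for(lam):
        w = [exp(lam * x) for x in values]
        z = sum(w)
        return sum(x * wi for x, wi in zip(values, w)) / z

    lo, hi = -5.0, 5.0          # multiplier bracket; widen if needed
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    w = [exp(lo * x) for x in values]
    z = sum(w)
    return [wi / z for wi in w]

# With target mean 3.5 on a die's support, maximum entropy recovers
# the uniform distribution (the principle of indifference as a special case)
p = maxent_with_mean([1, 2, 3, 4, 5, 6], 3.5)
```

With additional linear constraints the same idea generalizes to several multipliers, which is where a knowledge-based system as described in the talk would take over.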
|7 July 2011||
Informal presentation and discussion of ongoing research work
|14 July 2011||
Andrea Wiencierz (LMU München): Regression with Imprecise Data: A Robust Approach
We introduce a robust regression method for imprecise data, and apply it to social survey data. Our method combines nonparametric likelihood inference with imprecise probability, so that only very weak assumptions are needed and different kinds of uncertainty can be taken into account. The proposed regression method is based on interval dominance: interval estimates of quantiles of the error distribution are used to identify plausible descriptions of the relationship of interest. In the application to social survey data, the resulting set of plausible descriptions is relatively large, reflecting the amount of uncertainty inherent in the analyzed data set.
Gero Walter (LMU München): On Prior-Data Conflict in Predictive Bernoulli Inferences
By its capability to deal with the multidimensional nature of uncertainty, imprecise probability provides a powerful methodology to sensibly handle prior-data conflict in Bayesian inference. When there is strong conflict between sample observations and prior knowledge, the posterior model should be more imprecise than in the situation of mutual agreement or compatibility. Focusing on the prototypical example of Bernoulli trials, we discuss the ability of different approaches to deal with prior-data conflict.
We study a generalized Bayesian setting, including Walley's Imprecise Beta-Binomial model and his extension to handle prior-data conflict (called pdc-IBBM here). We investigate alternative shapes of prior parameter sets, chosen so as to show improved behaviour in the case of prior-data conflict, and their influence on the posterior predictive distribution. Thereafter we present a new approach, consisting of an imprecise weighting of two originally separate inferences, one based on an informative imprecise prior and the other on an uninformative imprecise prior. This approach deals with prior-data conflict in a fascinating way.
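For concreteness, the posterior predictive bounds of the basic Imprecise Beta-Binomial model (without the pdc extension discussed in the talk) can be sketched as follows, holding the learning parameter s fixed while the prior mean ranges over (0, 1):

```python
def ibbm_predictive(k, n, s=2.0):
    """Walley's Imprecise Beta-Binomial model: conjugate updating of
    the set of Beta(s*t, s*(1-t)) priors, t in (0, 1), after k
    successes in n Bernoulli trials.  Returns lower and upper
    posterior predictive probabilities of a success on the next
    trial: [k/(n+s), (k+s)/(n+s)]."""
    return k / (n + s), (k + s) / (n + s)

low, up = ibbm_predictive(k=9, n=10)
# interval [9/12, 11/12]; the imprecision s/(n+s) shrinks as n grows
```

Note that the width of this interval depends only on n and s, not on the data; sensitivity to prior-data conflict is exactly what the pdc-IBBM and the alternative prior-set shapes of the talk add on top of this basic model.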
|21 July 2011
in room 225
Jan-Willem Romeijn (Rijksuniversiteit Groningen): Frequencies, Chances, and Undefinable Sets
In this talk I aim to clarify the concept of chance. The talk consists of two parts, concerning the epistemology and metaphysics of chance respectively. In the first part I consider statistical hypotheses and their role in inference. I maintain that statistical hypotheses are best explicated along frequentist lines, following the theory of von Mises. I will argue that the well-known problems for frequentism do not apply in the inferential context.
In the second part of the talk I ask what relation obtains between these frequentist hypotheses and the world. I will show that we can avoid the problem of the reference class, as well as the closely related conflict between determinism and chance, by means of a formal antireductionist argument: events can be assigned meaningful and nontrivial chances if they correspond to undefinable sets of events in the reducing theory.
|29 July 2011
(Friday) at 15:15
Teddy Seidenfeld (CMU Pittsburgh): Three contrasts between two senses of "coherence"
B. de Finetti defended two senses of "coherence" in providing foundations for his theory of subjective probabilities. Coherence_1 requires that when a decision maker announces fair prices for random variables these are immune to a uniform sure loss: no Book is possible using finitely many fair contracts! Coherence_2 requires that when a decision maker's forecasts for a finite set of random variables are evaluated by Brier score (squared error loss), there is no rival set of forecasts that dominates with a uniformly better score for sure. De Finetti established that these two concepts are equivalent: fair prices are coherent_1 if and only if they constitute a coherent_2 set of forecasts if and only if they are the expected values for the variables under some common (finitely additive) personal probability.
I report three additional contrasts between these two senses of "coherence". One contrast (relating to finitely additive probabilities) favors coherence_2. One contrast (relating to decisions with moral hazard) favors coherence_1. The third contrast relates to the challenge of state-dependent utilities.
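De Finetti's dominance argument behind coherence_2 can be checked numerically in the simplest case. The sketch below is for two complementary events only (the projection used here is not the general construction): an additivity-violating forecast is beaten by its coherent projection under Brier score in every state.

```python
def brier(forecast, outcome):
    """Squared-error (Brier) score of a vector of event forecasts
    against a 0/1 outcome vector; lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecast, outcome))

# Events: A and its complement.  The forecast (0.6, 0.6) violates
# additivity (0.6 + 0.6 != 1); its Euclidean projection (0.5, 0.5)
# onto the plane p + q = 1 scores strictly better in every state,
# so (0.6, 0.6) is incoherent_2.
incoherent = (0.6, 0.6)
coherent = (0.5, 0.5)
for outcome in [(1, 0), (0, 1)]:          # A occurs / A fails
    assert brier(coherent, outcome) < brier(incoherent, outcome)
```

The same forecasts also admit a Book in the sense of coherence_1, illustrating the equivalence the abstract describes.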
Program (Wintersemester 2010/11, Wednesdays at 18:30):
|3 November 2010||
Atiye Sarabi Jamab (LMU München): An experimental comparative study of the performance of uncertainty measures in Dempster-Shafer theory
In Dempster-Shafer theory, a distinction is made between two types of uncertainty: conflict is associated with cases where the information focuses on sets with empty intersections, and non-specificity with cases where the information focuses on sets whose cardinality is greater than one. Several criteria for measuring conflict, non-specificity, or both have been proposed in the literature and could be used for measuring uncertainty in Dempster-Shafer theory, but they might not use all the information in the bodies of evidence. The aim of this talk is to compare the behaviour of some of them as a "distance" between two basic probability assignments.
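As a minimal illustration (with two measures chosen for simplicity, not necessarily those compared in the talk), Hartley-based non-specificity and Yager's dissonance can be computed directly from a basic probability assignment:

```python
from math import log2

def nonspecificity(m):
    """Hartley-based non-specificity: sum of m(A) * log2 |A| over the
    focal sets of a basic probability assignment m, represented as a
    dict mapping tuples of elements to masses."""
    return sum(mass * log2(len(A)) for A, mass in m.items())

def plausibility(m, B):
    """Pl(B): total mass of focal sets intersecting B."""
    return sum(mass for A, mass in m.items() if set(A) & set(B))

def dissonance(m):
    """Yager's dissonance, a conflict measure: -sum m(A) log2 Pl(A)."""
    return -sum(mass * log2(plausibility(m, A)) for A, mass in m.items())

# Mass 0.6 on the non-specific set {a, b}, mass 0.4 on the
# conflicting singleton {c}
m = {("a", "b"): 0.6, ("c",): 0.4}
```

For this example nonspecificity(m) = 0.6 (only the two-element focal set contributes) and dissonance(m) is positive because the two focal sets are disjoint.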
|15 November 2010
(Monday) at 14:15
Philipp Bleninger (IAB Nürnberg): Remote Data Access und Enthüllungsrisiken für sensible Informationen aus inferentiellen Datenangriffen
Zahlreiche Stellen der öffentlichen Verwaltung (die Statistischen Ämter, die Bundesagentur für Arbeit, die Deutsche Rentenversicherung etc.) produzieren große Mengen an Daten, die auch für die Forschung von großem Interesse sind. Allerdings können die Datenproduzenten ihre Daten nicht einfach weitergeben, sondern müssen besondere Vorgaben hinsichtlich des Datenschutzes und der Anonymität wahren (gemäß BStatG, SGB X etc.). Der gegenwärtige Standard für die Weitergabe von Daten besteht entweder im On-Site Access oder in der Datenveränderung. Aber On-Site Access ist sehr aufwendig sowohl für den Datennutzer als auch für den Datenproduzenten und veränderte Daten haben eine geringe Akzeptanz der Nutzer insbesondere was Inferenzen betrifft.
Auf der Suche nach geeigneten Datenzugängen scheint der Remote Access eine vielversprechende Lösung zu sein. Remote Access wird über einen Server gestattet, auf dem der Datennutzer arbeitet. Er bekommt dabei entweder gar keine oder nur verfremdete Daten zu sehen, während seine Analysen auf den wahren Daten gerechnet werden. Dennoch ist auch dieser Datenzugang nicht sicher, da die Enthüllung sowohl ganzer Datenvektoren, als auch individueller Informationen möglich ist.
Dieser Vortrag beschäftigt sich mit den Risiken für Datenenthüllung im Remote Access am Beispiel des IAB Betriebspanels. Die Möglichkeiten eines Datenangreifers für inferentielle Datenenthüllung werden anhand der Hauptkomponenten-/Faktorenanalyse und der einfachen linearen Regression aufgezeigt. Diese beiden eigentlich leicht zu verhindernden Risikoquellen stehen dabei beispielhaft für die mannigfaltigen Möglichkeiten einfallsreicher Datenangreifer.
|19 November 2010
(Friday) at 14:15
in room 144
Ric Crossman and Frank Coolen (Durham University): Nonparametric predictive inference for ordinal data: multiple comparisons and classification
Nonparametric Predictive Inference (NPI) is a statistical method which uses few modelling assumptions, enabled by the use of lower and upper probabilities to quantify uncertainty. NPI has been presented for many problems in statistics, risk and reliability, and operations research. The first part of the presentation will give an informal introduction to the basic ideas of NPI and its use for ordered categories, followed by some examples on multiple comparisons. The second part will focus on applications of NPI in classification trees.
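The flavour of NPI's lower and upper probabilities can be seen in the simplest (Bernoulli) case; this is only a sketch of the basic idea, on which the ordinal and classification-tree versions in the talk build:

```python
def npi_bernoulli(s, n):
    """NPI lower and upper probability that the next trial is a
    success, given s successes in n observed exchangeable trials
    (Coolen's bounds s/(n+1) and (s+1)/(n+1))."""
    if not (0 <= s <= n and n >= 1):
        raise ValueError("need 0 <= s <= n and n >= 1")
    return s / (n + 1), (s + 1) / (n + 1)

low, up = npi_bernoulli(7, 10)   # 7 successes in 10 trials
# lower 7/11, upper 8/11; the imprecision 1/(n+1) vanishes as n grows
```

The gap between lower and upper probability reflects how little is assumed: no parametric model, only an exchangeability-type assumption about the next observation.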
|17 January 2011
Informal presentation and discussion of ongoing research work
|4 February 2011
(Friday) at 14:15
in room 144
Uwe Saint-Mont (FH Nordhausen): Statistics, Empirical Sciences, and Philosophy of Science
At present, these fields are developing rather independently of one another. This was not always the case: only a few decades ago there was a close connection between statistics and its neighbouring fields, personified in particular by R.A. Fisher.
The talk will address two questions: Why has statistics become more and more isolated since then, and how could this development be reversed? Besides the general "philosophy of science" orientation of statistics, the notion of "information" and how it is handled play an important role here.
|11 February 2011
(Friday) at 14:15
Gerhard Winkler (Helmholtz Zentrum München): Chance! Chance?
In this talk we deal with chance. We leave the mathematics aside; rather, we discuss the development and the manifold facets of this notion.
Indeed, chance is one of the often misunderstood fundamental concepts, from everyday human life into the most diverse branches of science. We try to counteract this.
The talk is also suitable for non-mathematicians and non-statisticians.
|4 March 2011
(Friday) at 14:15
Sara Kleyer (Universität Bamberg): Regional Price Index
The aim of price statistics is to portray price developments and thus to measure inflation. Beyond this temporal perspective, it is of particular interest for the social and economic sciences to also be able to draw regional comparisons. For example, wage estimates are biased as long as they cannot be adjusted for the regional price level. Official statistics in Germany, however, offer price indices only down to the level of the federal states, which is by far not sufficient. Various statistical methods are conceivable for determining regional price indices; these will be presented.
Program (Sommersemester 2010, Wednesdays at 19:15):
|5 May 2010
19 May 2010
The talk presents some first steps towards a generalized, unified handling of deficient, or more generally non-idealized, data. The ideas are based on a more general understanding of measurement error models, relying on possibly imprecise error and sampling models. This modelling comprises common deficient-data models, including classical and non-classical measurement error, coarsened and missing data, as well as neighbourhood models used in robust statistics. Estimation is based on an eclectic combination of concepts from Manski's theory of partial identification and from the theory of imprecise probabilities.
(Not only) as a preparation, the first part of the talk discusses measurement error modelling with precise probabilities. After a brief introduction into the background, I consider one of the most general methods to correct for classical measurement error, namely Nakamura's method of corrected score functions. It is shown how this method to construct unbiased estimating functions under measurement error can be extended to deal with other types of error models, in particular with deficient dependent variables and with the so-called Berkson error.
The second part of the talk extends consideration to imprecise probabilities, relaxing the rather rigorous assumptions underlying all the common measurement error models. The concept of partial identification is extended to estimating equations by considering sets of potentially unbiased estimating functions. Some properties of the corresponding set-valued parameter estimators are discussed, including their consistency (in an appropriately generalized sense). Finally, the relation to previous work in the literature on partial identification in linear models is made explicit.
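The classical-measurement-error correction underlying this line of work can be sketched for the linear model. This is the textbook moment correction, a simple special case of corrected estimating functions, not Nakamura's general construction; numpy is assumed:

```python
import numpy as np

def corrected_ols(W, y, Sigma_u):
    """Least squares corrected for classical measurement error: with
    y = X beta + eps observed only through W = X + U, where U is
    i.i.d. with known covariance Sigma_u, the naive cross-product
    W'W overstates X'X by n * Sigma_u on average, so that term is
    subtracted before solving the normal equations."""
    n = W.shape[0]
    return np.linalg.solve(W.T @ W - n * Sigma_u, W.T @ y)
```

With Sigma_u = 0 this reduces to ordinary least squares; when only some covariates are error-prone, the corresponding rows and columns of Sigma_u are zero. The partial-identification perspective of the talk replaces the single known Sigma_u by a set of plausible error models, yielding set-valued estimates.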
|12 May 2010||
Generalized iLUCK-models, a class of models introduced by Walter and Augustin (2009) as an imprecise probability generalization of conjugate Bayesian inference, have the advantage of reacting adaptively to prior-data conflict. Whereas standard conjugate Bayesian inference is not necessarily sensitive to conflicts between prior and data, generalized iLUCK-models lead to much more cautious inferences if prior and data are in conflict. In this talk, the case of data trickling in as separate portions of observations is investigated, and a number of ideas related to sequential updating of the prior are presented, exploiting the sensitivity to prior-data conflict that generalized iLUCK-models offer. "Strong happiness" is a concept using these ideas for a simple sample size calculation, guaranteeing a certain precision with the possibility of prior-data conflict factored in.
|2 June 2010||
Marco Cattaneo (LMU München): Independence and Combination of Belief Functions
In belief functions theory, information about an uncertain value is described by a random set, and not by a random variable. We shall discuss some ideas about the interpretation of belief functions and the fusion of dependent information.
|28 June 2010
Christina Schneider (LMU München): Randomness Does Not Exist
Connoisseurs will notice that this title alludes to de Finetti's dictum "Probability does not exist". While de Finetti concludes from it that probability must be interpreted subjectivistically, that path will not be taken here.
First, the thesis is motivated within an "objectivist" school of inference (some considerations from the philosophy of science are needed for this), and then some consequences of this line of argument are drawn. The most important consequence is to renounce the "truth claim" of probabilities and probability statements.
The positive consequence is of a pragmatic nature: to understand statistical inference as Inference to the Best (Idealized) Description.
|30 June 2010||
Andrea Wiencierz (LMU München): The course of well-being over the life span — Restricted Likelihood Ratio Testing (RLRT) in the presence of correlated errors
Tests for zero variance components in general form linear mixed models (LMMs) have been established for different cases where the errors are assumed to be independent and identically distributed (i.i.d.). These tests can be applied to many interesting questions in practice. They allow, for example, to test if a relation between two variables is significantly different from a polynomial of a given degree.
However, in many real applications the independence of the errors is not given. For example in economic applications the errors are often positively autocorrelated. In the case of the ordinary linear model, there is a simple transformation technique to take the correlation into account, known to econometricians as Generalized Least Squares (GLS) transformation.
Motivated by an economic study about the course of subjective well-being over the life span, the transformation technique is adapted to the case of general form LMMs, and it is investigated if this transformation technique can be used for expanding the application areas of the established tests for zero variance components to the case of correlated errors.
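The GLS transformation referred to above can be sketched in its generic form (the whitening step for a known error covariance; the adaptation to general form LMMs developed in the talk is more involved); numpy is assumed:

```python
import numpy as np

def gls(X, y, Sigma):
    """Generalized Least Squares via whitening: factor the error
    covariance Sigma = L L^T, premultiply the model by L^{-1} so the
    transformed errors are uncorrelated, then solve by ordinary
    least squares on the whitened data."""
    L = np.linalg.cholesky(Sigma)
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta

def ar1_cov(n, rho, sigma2=1.0):
    """AR(1) error covariance: Cov(e_i, e_j) = sigma2 * rho^|i-j|,
    the positive-autocorrelation pattern typical of economic data."""
    idx = np.arange(n)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])
```

After whitening, i.i.d.-based procedures, such as the restricted likelihood ratio tests for zero variance components mentioned above, can in principle be applied to the transformed model.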
|7 July 2010||
Hansjörg Baurecht (LMU München): Detecting Signals in Genomewide Association Studies
Genomewide data collected to detect statistical associations between SNPs and complex traits are usually analyzed by univariate testing of each SNP against the trait. To account for the large number of significance tests carried out, a very stringent p-value threshold is used. This reduces the occurrence of false positives, but it may cause many real associations to be missed. I will discuss an idea that considers a region of SNPs in which no single SNP passes the detection threshold: by aggregating them, so far undetected associated regions might be discovered. To this end, I adopt the idea of kernel smoothing to calculate a combined statistic incorporating genetic distance and linkage disequilibrium.
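The aggregation idea can be sketched with plain Gaussian-kernel smoothing of per-SNP statistics over genomic position (the linkage-disequilibrium weighting from the talk is omitted in this sketch):

```python
from math import exp

def smooth_statistics(positions, stats, bandwidth):
    """Gaussian-kernel smoothing of per-SNP test statistics over
    genomic position: each SNP's combined statistic is a
    distance-weighted average of its neighbours' statistics, so a
    region of moderately elevated statistics can stand out even if
    no single SNP does."""
    smoothed = []
    for p0 in positions:
        w = [exp(-0.5 * ((p - p0) / bandwidth) ** 2) for p in positions]
        z = sum(w)
        smoothed.append(sum(wi * s for wi, s in zip(w, stats)) / z)
    return smoothed
```

A full version would additionally weight neighbours by their linkage disequilibrium with the focal SNP, as described in the abstract.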
|19 July 2010
Julia Kopf (LMU München): Reflecting methods from machine learning with respect to their application in social science, psychology and statistics
Some ideas about the application and interpretation of methods from machine learning like recursive partitioning or association rules are presented. The main focus of the talk lies on the statistical validation of Ockham's Razor using model-based recursive partitioning.