| Title: | Partition-of-Unity Copula Fitting and Synthesis for R |
|---|---|
| Description: | Fit multivariate distributions using a Partition-of-Unity copula dependence structure, estimate marginals, and generate synthetic data with factor pre/post-processing. |
| Authors: | Andreas Mändle [aut, cre] |
| Maintainer: | Andreas Mändle <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-10 09:13:05 UTC |
| Source: | https://github.com/amaendle/PUcopulaSynth |
Fits logspline marginals for numeric/ordered variables, and empirical probability tables for binary or trivial variables. Optional k-NN smoothing is applied to numeric columns.
estimateMarginals(data, method = "spline", k = NULL, lbound = NULL, ubound = NULL)
data |
Preprocessed data.frame |
method |
Character; estimation method for numeric variables and ordered factors (currently only "spline") |
k |
Numeric scalar, vector, or named list for k-NN smoothing |
lbound |
lower boundary of distribution passed to |
ubound |
upper boundary of distribution passed to |
A named list of marginal models (each element has qfun)
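As a rough illustration of what a marginal model carrying a `qfun` element could look like for a binary variable, the base R sketch below derives a quantile function from an empirical probability table. The helper name `make_binary_marginal` and the `probs` element are assumptions made for this example and are not taken from the package.

```r
# Hypothetical helper (not part of the package): empirical marginal for a 0/1 variable.
make_binary_marginal <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  p0 <- mean(x == 0)                     # empirical P(X = 0)
  list(
    probs = c("0" = p0, "1" = 1 - p0),   # empirical probability table
    # Quantile function: u <= P(X = 0) maps to 0, everything above maps to 1.
    qfun = function(u) ifelse(u <= p0, 0, 1)
  )
}

m <- make_binary_marginal(c(0, 0, 1, 0, 1, NA, 0, 0))
m$qfun(c(0.10, 0.50, 0.90))
```

Storing the quantile function in the model means uniform draws can later be mapped to the variable's scale without refitting.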
Fits a PUcopula model on a (preprocessed) data matrix with optional rank-binning and jitter for numeric variables.
fitPUcopula(data, driver_strength_factor = 0.5, bin_size = 3, jitter = FALSE, family = "binom")
data |
Preprocessed data.frame (e.g., |
driver_strength_factor |
Numeric scalar or vector in (0,1] used to scale rows per variable |
bin_size |
Numeric scalar, vector, or named list with bin sizes |
jitter |
FALSE, numeric (single) or named list mapping variables to jitter factors |
family |
PUcopula family, e.g. "binom" or "nbinom" |
A PUcopula::PUCopula model
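The manual does not spell out how rank-binning works. One plausible reading, sketched below in base R, is that observations are ranked and consecutive ranks are grouped into bins of `bin_size` observations. The helper `rank_bin` is hypothetical and may differ from what `fitPUcopula` does internally.

```r
# Hypothetical helper (an assumption, not the package's code):
# group the ranks of x into bins holding `bin_size` observations each.
rank_bin <- function(x, bin_size = 3) {
  r <- rank(x, ties.method = "first")  # ranks 1..N, ties broken by position
  ceiling(r / bin_size)                # bin index 1..ceiling(N / bin_size)
}

x <- c(2.3, 0.7, 5.1, 4.4, 1.9, 3.8, 0.2)
bins <- rank_bin(x, bin_size = 3)
bins
```

Under this reading, a larger `bin_size` coarsens the rank information that enters the dependence model.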
Combines a fitted PUcopula and marginal models to produce a synthetic data.frame. Optionally restores original factor structure, names and classes.
generateSynthetic(n, copula, marginals, original_levels = NULL, original_varnames = NULL, original_classes = NULL)
n |
Integer, number of rows to generate |
copula |
A PUcopula model |
marginals |
List of marginals from |
original_levels |
Optional |
original_varnames |
Optional vector of original column names |
original_classes |
Optional named vector of original classes |
Synthetic data.frame
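Conceptually, combining copula draws with marginals amounts to inverse-transform sampling: each column of uniform draws is passed through the matching marginal's quantile function. The sketch below shows that step in base R. Independent `runif` columns stand in for draws from a fitted copula, and the variable names and quantile functions are invented for the example.

```r
set.seed(1)
n <- 5

# Stand-in for copula draws: independent U(0,1) columns.
# A fitted copula would supply dependent columns instead.
u <- cbind(age = runif(n), smoker = runif(n))

# Hypothetical marginals; each element carries a quantile function `qfun`.
marginals <- list(
  age    = list(qfun = function(p) qnorm(p, mean = 50, sd = 10)),
  smoker = list(qfun = function(p) as.integer(p > 0.7))
)

# Map each uniform column through the quantile function of the same name.
cols <- lapply(colnames(u), function(v) marginals[[v]]$qfun(u[, v]))
synthetic <- as.data.frame(setNames(cols, colnames(u)))
synthetic
```

Because the quantile functions act column by column, whatever dependence the uniform draws carry between columns is preserved in rank terms in the synthetic data.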
k-NN smoother for numeric vectors (no DataSHIELD thresholds)
knnsmoother(x, k = 3)
x |
Numeric vector (NAs allowed). |
k |
Integer; number of neighbours, in [1, N-1], where N is the number of non-missing values. |
Numeric vector of same length with smoothed non-missing entries.
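To make the idea concrete, here is a minimal base R sketch of one common form of k-NN smoothing: each non-missing value is replaced by the mean of its k nearest other non-missing values, and `NA` positions are left untouched. The function `knn_smooth_sketch` is hypothetical, and the exact neighbour definition used by `knnsmoother` may differ.

```r
# Hypothetical sketch (not the package's implementation).
knn_smooth_sketch <- function(x, k = 3) {
  ok <- !is.na(x)
  v <- x[ok]
  n <- length(v)
  stopifnot(k >= 1, k <= n - 1)          # k must lie in [1, N-1]
  smoothed <- vapply(seq_len(n), function(i) {
    d <- abs(v[-i] - v[i])               # distances to all other non-missing values
    mean(v[-i][order(d)[seq_len(k)]])    # mean of the k closest ones
  }, numeric(1))
  out <- x
  out[ok] <- smoothed                    # NA positions stay NA
  out
}

res <- knn_smooth_sketch(c(1, 2, NA, 4, 10), k = 2)
res
```

In this variant the point itself is excluded from its own neighbourhood, which is consistent with the upper limit of N-1 on `k`.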
Restore original factor structure after synthesis
postprocessData(data, cat_dummy_levels)
data |
A data.frame containing dummy/encoded columns |
cat_dummy_levels |
The |
A data.frame with factors restored and columns ordered like input
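The reverse step of dummy encoding can be sketched in base R as follows: for each row, pick the level whose indicator column is largest and rebuild the factor from the stored levels. The column naming scheme (`colour_blue` and so on) and the helper `restore_factor` are assumptions for this example, not the package's conventions.

```r
# Hypothetical encoded data: one indicator column per level of `colour`.
encoded <- data.frame(
  colour_blue  = c(0, 0, 1, 0),
  colour_green = c(0, 1, 0, 1),
  colour_red   = c(1, 0, 0, 0),
  height       = c(1.2, 3.4, 2.2, 0.9)
)
stored_levels <- list(colour = c("blue", "green", "red"))

# Hypothetical helper: collapse indicator columns back into a single factor.
restore_factor <- function(data, var, lev) {
  cols <- paste0(var, "_", lev)
  idx <- max.col(as.matrix(data[, cols]), ties.method = "first")  # winning level per row
  data[[var]] <- factor(lev[idx], levels = lev)
  data[, setdiff(names(data), cols), drop = FALSE]                # drop the indicators
}

restored <- restore_factor(encoded, "colour", stored_levels$colour)
restored
```

This sketch appends the restored factor as the last column; putting columns back in their original positions would need the original column order as an additional input.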
Converts multi-level unordered factors to dummy variables and tags remaining factor columns with '.oriname' while storing levels.
preprocessData(data)
data |
A data.frame |
A list with data (processed) and original_levels (dummies + oriname)
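For readers unfamiliar with dummy encoding, the base R sketch below turns a three-level unordered factor into one 0/1 indicator column per level and keeps the levels so the factor can be rebuilt later. The column naming scheme and the layout of `original_levels` here are illustrative assumptions and need not match what `preprocessData` produces.

```r
df <- data.frame(
  colour = factor(c("red", "green", "blue", "green")),
  height = c(1.2, 3.4, 2.2, 0.9)
)

# One 0/1 indicator column per level of the unordered factor.
lev <- levels(df$colour)
dummies <- sapply(lev, function(l) as.integer(df$colour == l))
colnames(dummies) <- paste0("colour_", lev)   # hypothetical naming scheme

encoded <- cbind(as.data.frame(dummies), height = df$height)
original_levels <- list(colour = lev)         # kept for later restoration
encoded
```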
Save original names and classes
save_original_varnames(data)
save_original_classes(data)
data |
A data.frame |
Character vector (names) / Named character vector (classes)
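The two return shapes can be reproduced with base R alone, which may help when checking what will be handed on to later steps. The snippet below is a sketch of equivalent base R calls on a made-up data.frame, not the package's source code.

```r
df <- data.frame(
  id    = 1:3,
  score = c(0.5, 1.5, 2.5),
  grp   = factor(c("a", "b", "a"))
)

# Character vector of column names.
orig_names <- names(df)

# Named character vector holding the first class of each column.
orig_classes <- vapply(df, function(col) class(col)[1], character(1))

orig_names
orig_classes
```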
Draw from a fitted PUcopula
simulateCopula(model, n)
model |
A PUcopula model from |
n |
Number of rows |
Matrix of U(0,1) draws