https://duckdb.org/docs/stable/core_extensions/overview Background The Transaction Processing Performance Council (TPC) is a non-profit organization that defines database benchmarking standards. TPC benchmarks such as TPC-H and TPC-DS are industry-standard big-data benchmarks. Their data is generated with the dbgen (TPC-H) and dsdgen (TPC-DS) tools. These tools are free, but they require registering an email address and compiling them with gcc. As an alternative, you can use DuckDB, which ships TPC-H and TPC-DS data-generation extensions as part of its core extensions.
http://www.asasrms.org/Proceedings/y2023/files/HB_JSM_2023.pdf https://ssc.ca/sites/default/files/survey/documents/SSC2003_R_Belcher.pdf Background The Hidiroglou-Berthelot method, or HB-edit, was introduced by Hidiroglou and Berthelot in 1986 to improve outlier detection in periodic business surveys. Detecting outliers in such surveys is difficult because the units (e.g., companies or other survey respondents) can vary enormously in size.
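The HB-edit can be sketched in a few lines of NumPy. This follows the commonly cited formulation of the method: the period-to-period ratios r_i = x_i(t) / x_i(t-1) are centred on their median r_M, transformed so that decreases and increases are symmetric (s_i = 1 - r_M / r_i if r_i < r_M, else r_i / r_M - 1), and scaled by a size factor, E_i = s_i * max(x_i(t-1), x_i(t))^U. A unit is flagged when E_i < E_M - C * d_Q1 or E_i > E_M + C * d_Q3, where d_Q1 = max(E_M - E_Q1, |A * E_M|) and d_Q3 = max(E_Q3 - E_M, |A * E_M|). The defaults U = 0.5, A = 0.05, C = 4 are common choices rather than values from this post, the function name hb_edit and the data are hypothetical, the code assumes strictly positive values in both periods, and the quartiles use NumPy's default linear interpolation (other quartile definitions give slightly different bounds).

```python
import numpy as np


def hb_edit(x_prev, x_curr, U=0.5, A=0.05, C=4.0):
    """Return effects E_i, the lower/upper bounds, and a boolean outlier mask."""
    x_prev = np.asarray(x_prev, dtype=float)
    x_curr = np.asarray(x_curr, dtype=float)
    if np.any(x_prev <= 0) or np.any(x_curr <= 0):
        raise ValueError("HB-edit ratios require strictly positive values")

    r = x_curr / x_prev  # period-to-period ratios
    r_med = np.median(r)
    # Centering transformation: decreases and increases become symmetric around 0.
    s = np.where(r < r_med, 1.0 - r_med / r, r / r_med - 1.0)
    # Size transformation: larger units get more weight (0 <= U <= 1).
    E = s * np.maximum(x_prev, x_curr) ** U

    e_q1, e_med, e_q3 = np.percentile(E, [25, 50, 75])
    # The |A * E_M| term keeps the spreads from collapsing to (near) zero.
    d_q1 = max(e_med - e_q1, abs(A * e_med))
    d_q3 = max(e_q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return E, lower, upper, (E < lower) | (E > upper)


# Hypothetical data: ten units with small changes, except one unit that jumps 10x.
prev = np.array([100, 200, 150, 300, 250, 120, 180, 220, 90, 400], dtype=float)
growth = np.array([1.02, 1.05, 0.98, 1.03, 1.01, 0.99, 1.04, 1.00, 1.06, 10.0])
curr = prev * growth

E, lower, upper, flags = hb_edit(prev, curr)
print(np.flatnonzero(flags))  # → [9]
```

The size factor is what distinguishes HB-edit from a plain ratio edit: with U = 0 every unit is judged on its ratio alone, while U = 1 gives full weight to unit size.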
📥 Download Rosner 1983 PDF Detecting the Unusual: A look at GESD or Rosner’s Test for Outlier Detection In an era where data drives nearly every decision, the ability to spot what doesn’t fit has become more critical than ever. Whether it’s detecting fraudulent transactions, monitoring network security, identifying equipment failures, or ensuring product quality, anomaly detection serves as a vital safeguard. By uncovering patterns in data that do not conform to what is normal or expected, it enables us to respond quickly to risks, reduce losses, and even anticipate problems before they escalate.
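A minimal sketch of Rosner's generalized ESD (GESD) procedure, assuming NumPy and SciPy are available (scipy.stats.t.ppf supplies the Student t quantiles). Given an upper bound r on the number of outliers, it repeatedly removes the observation farthest from the mean, records the statistic R_i = max|x - mean| / s on the current sample, and compares it with the critical value lambda_i = (n - i) * t_{p, n-i-1} / sqrt((n - i - 1 + t_{p, n-i-1}^2) * (n - i + 1)), where p = 1 - alpha / (2 * (n - i + 1)). The number of outliers is the largest i for which R_i > lambda_i. The function name gesd and the sample data are hypothetical.

```python
import numpy as np
from scipy import stats


def gesd(x, max_outliers, alpha=0.05):
    """Return original indices of the outliers found by the generalized ESD test."""
    values = np.asarray(x, dtype=float)
    n = values.size
    if not 1 <= max_outliers <= n - 2:
        raise ValueError("max_outliers must be between 1 and n - 2")
    idx = np.arange(n)
    removed = []    # candidate indices, in the order they were removed
    n_outliers = 0  # largest i with R_i > lambda_i
    for i in range(1, max_outliers + 1):
        dev = np.abs(values - values.mean())
        j = int(np.argmax(dev))
        R_i = dev[j] / values.std(ddof=1)  # test statistic on the current sample
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lam_i = (n - i) * t / np.sqrt((n - i - 1 + t**2) * (n - i + 1))
        if R_i > lam_i:
            n_outliers = i
        removed.append(int(idx[j]))
        values = np.delete(values, j)
        idx = np.delete(idx, j)
    return removed[:n_outliers]


# Hypothetical sample: values near 10 with two planted outliers (30.0 and 25.0).
data = [10.1, 9.8, 10.0, 10.2, 9.9, 10.3, 9.7, 10.0, 10.1, 9.9, 30.0, 10.2, 9.8, 10.0, 25.0]
print(gesd(data, max_outliers=3))  # → [10, 14]
```

Taking the largest i with R_i > lambda_i, rather than stopping at the first non-significant step, is what lets GESD avoid masking, where one extreme value hides another from a one-at-a-time Grubbs test.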
📥 Download Data CSV 📥 Download Python yml Context This case is about a bank with a growing customer base.
Blog with Jupyter Notebooks! Create a directory for the blog post, such as content/post/test-post/, and within it create a notebook named index.ipynb. The first cell of the notebook should have the following content:
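The directory and notebook steps can be scripted with the Python standard library. This is a minimal sketch: the helper name create_notebook_post and its arguments are hypothetical, the notebook is written as bare nbformat 4 JSON, and because the required first-cell content depends on the blog setup, the caller passes it in. The placeholder string in the usage lines stands in for that content, and the choice between a markdown or raw first cell is an assumption to adjust for your setup.

```python
import json
import tempfile
from pathlib import Path


def create_notebook_post(content_dir, slug, first_cell_source, cell_type="markdown"):
    """Create <content_dir>/post/<slug>/index.ipynb whose first cell holds first_cell_source."""
    if cell_type not in ("markdown", "raw"):
        # Code cells would also need "outputs" and "execution_count" keys.
        raise ValueError("cell_type must be 'markdown' or 'raw'")
    post_dir = Path(content_dir) / "post" / slug
    post_dir.mkdir(parents=True, exist_ok=True)
    notebook = {
        "cells": [
            {
                "cell_type": cell_type,
                "metadata": {},
                # nbformat stores source as a list of lines, each keeping its newline.
                "source": first_cell_source.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    nb_path = post_dir / "index.ipynb"
    nb_path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return nb_path


# Usage: a throwaway directory stands in for the blog's content/ folder.
content_root = Path(tempfile.mkdtemp()) / "content"
nb_path = create_notebook_post(content_root, "test-post", "PLACEHOLDER: first-cell content goes here\n")
print(nb_path)
```

Opening the generated index.ipynb in Jupyter should show a single cell with the supplied text, after which the rest of the post can be written as ordinary notebook cells.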