  1. Sampling - Guide - Apache DataFu Pig

    Sampling: Pig has a built-in SAMPLE operator that performs Bernoulli sampling on a relation. Apache DataFu Pig provides additional sampling techniques for when Bernoulli sampling is not applicable. …

  2. Guide - Apache DataFu Pig

    Sampling: simple random sample with/without replacement, weighted sample, sample by keys; Hashing: SHA and MD5; Link Analysis: PageRank; Assorted Macros: deduplication of tables, human-readable …

  3. SimpleRandomSample (datafu-pig 1.6.1 API)

    It takes a bag of n items and a sampling probability p as inputs, and outputs a simple random sample of size exactly ceil(p*n) in a bag, with probability at least 99.99%. For example, the following script …

  4. SimpleRandomSampleWithReplacementVote (datafu-pig 1.6.1 API)

    Scalable simple random sampling with replacement (ScaSRSWR). This UDF, together with SimpleRandomSampleWithReplacementElect, implements a scalable algorithm for simple random …

  5. Apache DataFu Pig - Getting Started

    Sampling: simple random sampling with or without replacement, weighted sampling. Link Analysis: run PageRank on a graph represented by a bag of nodes and edges. More: other useful methods like …

  6. datafu.pig.sampling (datafu-pig 1.6.0 API)

    Package datafu.pig.sampling. Description: sampling UDFs, including weighted sample, reservoir sampling, sampling by key, etc.

  7. datafu.pig.sampling (DataFu 1.2.0)

    Sampling UDFs, including weighted sample, reservoir sampling, sampling by key, etc.

  8. SampleByKey (datafu-pig 1.3.2 API)

    The method of sampling is to convert the key to a hash, derive a double value from this, and then test this against a supplied probability. The double value derived from a key is uniformly distributed …

  9. SimpleRandomSample (DataFu 1.2.0)

    For example: DEFINE SRS datafu.pig.sampling.SimpleRandomSample('0.01'); examples = LOAD ... grouped = GROUP examples BY label; sampled = FOREACH grouped GENERATE FLATTEN(SRS …

  10. datafu.pig.sampling (datafu-pig 1.3.1 API)

    Class summary: ReservoirSample performs a simple random sample using an in-memory reservoir to produce a uniformly random sample of a given size. ReservoirSample.Final …
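
The last snippet describes drawing a uniformly random sample of a given size through an in-memory reservoir. As a rough illustration of that general idea, here is a minimal Python sketch of the classic single-pass reservoir algorithm (often called Algorithm R). It is not DataFu's implementation and does not use the Pig UDF interface. The function name `reservoir_sample` and its parameters are hypothetical, chosen only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from a single pass.

    Hypothetical helper for illustration only; not part of DataFu.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item survives only if the
            # slot falls inside the reservoir (probability k / (i + 1)).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 values from a stream of 1000 without holding the stream in memory.
    print(reservoir_sample(range(1000), 5, random.Random(42)))
```

Only the reservoir itself is held in memory, so the input can be an arbitrarily long stream. If the input has fewer than `k` items, all of them are returned.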