Implementing Recommendation Algorithms with Spark, Part 2: User-Based Collaborative Filtering (Theory)
Implementing Recommendation Algorithms with Spark, Part 1: Introduction to Recommendation Algorithms
Computing UV (Unique Visitors) for Massive User Bases in Real Time with Spark Streaming
Complete Spark API Reference (3): Spark RDD API
What Is an RDD
A Resilient Distributed Dataset (RDD) is an abstraction of an immutable, partitioned collection of data.
An RDD is characterized by five main properties:
- A list of partitions
- A function for computing each split
- A list of dependencies on other RDDs
- Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
- Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
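
The five properties above can be sketched as a small toy model. This is not Spark's implementation: `ToyRDD`, `parallelize`, and all field names here are hypothetical, chosen only to mirror the list, and everything runs in a single local process.

```python
class ToyRDD:
    """Toy stand-in for an RDD; names are illustrative, not Spark's API."""

    def __init__(self, partitions, compute, dependencies=(),
                 partitioner=None, preferred_locations=None):
        self.partitions = list(partitions)      # 1. a list of partitions (here: indices)
        self.compute = compute                  # 2. a function computing one split
        self.dependencies = list(dependencies)  # 3. the parent RDDs this one depends on
        self.partitioner = partitioner          # 4. optional, for key-value data
        # 5. optional locality hints; default is "no preference"
        self.preferred_locations = preferred_locations or (lambda split: [])

    def map(self, f):
        # Lazy transformation: same partitions, wraps the parent's compute.
        # The partitioner is dropped because f may change the keys.
        return ToyRDD(
            self.partitions,
            lambda split: (f(x) for x in self.compute(split)),
            dependencies=[self],
            preferred_locations=self.preferred_locations,
        )

    def collect(self):
        # Evaluate every split in order and concatenate the results.
        return [x for split in self.partitions for x in self.compute(split)]


def parallelize(data, num_slices):
    """Split a local list into num_slices contiguous chunks."""
    size = -(-len(data) // num_slices)  # ceiling division
    chunks = [data[i * size:(i + 1) * size] for i in range(num_slices)]
    return ToyRDD(range(num_slices), lambda split: iter(chunks[split]))


base = parallelize([1, 2, 3, 4, 5], 2)
scaled = base.map(lambda x: x * 10)
print(scaled.collect())              # [10, 20, 30, 40, 50]
print(scaled.dependencies == [base])  # True
```

Nothing is computed until `collect()` is called: `map` only records the parent in `dependencies` and wraps its `compute` function. That recorded lineage is what would allow a lost partition to be recomputed from its parent.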
[Reading Notes] Elasticsearch: The Definitive Guide
Complete Spark API Reference (2): Spark SQL Functions
Complete Spark API Reference (1): Spark SQL Dataset & DataFrame API
Configuring Passwordless SSH Login on Linux (Public/Private Key Authentication)
Developing Custom Spark ML Machine Learning Classes - 1