재융

개발 공부방 (Development Study Room)

중복제거 (deduplication)

[PySpark] Distinct counts in groupBy with countDistinct

Data/Data Analysis 2020. 11. 12.

I use groupBy constantly when analyzing data, and at one point I ran into the following situation. Suppose df is a pyspark.DataFrame that looks like this:

date        accountId  timestamp  another
2020-11-02  A          1          B
2020-11-02  A          2          C
2020-11-02  A          3          B
2020-11-02  A          4          D
2020-11-02  B          1          A
2020-11-02  B          2          C

To group by date and accountId and compute the number of another values and the average timestamp, you would probably write something like this:

df.groupBy('date', 'accountId')\
  .agg(F.count(F.col('another')).alias('count'), F.avg(F.col('timestamp')..

[Pandas] Checking for and removing duplicates in a pandas DataFrame

Data/Data Analysis 2018. 9. 6.

How to check whether a DataFrame contains duplicates, and how to remove them.

Notation: data is the DataFrame to check for duplicates.

Checking for duplicates: data.duplicated()

Removing duplicates: data.drop_duplicates()
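A small sketch of the two calls above on a toy DataFrame. The column names and values are made up for illustration; only data.duplicated() and data.drop_duplicates() come from the post.

```python
import pandas as pd

# Hypothetical sample data: the last row repeats the first one exactly.
data = pd.DataFrame(
    {
        "accountId": ["A", "A", "B", "A"],
        "another": ["B", "C", "A", "B"],
    }
)

# Boolean Series: True for each row that repeats an earlier row.
dup_mask = data.duplicated()

# New DataFrame with repeated rows dropped; the first occurrence is kept
# and the original index labels are preserved.
deduped = data.drop_duplicates()

print(dup_mask.tolist())
print(deduped)
```

Both methods compare whole rows by default. Passing subset=[...] restricts the comparison to specific columns, and keep="last" or keep=False changes which occurrences are flagged.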
