Trusted answers to developer questions

Related Tags

pyspark

How to add a current timestamp column to a PySpark DataFrame

Abhilash


The current timestamp can be added as a new column to a Spark DataFrame using the current_timestamp() function from the pyspark.sql.functions module.

When displayed, the timestamp follows the yyyy-MM-dd HH:mm:ss pattern, followed by fractional seconds.

Syntax

pyspark.sql.functions.current_timestamp()

Parameters

This function takes no parameters.

Return value

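This function returns a column of type TimestampType that holds the current timestamp at the start of query evaluation. All calls to current_timestamp() within the same query return the same value.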

Code example

Let’s see the code below:

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp
spark = SparkSession.builder.appName('edpresso').getOrCreate()

data = [("James","Smith","USA","CA"),
    ("Michael","Rose","USA","NY"),
    ("Robert","Williams","USA","CA"),
    ("Maria","Jones","USA","FL")
]

columns = ["firstname","lastname","country","state"]
df = spark.createDataFrame(data = data, schema = columns)

df_with_ts = df.withColumn("curr_timestamp", current_timestamp())

df_with_ts.show(truncate=False)
Adding the current timestamp to DataFrame

Code explanation

  • Line 4: We create a Spark session with the app name edpresso.
  • Lines 6–10: We define data for the DataFrame.
  • Line 12: We define the columns of the DataFrame.
  • Line 13: We create a DataFrame using the createDataFrame() method.
  • Line 15: We add a new column to the DataFrame using the withColumn() method, passing the new column name curr_timestamp and the value returned by current_timestamp().
  • Line 17: We print the DataFrame.


CONTRIBUTOR

Abhilash
Copyright ©2022 Educative, Inc. All rights reserved
