ERROR when writing file to S3 bucket from EMRFS enabled Spark cluster

ERROR :

18/03/02 01:42:17 INFO RetryInvocationHandler: Exception while invoking ConsistencyCheckerS3FileSystem.mkdirs over null. Retrying after sleeping for 10000ms.
com.amazon.ws.emr.hadoop.fs.consistency.exception.ConsistencyException: Directory 'bucket/folder/_temporary' present in the metadata but not s3
    at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:506)

Root cause :

This consistency error usually comes from one of the following:

  • Manual deletion of files or directories from the S3 console, which leaves their entries behind in the EMRFS metadata.
  • Retry logic in Spark and Hadoop.
  • A file creation on S3 that failed after its entry had already been written to DynamoDB.
  • A Hadoop process that restarts the operation and finds the entry already present in DynamoDB, at which point it throws the consistency error.
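
To make the mismatch concrete, here is a toy sketch in shell. It does not call AWS: two sorted text files stand in for the DynamoDB metadata and the S3 listing, and the key bucket/folder/part-00000 is a made-up example. The entry it prints is the kind that is "present in the metadata but not s3".

```shell
#!/bin/sh
# Toy illustration only: text files stand in for the EMRFS metadata
# table and the S3 listing. Nothing here talks to AWS.
LC_ALL=C
export LC_ALL

meta_file=$(mktemp)
s3_file=$(mktemp)

# Hypothetical keys recorded in the metadata.
printf '%s\n' "bucket/folder/_temporary" "bucket/folder/part-00000" | sort > "$meta_file"

# Hypothetical keys that actually exist in S3; _temporary is missing,
# for example because it was deleted from the console.
printf '%s\n' "bucket/folder/part-00000" | sort > "$s3_file"

# comm -23 keeps lines that appear only in the first file:
# entries present in the metadata but missing from S3.
stale_keys=$(comm -23 "$meta_file" "$s3_file")
rm -f "$meta_file" "$s3_file"

echo "Stale metadata entries:"
echo "$stale_keys"
```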

Solution :

Try re-running your Spark job after cleaning up the EMRFS metadata in DynamoDB.

Follow the steps below to clean up and restore the metadata for the affected directory in the S3 bucket.

Delete all the metadata entries for the path. Because emrfs delete uses a hash function to select the records, it may also delete unwanted entries, which is why the import and sync steps follow.

emrfs delete s3://<bucket>/path

Import the metadata for the objects that are physically present in S3 into DynamoDB.

emrfs import s3://<bucket>/path 

Sync the metadata with the data in S3.

emrfs sync s3://<bucket>/path 

After all the operations, check whether each object is present in both S3 and the metadata.

emrfs diff s3://<bucket>/path 
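
The four steps above can be chained in a small shell script so that a later step runs only if the previous one succeeded. This is a minimal sketch, not an official tool: the function name repair_emrfs_metadata, the run helper, and the DRY_RUN variable are illustrative names, and only the four emrfs sub-commands come from the steps above. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: run the EMRFS metadata clean-up steps in order for one path.
# repair_emrfs_metadata, run and DRY_RUN are illustrative names,
# not part of EMRFS itself.

# With DRY_RUN=1 (the default here) commands are only printed, so the
# sequence can be reviewed before anything touches the metadata.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

repair_emrfs_metadata() {
  target_path="$1"

  # Refuse anything that is not an s3:// path.
  case "$target_path" in
    s3://*) ;;
    *) echo "expected an s3:// path, got: $target_path" >&2; return 1 ;;
  esac

  # Each step runs only if the previous one succeeded.
  run emrfs delete "$target_path" &&
  run emrfs import "$target_path" &&
  run emrfs sync "$target_path" &&
  run emrfs diff "$target_path"
}

# Dry run against the placeholder path used in this post.
plan=$(repair_emrfs_metadata "s3://<bucket>/path")
echo "$plan"
```

To execute the steps for real, replace the placeholder with the actual bucket and path and run the script with DRY_RUN=0 on a cluster node where the emrfs command is available.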