Interview questions for the Data Scientist position

Hadoop interview questions

After studying a new topic, it helps to test your knowledge and see how well you actually understand it. Likewise, in an interview, specifically one about Hadoop, employers usually ask related questions to gauge your technical competence. Below I share 100 Hadoop questions for your reference.

  • Explain what regularization is and why it is useful.
  • Which data scientists do you admire most? Which startups?
  • How would you validate a predictive model of a quantitative outcome variable that you built using multiple regression?
  • Explain what precision and recall are. How do they relate to the ROC curve?
  • How can you prove that an improvement you’ve made to an algorithm is really an improvement over doing nothing?
  • What is root cause analysis?
  • Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
  • What is statistical power?
  • Explain what resampling methods are and why they are useful. Also explain their limitations.
  • Is it better to have too many false positives, or too many false negatives? Explain.
  • What is selection bias, why is it important and how can you avoid it?
  • Give an example of how you would use experimental design to answer a question about user behavior.
  • What is the difference between “long” and “wide” format data?
  • What method do you use to determine whether the statistics published in an article (e.g. newspaper) are either wrong or presented to support the author’s point of view, rather than correct, comprehensive factual information on a specific subject?
  • Explain Edward Tufte’s concept of “chart junk.”
  • How would you screen for outliers and what should you do if you find one?
  • How would you use extreme value theory, Monte Carlo simulations, or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
  • What is a recommendation engine? How does it work?
  • Explain what a false positive and a false negative are. Why is it important to differentiate these from each other?
  • Which tools do you use for visualization? What do you think of Tableau, R, and SAS for graphs? How would you efficiently represent five dimensions in a chart (or in a video)?
  • What is shuffling in MapReduce?
  • What is the difference between an HDFS block and an InputSplit?
  • What are MapFiles in Hadoop?
  • Why do we need passwordless SSH in a fully distributed environment?
  • What is the use of .pagination class?
  • Why is replication pursued in HDFS in spite of the data redundancy it creates?
  • What are the core components of Hadoop?
  • What differentiates Hadoop from other parallel computing solutions?
  • What is the difference between the Secondary NameNode, the Checkpoint node, and the Backup node? (The Secondary NameNode is a poorly named component of Hadoop.)
  • What happens when a DataNode fails?
  • What are the side data distribution techniques?
  • Explain what SequenceFileInputFormat is.
  • What is partitioning?
  • Explain what happens in TextInputFormat.
  • Can we change a file cached by the DistributedCache?
  • What happens if the JobTracker machine is down?
  • Explain what conf.setMapperClass does.
  • Can we deploy the JobTracker on a node other than the NameNode?
  • What are the four modules that make up the Apache Hadoop framework?
  • How did you debug your Hadoop code?
  • Which modes can Hadoop be run in?
  • List a few features of each mode.
  • What is Hadoop Streaming?
  • Where are Hadoop’s configuration files located?
  • What is a combiner in Hadoop?
  • What is the functionality of JobTracker in Hadoop?
  • How many instances of a JobTracker run on a Hadoop cluster?
  • List Hadoop’s three configuration files.
  • Is it necessary to write Hadoop jobs in Java?
  • What are “slaves” and “masters” in Hadoop?
  • What commands are used on Linux to see all jobs running in the Hadoop cluster and to kill a job?
  • How many datanodes can run on a single Hadoop cluster?
  • What is the identity mapper?
  • What is the JobTracker in Hadoop?
  • How many JobTracker processes can run on a single Hadoop cluster?
  • What is the identity reducer?
  • What sorts of actions does the JobTracker process perform?
  • What is commodity hardware?
  • How does the JobTracker schedule a job for the TaskTracker?
  • What are the main components of the job flow in the YARN architecture?
  • What does the mapred.job.tracker property do?
  • What are the main configuration parameters that a user needs to specify to run a MapReduce job?
  • What is “PID”?
  • What is “jps”?
  • Is there another way to check whether the NameNode is working?
  • How would you restart the NameNode?
  • What is ChainReducer?
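
Several of the questions above (shuffling, partitioning, the combiner, the identity mapper and reducer) concern the steps of a MapReduce job. As a study aid, here is a minimal in-memory sketch of a word count in plain Python. It does not use Hadoop's API at all: the function names `map_phase`, `partition`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration, and the CRC32-based partitioner is an assumption standing in for a hash partitioner.

```python
from collections import defaultdict
from zlib import crc32


def map_phase(line):
    """Mapper (hypothetical): emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    """Partitioner (hypothetical): pick a reducer for a key using a stable hash."""
    return crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapped_pairs, num_reducers):
    """Shuffle: route each pair to its partition and group the values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        partitions[partition(key, num_reducers)][key].append(value)
    return partitions


def reduce_phase(key, values):
    """Reducer (hypothetical): sum the counts collected for one key."""
    return key, sum(values)


def word_count(lines, num_reducers=2):
    """Run map, shuffle, and reduce in sequence on a small list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    result = {}
    for part in shuffle(mapped, num_reducers):
        for key in sorted(part):  # each simulated reducer sees its keys in sorted order
            word, total = reduce_phase(key, part[key])
            result[word] = total
    return result


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

The point of the sketch is that every occurrence of a key lands in the same partition, so a single reducer sees all of its values. In a real cluster the shuffle also moves data across the network between map and reduce tasks, which this toy version skips.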

Source: http://www.careerarm.com/wp-content/uploads/2015/10/hadoop.pdf

See also: Top 50 Spark Interview Questions and Answers for 2016

