
Help 502

I've gotten them a couple times too. Not sure yet what's causing it, but they don't typically last long. A couple refreshes fixed it for me.
 
502 is really a server error, not a network or client error. The most likely place this would happen is between the server's webserver (nginx) and the PHP backend to process page requests.
Is there any particular time you can give me when this happens, or is it completely random?
 
¯\_(ツ)_/¯

Cold Ramen Noodles Any particular time this happens?
 
What's odd is that I never encounter these. If it's an issue with the server, I would expect it to affect everyone, roughly in proportion to how much they visit the site.
 
Let me know when you get this again. I want to see if I can correlate a log entry.
 
Code:
[29-Nov-2017 18:44:56] WARNING: [pool rpnation] server reached pm.max_children setting (24), consider raising it

Code:
2017/11/29 18:45:56 [error] 12749#0: *7165357 connect() to unix:/var/run/rpnation.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 98.111.244.150, server: www.rpnation.com, request: "POST /threads/main-rp-here.371407/save-draft HTTP/1.1", upstream: "fastcgi://unix:/var/run/rpnation.sock:", host: "www.rpnation.com", referrer: "https://www.rpnation.com/threads/main-rp-here.371407/page-4"

Code:
# Time: 171129 18:45:56
# User@Host: ---------------- []
# Thread_id: 39802719  Schema: -------------  QC_hit: No
# Query_time: 68.437642  Lock_time: 0.000040  Rows_sent: 4  Rows_examined: 161522
# Rows_affected: 0
SET timestamp=1511999156;
SELECT post.*
                                        ,
                                        bb_code_parse_cache.parse_tree AS message_parsed, bb_code_parse_cache.cache_version AS message_cache_version,
                                        user.*, IF(user.username IS NULL, post.username, user.username) AS username,
                                        user_profile.*,
                                        user_privacy.*,
                                        signature_parse_cache.parse_tree AS signature_parsed, bb_code_parse_cache.cache_version AS signature_cache_version,
                                        session_activity.view_date AS last_view_date,
                                        liked_content.like_date,
            xf_hoffi_dm_rolls.roll_id as roll_id,
        threadmarks.threadmark_id,
        threadmarks.threadmark_category_id as threadmark_category_id,
        threadmarks.label as threadmark_label,
        threadmarks.edit_count as threadmark_edit_count,
        threadmarks.user_id as threadmark_user_id,
        threadmarks.position as threadmark_position,
        threadmarks.last_edit_date as threadmark_last_edit_date,
        threadmarks.last_edit_user_id as threadmark_last_edit_user_id,
        threadmarks.threadmark_date as threadmark_threadmark_date,
        tm_user.username as threadmark_username
                                FROM xf_post AS post

                                        LEFT JOIN xf_bb_code_parse_cache AS bb_code_parse_cache ON
                                                (bb_code_parse_cache.content_type = 'post' AND bb_code_parse_cache.content_id = post.post_id)
                                        LEFT JOIN xf_user AS user ON
                                                (user.user_id = post.user_id)
                                        LEFT JOIN xf_user_profile AS user_profile ON
                                                (user_profile.user_id = post.user_id)
                                        LEFT JOIN xf_user_privacy AS user_privacy ON
                                                (user_privacy.user_id = post.user_id)
                                        LEFT JOIN xf_bb_code_parse_cache AS signature_parse_cache ON
                                                (signature_parse_cache.content_type = 'signature' AND signature_parse_cache.content_id = post.user_id)
                                        LEFT JOIN xf_session_activity AS session_activity ON
                                                (post.user_id > 0 AND session_activity.user_id = post.user_id AND session_activity.unique_key = CAST(post.user_id AS BINARY))
                                        LEFT JOIN xf_liked_content AS liked_content
                                                ON (liked_content.content_type = 'post'
                                                        AND liked_content.content_id = post.post_id
                                                        AND liked_content.like_user_id = 40384)
            LEFT JOIN xf_hoffi_dm_rolls ON (post.post_id = xf_hoffi_dm_rolls.post_id)
        LEFT JOIN threadmarks ON (threadmarks.post_id = post.post_id )
        LEFT JOIN xf_user AS tm_user ON tm_user.user_id = threadmarks.user_id
                                WHERE post.thread_id = '343016'
                                        AND post.post_date > '1498400954'
                                        AND (post.message_state IN ('visible') OR (post.message_state = 'moderated' AND post.user_id = 40384))
                                ORDER BY post.post_date DESC
                         LIMIT 4;

The first snippet is a warning from PHP-FPM telling me that 24 requests were being processed at once, and 24 is the maximum number of concurrent PHP processes the server is configured to run. The reason for this limit is to keep PHP from running away with all of the server's memory and CPU time and making it unresponsive or worse. Typically this number is more than sufficient, since most PHP requests are handled very quickly and each process can then move on to the next request.

The second snippet is from the webserver, indicating that it tried to hand a request off to the PHP backend but couldn't connect to its socket, because no PHP processes were available to take the request at that moment. I found about 6,600 of these errors in the webserver log, all within roughly the same one-minute window.

The third snippet is a slow database query logged at about the same time as the other errors. This is the cause. I was able to re-run this query manually and force the same freeze-up you likely saw before (you might have noticed the site go unresponsive a few minutes ago). The query locks the database such that other incoming requests can't be served until it completes. Since it takes over a minute to execute (68 seconds, as you can see in the log), plenty of requests stack up, overload the PHP processes, and stall the server.
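
If anyone's curious how this gets spotted in the moment: while the site is stalled, the offending query sits in the database's process list with a huge Time value, and the slow query log (where the snippet above came from) records it once it finishes. Roughly, the checks look like this from the database console (a sketch; nothing here is specific to our setup beyond having the slow query log enabled):

Code:
-- Show every connection, how long it has been running, and what it is doing.
-- A 68-second SELECT shows up with a large value in the Time column.
SHOW FULL PROCESSLIST;

-- Confirm the slow query log is enabled and where it writes,
-- plus the threshold a query must exceed to be logged.
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';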

I ran an SQL EXPLAIN on this query, and it showed the database engine doing a filesort on 50,000 rows in order to fulfill it. That is most likely why the query takes over a minute to run. There are a few ways to address this, none of them easy (they're listed after the sketch below).
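
The check itself is nothing fancy: prefix the logged query with EXPLAIN and look at the Extra column. Here is a trimmed-down sketch of it, keeping only the thread filter and date sort from the logged query and leaving the joins out for brevity; "Using filesort" in the output is the expensive part:

Code:
-- Sketch only: the core of the logged query with EXPLAIN in front of it.
-- The real check uses the full query, joins and all.
EXPLAIN
SELECT post.post_id, post.post_date
FROM xf_post AS post
WHERE post.thread_id = '343016'
  AND post.post_date > '1498400954'
  AND post.message_state IN ('visible')
ORDER BY post.post_date DESC
LIMIT 4;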

1. Eliminate the site feature that causes the query to run.
2. Optimize the query, possibly by creating a new index or rewriting it so the database engine chooses a better one (a rough, untested sketch follows this list).
3. Work around the slow query by splitting the database load, for example onto a second database server that only serves read requests (a read replica), so the main DB can stay responsive during this kind of operation.
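
To give a rough idea of what option 2 could look like, here is an untested sketch. The index name is made up, XenForo may already have something similar on xf_post, and if the filesort actually comes from one of the joins this wouldn't help at all; it's only meant to illustrate the kind of change involved:

Code:
-- Hypothetical index (untested, name invented): lets the engine read a
-- thread's posts already ordered by post_date, so the
-- ORDER BY post.post_date DESC ... LIMIT 4 can stop early instead of
-- sorting every examined row.
ALTER TABLE xf_post
    ADD INDEX thread_id_post_date_idx (thread_id, post_date);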

First of all, it would help to know where this query is coming from and how it gets triggered. It clearly isn't happening frequently or we would notice this problem more often.


Thanks for the report. I found these log entries that may be helpful in tracking down the problem. Unfortunately, it looks to be more complicated than simply adjusting some server settings.
 
There's nothing you can do on the client side. This is entirely a server issue.
And I found a more direct cause. The query above works with the posts of one thread in particular, and that thread has over 20,000 posts in it. There are entire XenForo boards with fewer posts than that. The forum software likely wasn't designed to handle threads with this many posts, so I think we need to look at capping the total number of posts allowed per thread.
This wouldn't mean the roleplay has to stop, just that we'd need to close the existing thread and start a new one where it left off. It would help keep these long-running queries from happening.
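
If we do go the capping route, finding the threads that are already past a given size is straightforward. A sketch, counting visible posts per thread (the 10,000 cutoff is just an example, not a decided cap):

Code:
-- Sketch: list the largest threads by visible post count, biggest first.
-- The 10000 cutoff is only an example, not an actual policy.
SELECT post.thread_id, COUNT(*) AS post_count
FROM xf_post AS post
WHERE post.message_state = 'visible'
GROUP BY post.thread_id
HAVING post_count > 10000
ORDER BY post_count DESC;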
 
I've had this happen a few times as well. Also, Rex and I both had a few hours where we couldn't even bring up the site; it kept saying it was disconnected.
 
We're working on the 502 issue. Last night's looong downtime was a different issue which hopefully is now resolved.
 
A bit of an update on this. I've been getting them more often. For example, I'll be typing up a response in a PM and a big 502 error message will pop up. It goes away right away, but sometimes I get them a few times in a row.
 
Got this about 7 times in a row just now. I was wondering if there was any update on this. At times I almost can't use the site.
 
Update! Since the site's update, I haven't had this problem, aside from when Wiz was working on something and other people were having the issue as well. It was becoming a daily problem, but I haven't had it since the update!

Huuuzah
 
We thought that might be the case. The update is much better able to handle the sort of thing that was leading to the 502 errors.

Glad it's fixed for you!
 
Thanks for the update on this. We still have extremely large threads, but we don't seem to have the heavy queries that were causing these lockups before.
 
